More on cleanSpectra and cleaning MSMS data

by Brian | 26th August 2009

Jainab and I have discussed this before, but I want to get it down. Taking your peak list, sorting it by intensity, and keeping the top X peaks can be detrimental: some locations where fragments are very likely to occur, and with high intensities, can act as intensity hogs. They are over-represented in the high-intensity range and can crowd out valid fragments with lower intensities.
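
To make the crowding-out effect concrete, here is a minimal Python sketch of plain top-X selection. It is not the actual cleanSpectra code; the function name `top_n_by_intensity`, the `(mass, intensity)` tuple layout, and the example peaks are all made up for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (mass, intensity); this layout is an assumption


def top_n_by_intensity(peaks: List[Peak], n: int) -> List[Peak]:
    """Keep the n most intense peaks, returned in ascending mass order."""
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(kept, key=lambda p: p[0])


# Hypothetical spectrum: three intense peaks clustered near 175 Da,
# plus two weaker peaks at higher mass.
example_peaks = [
    (175.1, 900.0),
    (175.6, 850.0),
    (176.1, 800.0),
    (430.2, 120.0),
    (612.3, 95.0),
]

top3 = top_n_by_intensity(example_peaks, 3)
```

With these made-up numbers, all three kept peaks come from the cluster near 175 Da, and the weaker peaks at 430.2 and 612.3 are dropped.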

A method Jainab proposed is to divide the list into mass groups, each spanning, say, 100 Da. These smaller groups are then sorted and culled individually.
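
A rough Python sketch of the mass-group idea might look like the following. The function name `top_n_per_mass_bin`, the `n_per_bin` parameter, and the `(mass, intensity)` tuple layout are assumptions for illustration, not part of cleanSpectra, and how many peaks to keep per group is left open.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (mass, intensity); this layout is an assumption


def top_n_per_mass_bin(
    peaks: List[Peak], n_per_bin: int, bin_width: float = 100.0
) -> List[Peak]:
    """Group peaks into fixed-width mass bins and keep the n_per_bin most
    intense peaks from each bin. The result is in ascending mass order."""
    bins: Dict[int, List[Peak]] = defaultdict(list)
    for mass, intensity in peaks:
        # Bin 0 covers [0, 100), bin 1 covers [100, 200), and so on.
        bins[int(mass // bin_width)].append((mass, intensity))

    kept: List[Peak] = []
    for index in sorted(bins):
        ranked = sorted(bins[index], key=lambda p: p[1], reverse=True)
        kept.extend(ranked[:n_per_bin])
    return sorted(kept, key=lambda p: p[0])


# Hypothetical spectrum: an intense cluster near 175 Da and weaker peaks elsewhere.
binned_example = [
    (175.1, 900.0),
    (175.6, 850.0),
    (176.1, 800.0),
    (430.2, 120.0),
    (612.3, 95.0),
]

kept_per_bin = top_n_per_mass_bin(binned_example, n_per_bin=1)
```

Because each group is culled on its own, the intense cluster near 175 Da can only claim the slots of its own 100 Da group, so the weaker peaks at 430.2 and 612.3 survive.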

Perhaps all that is needed is to reduce the selectivity of cleanSpectra. Instead of returning the top 100 peaks, a cutoff that becomes more selective the larger the list, it could return the top 50%.
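
As a sketch of the relative cutoff, assuming peaks are `(mass, intensity)` tuples: the helper `top_fraction_by_intensity` below is hypothetical and is not the real cleanSpectra signature. Rounding up for odd-length lists is also just one possible choice.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (mass, intensity); this layout is an assumption


def top_fraction_by_intensity(peaks: List[Peak], fraction: float = 0.5) -> List[Peak]:
    """Keep the most intense `fraction` of the peaks (rounded up),
    returned from most to least intense."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    keep = math.ceil(len(peaks) * fraction)
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    return ranked[:keep]


# Hypothetical five-peak list: the top 50% keeps ceil(2.5) = 3 peaks.
fraction_example = [
    (175.1, 900.0),
    (430.2, 120.0),
    (612.3, 95.0),
    (250.4, 40.0),
    (380.9, 10.0),
]

top_half = top_fraction_by_intensity(fraction_example, 0.5)
```

Unlike a fixed top 100, the number of peaks kept here scales with the length of the list.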

One Response to “More on cleanSpectra and cleaning MSMS data”

  1. jainab

    Sep 1st, 2009 :

    Brian, you are right that some fragments are more likely to occur and may be over-represented, so real peaks can be lost. But if we select the top 50% of the peaks, I am sure we will include many spurious ones. Some pkl files contain as many as 1000 peaks, and if we take the top 500 of them, only about 1 peak in 10 will be real (a typical peptide generates around 50 real peaks). So I think that if we really want to improve the sorting, we need to divide the list by mass range.
