Success with MSMSFit on the human genome

by Brian | 14th July 2009

There were a few mild logic errors in version 1 of MSMSFit (which I pronounce as “Miss Misfit”). These errors were not serious enough to skew the results of my trials with E. coli, but they were amplified to the point of being problematic on the human genome.

Long, boring story short, I have worked out at least the most grievous of the logic bugs. MSMSFit looks for matches with the two main b and y ions. A match is considered a “hit” if it is within 0.15 Da. This may be a parameter we want to be able to specify, since it depends on the accuracy of the MS apparatus.
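For anyone who wants to see the idea of a “hit” in concrete terms, here is a minimal sketch. It is not the MSMSFit code: the function names (`b_y_ions`, `count_hits`) are made up for illustration, and I am assuming singly charged b and y ions computed from standard monoisotopic residue masses. The tolerance is exposed as a parameter, defaulting to the 0.15 Da mentioned above.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O in Da
PROTON = 1.007276   # mass of a proton in Da


def b_y_ions(peptide):
    """Return (b_ions, y_ions) for singly charged fragments (assumption)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    b_ions, y_ions = [], []
    prefix = 0.0
    # Each cleavage site splits the peptide into an N-terminal prefix (b ion)
    # and the complementary C-terminal suffix (y ion).
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)
        y_ions.append(total - prefix + WATER + PROTON)
    return b_ions, y_ions


def count_hits(peptide, observed_peaks, tol=0.15):
    """Count theoretical b/y ions with an observed peak within tol Da."""
    b_ions, y_ions = b_y_ions(peptide)
    hits = 0
    for ion in b_ions + y_ions:
        if any(abs(peak - ion) <= tol for peak in observed_peaks):
            hits += 1
    return hits
```

A call such as `count_hits("GA", [58.03, 90.30])` counts the b1 ion as a hit but not the y1 ion, since the second peak is more than 0.15 Da away from the theoretical mass.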

In this particular trial on chr4, each of the 4 correct peptides (as deemed by HMM Score) was the #1 selection by MSMSFit for the corresponding spectrum. Here is the output (when viewing the file in Safari, do a find on “1557” and scroll down to the forward direction to see the correct peptide cluster). Again, this output is from a greedy implementation, meaning that each spectrum has only two peptide recommendations (one for each direction, forward and reverse). It could fairly easily be tailored to return a list of top recommendations but, for this trial, only the top recommendation is required.

UPDATE: it should also be noted that I am crippling most of the cleanSpectra code. At least for MSMSFit that cleaning process does more harm than good.

Also, I have eliminated the use of squared error as a scoring factor altogether. The philosophy simply wasn’t right: a “hit” already means the match is as close as the measurement accuracy of the mass spectrometer allows, so why should we punish a candidate for error which is essentially random (and therefore has nothing to do with fitness)?
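To illustrate the reasoning with made-up numbers (this is a toy sketch, not the MSMSFit scoring code, and the function names are hypothetical): two candidates whose fragment errors all fall inside the 0.15 Da window get the same hit count, yet a squared-error term would still rank one below the other on what is essentially instrument noise.

```python
def hit_count(errors, tol=0.15):
    """Number of fragment matches whose mass error is within tol Da."""
    return sum(1 for e in errors if abs(e) <= tol)


def squared_error(errors):
    """Sum of squared mass errors (the factor that was dropped)."""
    return sum(e * e for e in errors)


# Hypothetical mass errors (Da) for two candidate peptides.
candidate_a = [0.01, -0.02, 0.03]
candidate_b = [0.12, -0.14, 0.10]

# Both candidates score 3 hits, so a hit-based score treats them equally.
print(hit_count(candidate_a), hit_count(candidate_b))

# A squared-error term would still penalise candidate_b more heavily.
print(squared_error(candidate_a) < squared_error(candidate_b))
```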

One Response to “Success with MSMSFit on the human genome”

  1. Jainab

    Jul 15th, 2009 :

    Are you sure cleaning does more harm than good? I do not see the cause.
