Success with MSMSFit on the human genome
There were a few mild logic errors in version 1 of MSMSFit (which I pronounce as “Miss Misfit”). These errors were not weighty enough to skew the results of my trials with E. coli, but they were amplified to the point of being problematic on the human genome.
Long, boring story short, I have worked out at least the most grievous of the logic bugs. MSMSFit looks for matches with the two main ion types, the b and y ions. A match is considered a “hit” if it is within 0.15 Da. This is a parameter we may wish to make configurable, as it depends on the accuracy of the MS apparatus.
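To make the hit rule concrete, here is a minimal sketch in Python. It is not the MSMSFit code: the function names (`b_y_ions`, `count_hits`), the assumption of singly charged fragments, and the abbreviated residue table are all illustrative. Only the 0.15 Da default comes from the description above.

```python
from bisect import bisect_left

# Monoisotopic residue masses in Da (abbreviated table, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
}
WATER = 18.01056
PROTON = 1.00728

def b_y_ions(peptide):
    """Return (b_ions, y_ions) m/z lists, assuming singly charged fragments."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:          # one cleavage site between each residue pair
        prefix += m
        b_ions.append(prefix + PROTON)                  # N-terminal fragment
        y_ions.append(total - prefix + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions

def count_hits(theoretical, observed_peaks, tolerance=0.15):
    """Count theoretical ions with at least one observed peak within tolerance (Da)."""
    peaks = sorted(observed_peaks)
    hits = 0
    for mz in theoretical:
        i = bisect_left(peaks, mz - tolerance)   # first peak >= lower bound
        if i < len(peaks) and peaks[i] <= mz + tolerance:
            hits += 1
    return hits
```

Passing `tolerance` as an argument is one way to expose the instrument accuracy as a user-specified parameter.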
In this particular trial on chr4, each of the 4 correct peptides (as deemed by HMM Score) was the #1 selection by MSMSFit for the corresponding spectrum. Here is the output (when viewing the file in Safari, do a find on “1557” and scroll down to the forward direction to see the correct peptide cluster). Again, this output is from a greedy implementation, meaning that each spectrum has only two peptide recommendations (one each for the forward and reverse directions). It could fairly easily be tailored to return a list of top recommendations, but, for this trial, only the top recommendation is required.
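As a rough sketch of how the greedy selection could be generalized to a ranked list, the snippet below keeps the best `k` candidates per direction. The candidate format `(peptide, direction, score)` and the function name `top_recommendations` are assumptions made for illustration, not MSMSFit's actual data structures. With `k=1` it reduces to the greedy behaviour described above.

```python
import heapq
from collections import defaultdict

def top_recommendations(candidates, k=1):
    """Return the k highest-scoring peptides for each direction.

    candidates: iterable of (peptide, direction, score) tuples (hypothetical format).
    """
    by_direction = defaultdict(list)
    for peptide, direction, score in candidates:
        by_direction[direction].append((score, peptide))
    # nlargest returns entries sorted from best to worst score
    return {
        direction: [peptide for _, peptide in heapq.nlargest(k, scored)]
        for direction, scored in by_direction.items()
    }
```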
UPDATE: It should also be noted that I am crippling most of the cleanSpectra code. At least for MSMSFit, that cleaning process does more harm than good.
Also, I have eliminated the use of squared error as a factor altogether. The philosophy simply wasn’t right: since a “hit” already comes as close as we can get to the measurement accuracy of the mass spectrometer, why should we penalize error within that tolerance, which is essentially random (and therefore has nothing to do with fitness)?
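A small illustration of the argument, using made-up deviations (in Da) between theoretical and observed peaks for two hypothetical candidates. Both function names are assumptions. The point is only that every deviation below sits inside the 0.15 Da tolerance, so a hit count treats the candidates as equally good, while a squared-error term would separate them on what is essentially noise.

```python
def hit_count(errors, tolerance=0.15):
    """Number of deviations that fall within the hit tolerance (Da)."""
    return sum(1 for e in errors if abs(e) <= tolerance)

def sum_squared_error(errors):
    """Squared-error penalty of the kind that was dropped from the score."""
    return sum(e * e for e in errors)

# Made-up deviations; all are within the 0.15 Da tolerance.
errors_a = [0.02, -0.03, 0.01]
errors_b = [0.12, -0.14, 0.10]

same_hits = hit_count(errors_a) == hit_count(errors_b)                      # equal hit counts
b_penalized = sum_squared_error(errors_b) > sum_squared_error(errors_a)    # b looks "worse"
```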