by Brian | 25th June 2010
I recently found out that something called “MS-Fit” exists. I found this because it had confused one of our lab members into thinking that it had something to do with MSMSFit. To avoid future confusion, I am re-dubbing that scoring method “TandemFit”. The name works well in that it scores tandem mass spectrometry data, and it does [...]
by Brian | 24th June 2010
I just added a method to the Spectrum object, and modified its constructor, to allow normalization of peak intensities. The method finds the maximum intensity in a given spectrum and then walks through each peak, dividing its intensity by that maximum. This can be handy for SpectrumMatch as well as for comparing scores TandemFit [...]
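The normalization step can be sketched in Python as follows. This is a minimal illustration, not the actual Spectrum class: the `Peak` type, the `peaks` field, and the `normalize_peaks` constructor flag are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Peak:
    mz: float          # mass-to-charge ratio
    intensity: float   # raw or normalized intensity


@dataclass
class Spectrum:
    """Hypothetical stand-in for the Spectrum object; field names are assumptions."""
    peaks: List[Peak] = field(default_factory=list)
    normalize_peaks: bool = False  # hypothetical constructor flag

    def __post_init__(self) -> None:
        # Mirror the "modified constructor": optionally normalize on creation.
        if self.normalize_peaks:
            self.normalize()

    def normalize(self) -> None:
        """Divide every peak intensity by the spectrum's maximum intensity."""
        if not self.peaks:
            return
        max_intensity = max(p.intensity for p in self.peaks)
        if max_intensity <= 0:
            return  # avoid dividing by zero (or flipping signs) on degenerate spectra
        for p in self.peaks:
            p.intensity /= max_intensity
```

After normalization the most intense peak has intensity 1.0, so spectra recorded at different absolute intensities can be compared on the same scale.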
by Brian | 19th November 2009
If a certain fragment fault-line has already been accounted for by another ion, we can skip further ions for that location.
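One way to read this shortcut: treat each fault-line (the bond between two adjacent residues) as explained as soon as any ion type produces a matching peak, and stop testing the remaining ion types there. The sketch below assumes the caller has already computed candidate m/z values per fault-line in priority order; `explained_fault_lines`, `has_peak_near`, and the 0.5 tolerance are hypothetical.

```python
from bisect import bisect_left


def has_peak_near(sorted_mzs: list[float], target: float, tol: float) -> bool:
    """True if any observed m/z lies within +/- tol of target (sorted_mzs must be sorted)."""
    i = bisect_left(sorted_mzs, target - tol)
    return i < len(sorted_mzs) and sorted_mzs[i] <= target + tol


def explained_fault_lines(candidates_per_fault_line: list[list[float]],
                          observed_mzs: list[float], tol: float = 0.5) -> list[int]:
    """Indices of fault-lines explained by at least one candidate ion.

    candidates_per_fault_line[k] holds the theoretical m/z values (any ion types,
    in priority order) for the k-th bond between adjacent residues.
    """
    sorted_mzs = sorted(observed_mzs)
    explained = []
    for k, candidates in enumerate(candidates_per_fault_line):
        for mz in candidates:
            if has_peak_near(sorted_mzs, mz, tol):
                explained.append(k)
                break  # fault-line already accounted for; skip the remaining ion types
    return explained
```

Besides saving comparisons, counting each fault-line at most once keeps a single well-supported bond from being credited several times.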
by Brian | 18th November 2009
I’m experimenting with the MSMSFit scoring and am going down what looks like a very promising road. Interesting: when examining OutputFile2.pkl against chr4.fasta, a peptide, GSWYSMRK, came up with a high MSMSFit score that HMM_Score somewhat overlooks (it gets an HMM score of about 60, depending on the peak-cleaning method). What is interesting is [...]
by Brian | 11th November 2009
I suspect that some edge cases of MSMSFit (first and last fragments, peaks, etc.) may not be handled as intended. In particular, I suspect that the search for b-ions isn’t working nearly as well as it should. This has gone unnoticed until now because, the way MSMSFit is written, close y-ion matches can pick up the slack for [...]
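To probe those edge cases, it helps to have the theoretical b- and y-ion ladders written out explicitly. The sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses; it assumes unmodified residues and charge 1+, and the function names are hypothetical rather than MSMSFit’s own. Checking the first and last entries of each ladder is a quick way to look at the first- and last-fragment cases.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1): N-terminal prefix residues plus a proton."""
    ladder, running = [], 0.0
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        ladder.append(running + PROTON)
    return ladder


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1): C-terminal suffix residues plus water and a proton."""
    ladder, running = [], 0.0
    for aa in reversed(peptide[1:]):
        running += RESIDUE_MASS[aa]
        ladder.append(running + WATER + PROTON)
    return ladder
```

A useful sanity check is that each complementary pair b_i and y_(n-i) sums to the neutral peptide mass plus two protons; a b-ion search that breaks this identity is a likely source of the suspected problem.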
by Brian | 14th July 2009
There were a few minor logic errors in version 1 of MSMSFit (which I pronounce “Miss Misfit”). These errors were not weighty enough to skew the results of my trials with E. coli, but they were amplified to the point of being problematic for the human genome. Long, boring story short, I have worked out at [...]
by Brian | 13th July 2009
It looks as though MSMS data may contain multiple peak lists for the same polypeptide. Would it be beneficial to combine the data from such suspected duplicates? I’ve noticed that there are sometimes large gaps in regions of a peak list; combining lists could fill these gaps and perhaps help our algorithms. Of [...]
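If suspected duplicate peak lists are combined, one simple approach is to pool all peaks, sort them by m/z, and collapse peaks that fall within a tolerance of each other. In this sketch the 0.5 m/z tolerance, the intensity-weighted mean m/z, and keeping the maximum intensity are all assumptions, and `merge_peak_lists` is a hypothetical helper; deciding which spectra are actually duplicates (for example by precursor mass and charge) is left out.

```python
def _collapse(cluster: list[tuple[float, float]]) -> tuple[float, float]:
    """Reduce a cluster of (mz, intensity) peaks to one peak."""
    total = sum(inten for _, inten in cluster)
    if total > 0:
        mz = sum(m * inten for m, inten in cluster) / total  # intensity-weighted m/z
    else:
        mz = sum(m for m, _ in cluster) / len(cluster)       # plain mean if all zero
    return (mz, max(inten for _, inten in cluster))


def merge_peak_lists(peak_lists: list[list[tuple[float, float]]],
                     tol: float = 0.5) -> list[tuple[float, float]]:
    """Pool (mz, intensity) peaks from several lists and merge near-coincident ones."""
    all_peaks = sorted(p for peaks in peak_lists for p in peaks)
    merged, cluster = [], []
    for mz, inten in all_peaks:
        # Start a new cluster when the gap to the previous peak exceeds tol.
        if cluster and mz - cluster[-1][0] > tol:
            merged.append(_collapse(cluster))
            cluster = []
        cluster.append((mz, inten))
    if cluster:
        merged.append(_collapse(cluster))
    return merged
```

Because a cluster keeps growing while consecutive peaks stay within the tolerance, a dense run of peaks can chain into one wide cluster; capping the cluster width would be one refinement.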
by Brian | 10th July 2009
Because the MSMSFit score ignores large gaps in MSMS data, we could take advantage of that by feeding the algorithm only sequences with exactly one missed cleavage (rather than sequences with either one or zero missed cleavages).
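Generating only the one-missed-cleavage candidates is straightforward once the fully cleaved fragments are known: join each pair of adjacent fragments. The sketch assumes trypsin’s classic rule (cleave after K or R unless the next residue is P), which the post does not specify; the function names are hypothetical.

```python
import re


def tryptic_fragments(protein: str) -> list[str]:
    """Split after K or R unless followed by P (assumed trypsin rule)."""
    # Zero-width split: lookbehind for K/R, negative lookahead for P.
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", protein) if frag]


def one_missed_cleavage(protein: str) -> list[str]:
    """Peptides containing exactly one missed cleavage site."""
    frags = tryptic_fragments(protein)
    return [frags[i] + frags[i + 1] for i in range(len(frags) - 1)]
```

For a protein with n fully cleaved fragments, this yields n - 1 candidates instead of the 2n - 1 produced by feeding both zero- and one-missed-cleavage peptides.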
by Brian | 9th July 2009
I wanted to see what kind of MSMSFit scores I would get if I tried to find a human protein in E. coli. With one exception, no guessed acid sequence scored above 0.5. The sequences that scored higher (again, these are *wrong* sequences) were shorter. This is probably because [...]
by Brian | 9th July 2009
Interesting result: when MSMSFit compares only y-ions, the results are better than when we also include y17ion and y18ion. This follows the same line of thought as my removal of the b-ions: the more candidate points we compare, the greater the chance of a random match.
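The intuition behind dropping y17ion and y18ion can be made concrete with a back-of-the-envelope model. Assume, purely for illustration, that noise peaks are scattered uniformly over a mass window; then every extra theoretical mass tested is another chance for a spurious hit, so the expected number of chance matches grows linearly with the number of ion types compared. The model and all parameter names are assumptions, not part of MSMSFit.

```python
def chance_match_probability(n_peaks: int, window: float, tol: float) -> float:
    """P(at least one of n_peaks uniformly placed peaks lies within +/- tol of a fixed m/z).

    Ignores edge effects at the window boundaries.
    """
    p_one_peak = min(1.0, 2 * tol / window)
    return 1 - (1 - p_one_peak) ** n_peaks


def expected_chance_matches(n_fault_lines: int, n_ion_types: int,
                            n_peaks: int, window: float, tol: float) -> float:
    """Expected spurious matches when every fault-line is tested against each ion type."""
    tests = n_fault_lines * n_ion_types
    return tests * chance_match_probability(n_peaks, window, tol)


# y-ions only vs. y plus the two neutral-loss variants: three times the chances.
y_only = expected_chance_matches(9, 1, n_peaks=100, window=2000.0, tol=0.5)
y_plus_losses = expected_chance_matches(9, 3, n_peaks=100, window=2000.0, tol=0.5)
print(f"{y_only:.2f} vs {y_plus_losses:.2f}")
```

Real spectra are not uniform noise, so the absolute numbers mean little; the point is only the linear growth in chance matches with each ion type added.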
by Brian | 9th July 2009
MSMSFit results for CheZ_MS_1_Combo.pkl MSMS data (which contains 5 tandem spectra) run against Escherichia_coli_K-12_MG1655.fasta. NOTE: the MSMSFit score is a float between 0.0 and 1.0, where 1.0 is a perfect match.

Precursor index  Peptide sequence     Direction  Peptide start  Peptide stop  MSMSFit score
4                MLAQNVRQNFELLYSRR    F          122616         122667        0.3331
2                RVMWENWLCRSPFK       F          792805         792847        0.3748
1                SWHCWKLSTPANR        F          1361064        1361103       0.3750
3                HDGQADDRRYPAGGDPAR   F          1481584        1481638       0.2727
0                RSARTQPEYR           F          1481674        1481704       0.3749
3                ELGLDQAIAEAAEAIPDAR  R          1964905        1964962       0.6657
1                LYYVVQMTAQAAER       R          1964857        1964899       0.5555
2                ALNSVEASQPHQDQMEK    R          1964806        1964857       0.5453
0                MMDVIQEIER           R          1964578        1964608       0.7774
4                QLLMVLLENIPEQESRPK   R          1964524        1964578       0.5999
by Brian | 9th July 2009
Fitting MSMS data to a proposed acid sequence is a bit messy. One has to account for potential ion types as well as data errors. The image above depicts sample MSMS data (top half) alongside the amino acid sequence from which it is derived (bottom half, with multiple lines for different ions). However, when one [...]
by Brian | 7th July 2009
Present thought: as HMMScore is expensive, and findLongestCommonSubstring is both expensive and not ideal for eliminating candidates for HMMScore, I am curious how the Mean Squared Error (MSE) of a sequence against the observed MSMS data correlates with HMMScore. If there is a strong correlation, then it is possible that it is a viable (both robust and [...]
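One possible way to set up that experiment: define a sequence’s MSE as the mean squared distance from each observed peak to its nearest theoretical ion, compute it for a batch of candidates, and measure the Pearson correlation with their HMMScore values. Both the MSE definition and the helper names are assumptions for illustration; HMMScore itself is not reproduced here. If a higher HMMScore means a better match, a useful filter would show a strong *negative* correlation, since a lower MSE means a closer fit.

```python
from bisect import bisect_left
from math import sqrt


def mse_to_sequence(observed_mzs: list[float], theoretical_mzs: list[float]) -> float:
    """Mean squared m/z distance from each observed peak to its nearest theoretical ion.

    Assumes both lists are non-empty.
    """
    theo = sorted(theoretical_mzs)
    total = 0.0
    for obs in observed_mzs:
        i = bisect_left(theo, obs)
        # The nearest theoretical mass is either just below or just above obs.
        nearest = min(abs(theo[j] - obs) for j in (i - 1, i) if 0 <= j < len(theo))
        total += nearest ** 2
    return total / len(observed_mzs)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Noise peaks far from any ion inflate this MSE quickly, so capping each peak’s error at the matching tolerance is one variation worth trying before drawing conclusions.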