by Brian | 30th July 2009
It’s nearly impossible to find this link on the UNC website. So, for those just wanting to log in, here’s the link you want: UNC TIM. Also, if you need to listen to some good local music while you fill in your time card, Graveyard Fields is your band. Read More
by Brian | 28th July 2009
The data is from OutputFile2.pkl. The x-axis is in increments of 10,000. The final bar represents all peaks with intensities of 20,000 and above. Read More
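The binning described above can be sketched roughly as follows. This is only an illustration, not the code that produced the chart: the function name `bin_intensities`, its `width` and `cap` parameters, and the sample intensities are all assumptions made up for the example.

```python
from collections import Counter

def bin_intensities(intensities, width=10_000, cap=20_000):
    """Count peaks per intensity bin; everything >= cap lands in the last bin."""
    counts = Counter()
    for value in intensities:
        # Clamp to the cap so the final bar collects all high-intensity peaks.
        lower_edge = min(int(value // width) * width, cap)
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical intensities, not values taken from OutputFile2.pkl.
sample = [512.0, 9_999.0, 10_000.0, 19_500.0, 20_000.0, 75_000.0]
print(bin_intensities(sample))
```

Each key is the lower edge of a bin, so the entry for 20000 plays the role of the final "20,000 and above" bar.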
by Brian | 25th July 2009
The following are the results from a small experiment testing the hypothesis that combining the spectral data for (probably) matching polypeptides yields a higher HMM_Score. Running a certain PKL (known to me as “output2.pkl”) on chr4 with GFS, we see that two of the returned sequences pop up twice with relatively high HMM_Scores. This strongly [...] Read More
by Brian | 16th July 2009
Here’s the opendiff view of a file before (left) and after (right) peaks have been cleaned. UPDATE: Examples of how cleaning may cut out some ion matches. Without cleaning:

peaks  ion matches  precursor mass  sequence
208    11           1544.112183     HGTDDGVVWMNWK
324    10           1544.512207     HGTDDGVVWMNWK
324    10           1544.512207     HGTDDGVVWMNWK
267    10           1818.752197     TMTIHNGMFFSTYDR
224    10           2125.862305     HQLYIDETVNSNIPTNLR

[...] Read More
by Brian | 14th July 2009
There were a few mild logic errors with version 1 of MSMSFit (which I pronounce as “Miss Misfit”). These errors were not weighty enough to skew the results for my trials with E. coli, but were amplified to the point of being problematic for the human genome. Long, boring story short, I have worked out at [...] Read More
by Brian | 13th July 2009
It looks as though MSMS data may contain multiple peak lists for the same polypeptide. Would it be beneficial to combine the data of such suspected duplicates? I’ve noticed that sometimes there will be large gaps in regions of a peak list; combining lists could fill these gaps and perhaps help our algorithms. Of [...] Read More
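One way to picture combining suspected duplicate peak lists is the sketch below. It is not taken from the project code: the function name `combine_peak_lists`, the `(mz, intensity)` tuple layout, and the 0.5 m/z tolerance are assumptions chosen for illustration, and intensities are assumed to be positive.

```python
def combine_peak_lists(peak_lists, tolerance=0.5):
    """Merge several (mz, intensity) peak lists into one sorted list.

    A peak within `tolerance` of the most recently merged peak is folded
    into it: intensities are summed and m/z is intensity-weighted.
    """
    all_peaks = sorted(p for peaks in peak_lists for p in peaks)
    merged = []
    for mz, intensity in all_peaks:
        if merged and mz - merged[-1][0] <= tolerance:
            prev_mz, prev_int = merged[-1]
            total = prev_int + intensity
            merged[-1] = ((prev_mz * prev_int + mz * intensity) / total, total)
        else:
            merged.append((mz, intensity))
    return merged

# Hypothetical peak lists: the second one fills the gap around m/z 200.
list_a = [(100.0, 10.0), (300.0, 30.0)]
list_b = [(100.2, 10.0), (200.0, 20.0)]
combined = combine_peak_lists([list_a, list_b])
```

Here the two peaks near m/z 100 collapse into one, while the peak at 200 from the second list fills a gap in the first.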
by Brian | 13th July 2009
Funny the properties of coincidence: you only notice them when they are happening. (Corollary: you don’t notice the near-infinite things which are not coincidence which are continually happening.) My mind has been filled with refining the MSMSFit algorithm and along comes this Wired article talking about the mass spectrometry analysis of a T. Rex [...] Read More
by Brian | 11th July 2009
How many copies of the cleanPeakForHighMass method do we need in this code? Read More
by Brian | 10th July 2009
As the MSMSFit score ignores large gaps in MSMS data, we could take advantage of that by feeding the algorithm only sequences with one missed cleavage (vs. one missed and no missed). Read More
by Brian | 9th July 2009
I wanted to see what kind of MSMSFit scores I would get if I tried to find a human protein in E. coli. With one exception, the MSMSFit score did not go above 0.5 for each guessed acid sequence. The sequences that scored higher (again, these are *wrong* sequences) were shorter. This is probably because [...] Read More
by Brian | 9th July 2009
Interesting result: When we have MSMSFit compare only for the y-ion, the results are better than if we include y17ion and y18ion. This is along the same line of thought as why I removed the b-ions: the more possible points we are comparing the greater the chance for a random match. Read More
by Brian | 9th July 2009
CheZ_MS_1_Combo.pkl MSMS data (which contains 5 tandem spectra) on Escherichia_coli_K-12_MG1655.fasta. NOTE: MSMSFit score is a float between 0.0 and 1.0, where 1.0 is a perfect match.

Precursor Index  Peptide sequence     Direction  Peptide start  Peptide stop  MSMSFit Score
4                MLAQNVRQNFELLYSRR    F          122616         122667        0.3331
2                RVMWENWLCRSPFK       F          792805         792847        0.3748
1                SWHCWKLSTPANR        F          1361064        1361103       0.3750
3                HDGQADDRRYPAGGDPAR   F          1481584        1481638       0.2727
0                RSARTQPEYR           F          1481674        1481704       0.3749
3                ELGLDQAIAEAAEAIPDAR  R          1964905        1964962       0.6657
1                LYYVVQMTAQAAER       R          1964857        1964899       0.5555
2                ALNSVEASQPHQDQMEK    R          1964806        1964857       0.5453
0                MMDVIQEIER           R          1964578        1964608       0.7774
4                QLLMVLLENIPEQESRPK   R          1964524        1964578       0.5999

Read More
by Brian | 9th July 2009
Fitting MSMS data to a proposed acid sequence is a bit messy. One has to account for potential ion types as well as data errors. The image above depicts sample MSMS data (top half) alongside the amino acid sequence from which it is derived (bottom half, with multiple lines for different ions). However, when one [...] Read More
by Brian | 8th July 2009
The good news is that preliminary performance of my MSE algorithms is FAST… like, doesn’t even add a second to overall performance. If I can get some good correlation between HMMScore and MSE (especially low scoring HMMScore) then it could be a really good filter. Right now there is not good correlation, but I feel [...] Read More
by Brian | 7th July 2009
I modified the code to have more precise acid weights and I added frag17 (which subtracts a nitrogen and three hydrogens) and frag18 (which subtracts an oxygen and two hydrogens). At this scale, frag17 and frag18 values are so very close that they overlap. Still, you can see that for the correct acid sequence there [...] Read More
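To see why frag17 and frag18 land so close together, here is a small sketch of the two neutral losses. It assumes standard monoisotopic atomic masses and a singly charged fragment; whether the post's code used monoisotopic or average masses is not stated, and the names `AMMONIA_LOSS`, `WATER_LOSS`, and `neutral_loss_fragments` are hypothetical.

```python
# Monoisotopic masses in daltons (standard values, rounded to 5 decimals).
MASS_H = 1.00783
MASS_N = 14.00307
MASS_O = 15.99491

AMMONIA_LOSS = MASS_N + 3 * MASS_H  # one nitrogen + three hydrogens, ~17.027 Da
WATER_LOSS = MASS_O + 2 * MASS_H    # one oxygen + two hydrogens, ~18.011 Da

def neutral_loss_fragments(fragment_mass):
    """Return the (frag17, frag18) masses for a singly charged fragment ion."""
    return fragment_mass - AMMONIA_LOSS, fragment_mass - WATER_LOSS

# Hypothetical fragment mass, used only to show the spacing.
frag17, frag18 = neutral_loss_fragments(500.0)
```

The two values differ by a little under 1 Da, which is why they are hard to tell apart on a plot spanning hundreds of daltons.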
by Brian | 7th July 2009
Present thought: As HMMScore is expensive and findLongestCommonSubstring is both expensive and not ideal for eliminating candidates for HMMScore, I am curious how Mean Squared Error (MSE) of a sequence to the observed MSMS data correlates to HMMScore. If there is strong correlation then it is possible that it is a viable (both robust and [...] Read More
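As a rough illustration of the idea, the sketch below computes an MSE between the theoretical fragment masses of a candidate sequence and the observed peaks. How the comparison is actually set up is not described in this excerpt, so pairing each theoretical mass with its nearest observed peak is an assumption, as are the names `nearest` and `mse_to_observed`. Both input lists are assumed to be non-empty.

```python
from bisect import bisect_left

def nearest(sorted_values, target):
    """Return the element of a sorted, non-empty list closest to target."""
    i = bisect_left(sorted_values, target)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_values[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - target))

def mse_to_observed(theoretical_mz, observed_mz):
    """Mean squared distance from each theoretical mass to its nearest peak."""
    observed = sorted(observed_mz)
    errors = [(t - nearest(observed, t)) ** 2 for t in theoretical_mz]
    return sum(errors) / len(errors)

# Hypothetical masses: errors are 1.0 and 2.0, so the MSE is (1 + 4) / 2.
print(mse_to_observed([100.0, 200.0], [101.0, 198.0, 350.0]))  # 2.5
```

Sorting the observed peaks once and using a binary search keeps each lookup cheap, which matters if this is meant as a filter in front of a more expensive score.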
by Brian | 7th July 2009
The percentages displayed show each line’s share of overall processing. Therefore, if one line says 90%, it takes up 90% of all cycles in the program. The screenshot shows nested methods where lower methods are called by higher methods. A large gap in percentage from one to the next shows that the first one contains a [...] Read More
by Brian | 6th July 2009
So Shark is telling you to use the “-ftree-vectorize” flag setting in GCC to squeeze out some floating point performance, yeah? You want to find out how to do that? Yeah… join the club. (First, don’t search on “-ftree-vectorize” in Google as Google will think you *do not* want to use the keyword “ftree-vectorize” because [...] Read More
by Brian | 6th July 2009
In calcScoresForMSMS, the loop which goes through the masses surrounds the loop which goes through the sequence. If we put the sequence loop on the outside, this will confer some advantages: translateNucleotideSequence won’t have to be called over and over; the code which produces a probable MSMS spectrum for a given sequence could be called [...] Read More
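The effect of swapping the loops can be sketched with stand-in functions. Nothing below is the real calcScoresForMSMS: `translate` is a placeholder for translateNucleotideSequence that only counts its calls, and `score`, `masses_outer`, and `sequences_outer` are hypothetical names with made-up logic.

```python
def translate(nucleotides):
    """Stand-in for translateNucleotideSequence; counts how often it runs."""
    translate.calls += 1
    return nucleotides  # placeholder: the real translation is omitted here

translate.calls = 0

def score(peptide, mass):
    """Placeholder scoring function, not the real scoring logic."""
    return abs(len(peptide) * 110.0 - mass)

def masses_outer(sequences, masses):
    # Mass loop outside: every mass re-translates every sequence.
    return [score(translate(s), m) for m in masses for s in sequences]

def sequences_outer(sequences, masses):
    # Sequence loop outside: each sequence is translated once and reused.
    results = []
    for s in sequences:
        peptide = translate(s)
        results.extend(score(peptide, m) for m in masses)
    return results
```

With 2 sequences and 3 masses, the first ordering calls `translate` 6 times and the second only 2, while both produce the same set of scores.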
by Brian | 2nd July 2009
If you’ve worked with WinMerge, FileMerge is going to take some getting used to because the key commands mean the opposite of what you’ve become accustomed to. The up and down arrow keys still take you to diffs up and down the file. How do you select which diff to use? I couldn’t find documentation [...] Read More