UNC TIM login

by Brian | 30th July 2009

It’s nearly impossible to find this link on the UNC website. So, for those just wanting to log in, here’s the link you want: UNC TIM. Also, if you need to listen to some good local music while you fill in your time card, Graveyard Fields is your band. Read More

Distribution of intensities in MS/MS data

by Brian | 28th July 2009

The data is from OutputFile2.pkl. The x-axis is increments of 10,000. The final bar represents all peaks with intensities of 20,000 and above. Read More
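For anyone wanting to reproduce this kind of binning, here is a minimal sketch in Python. It assumes the intensities have already been read out of the PKL file into a plain list; the function name and its parameters are mine, not part of GFS, and the overflow threshold is assumed to be a multiple of the bin width.

```python
from collections import Counter

def intensity_histogram(intensities, bin_width=10_000, overflow_start=20_000):
    """Count peaks per fixed-width bin, keyed by each bin's lower edge.

    Every intensity at or above overflow_start lands in one shared final bin.
    (Hypothetical helper; overflow_start is assumed to be a multiple of bin_width.)
    """
    counts = Counter()
    for value in intensities:
        lower_edge = int(value // bin_width) * bin_width
        # Clamp so the last bar collects the whole high-intensity tail.
        counts[min(lower_edge, overflow_start)] += 1
    return dict(sorted(counts.items()))

print(intensity_histogram([500, 9999, 10000, 25000, 90000]))
```

The returned dictionary can be handed straight to whatever plotting tool you like, with the last key labelled "and above".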

Combining matching spectra in peak lists

by Brian | 25th July 2009

The following are the results from a small experiment testing the hypothesis that combining the spectral data for (probably) matching polypeptides yields a higher HMM_Score. Running a certain PKL (known to me as “output2.pkl”) on chr4 with GFS, we see that two of the returned sequences pop up twice with relatively high HMM_Scores. This strongly [...] Read More

The data cleanSpectra “cleans”

by Brian | 16th July 2009

Here’s the opendiff view of a file before (left) and after (right) peaks have been cleaned. UPDATE: Examples of how cleaning may cut out some ion matches.

Without cleaning:

peaks  ion matches  precursor mass  sequence
208    11           1544.112183     HGTDDGVVWMNWK
324    10           1544.512207     HGTDDGVVWMNWK
324    10           1544.512207     HGTDDGVVWMNWK
267    10           1818.752197     TMTIHNGMFFSTYDR
224    10           2125.862305     HQLYIDETVNSNIPTNLR
[...] Read More

Success with MSMSFit on the human genome

by Brian | 14th July 2009

There were a few mild logic errors with version 1 of MSMSFit (which I pronounce as “Miss Misfit”). These errors were not weighty enough to skew the results for my trials with E. coli, but they were amplified to the point of being problematic for the human genome. Long, boring story short, I have worked out at [...] Read More

Grouping MSMS data

by Brian | 13th July 2009

It looks as though MSMS data may contain multiple peak lists for the same polypeptide. Would it be beneficial to combine the data of such suspected duplicates? I’ve noticed that sometimes there will be large gaps in regions of a peak list; combining lists could fill these gaps and perhaps help our algorithms. Of [...] Read More
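One possible way to combine two suspected duplicates, sketched in Python. This is not how GFS does it. It assumes each peak list is a list of (m/z, intensity) pairs with positive intensities, and it treats peaks within a made-up tolerance of 0.5 as the same peak, summing their intensities and taking an intensity-weighted m/z.

```python
def combine_peak_lists(peaks_a, peaks_b, tolerance=0.5):
    """Merge two peak lists of (mz, intensity) pairs into one sorted list.

    Peaks closer than `tolerance` to the previously merged peak are folded
    into it. Both the function and the tolerance value are illustrative
    assumptions, not taken from GFS.
    """
    merged = []
    for mz, intensity in sorted(list(peaks_a) + list(peaks_b)):
        if merged and mz - merged[-1][0] <= tolerance:
            prev_mz, prev_intensity = merged[-1]
            total = prev_intensity + intensity
            # Intensity-weighted average keeps the stronger peak's position dominant.
            merged[-1] = ((prev_mz * prev_intensity + mz * intensity) / total, total)
        else:
            merged.append((mz, intensity))
    return merged

spectrum_a = [(100.0, 10.0), (300.0, 5.0)]
spectrum_b = [(100.2, 30.0), (200.0, 7.0)]
for peak in combine_peak_lists(spectrum_a, spectrum_b):
    print(peak)
```

In the toy data, the peak at 200.0 exists only in the second list, so the combined list has no gap between 100 and 300.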

Mass Spectrometry in the news!

by Brian | 13th July 2009

Funny the properties of coincidence: you only notice them when they are happening. (Corollary: you don’t notice the near-infinite things which are not coincidence which are continually happening.) My mind has been filled with refining the MSMSFit algorithm and along comes this Wired article talking about the mass spectrometry analysis of a T. rex [...] Read More

cleanPeakForHighMass

by Brian | 11th July 2009

How many copies of the cleanPeakForHighMass method do we need in this code? Read More

Quick thought: no need to account for missed cleavage points?

by Brian | 10th July 2009

Because the MSMSFit score ignores large gaps in MSMS data, we could take advantage of that by feeding the algorithm only sequences with one missed cleavage (instead of both the one-missed and the no-missed sequences). Read More
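To make the idea concrete, here is a small Python sketch that produces only the one-missed-cleavage peptides. It assumes a trypsin-style rule (cut after K or R, but not when the next residue is P); that rule and both function names are my assumptions for illustration, not GFS code.

```python
def tryptic_fragments(protein):
    """Fully cleaved fragments: cut after K or R unless the next residue is P.

    (Assumed cleavage rule, for illustration only.)
    """
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])  # trailing piece with no cut site
    return fragments

def one_missed_cleavage(protein):
    """Join each pair of adjacent fragments: exactly one missed cleavage each."""
    fragments = tryptic_fragments(protein)
    return [left + right for left, right in zip(fragments, fragments[1:])]

print(tryptic_fragments("MKAARPGKLR"))
print(one_missed_cleavage("MKAARPGKLR"))
```

Each one-missed peptide contains two fully cleaved peptides back to back, which is why a score that tolerates large gaps could in principle still match either half.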

Plotting something wrong

by Brian | 9th July 2009

I wanted to see what kind of MSMSFit scores I would get if I tried to find a human protein in E. coli. With one exception the MSMSFit score did not go above 0.5 for any guessed acid sequence. The sequences that scored higher (again, these are *wrong* sequences) were shorter. This is probably because [...] Read More

The less ions the better

by Brian | 9th July 2009

Interesting result: When we have MSMSFit compare only for the y-ion, the results are better than if we include y17ion and y18ion. This is along the same line of thought as why I removed the b-ions: the more possible points we are comparing the greater the chance for a random match. Read More

MSMSFit on E. coli

by Brian | 9th July 2009

CheZ_MS_1_Combo.pkl MSMS data (which contains 5 tandem spectra) on Escherichia_coli_K-12_MG1655.fasta. NOTE: MSMSFit score is a float between 0.0 and 1.0, where 1.0 is a perfect match.

Precursor Index  Peptide sequence     Direction  Peptide start  Peptide stop  MSMSFit Score
4                MLAQNVRQNFELLYSRR    F          122616         122667        0.3331
2                RVMWENWLCRSPFK       F          792805         792847        0.3748
1                SWHCWKLSTPANR        F          1361064        1361103       0.3750
3                HDGQADDRRYPAGGDPAR   F          1481584        1481638       0.2727
0                RSARTQPEYR           F          1481674        1481704       0.3749
3                ELGLDQAIAEAAEAIPDAR  R          1964905        1964962       0.6657
1                LYYVVQMTAQAAER       R          1964857        1964899       0.5555
2                ALNSVEASQPHQDQMEK    R          1964806        1964857       0.5453
0                MMDVIQEIER           R          1964578        1964608       0.7774
4                QLLMVLLENIPEQESRPK   R          1964524        1964578       0.5999

Read More

MSMSFit: A method for comparing MSMS data to amino acid sequences

by Brian | 9th July 2009

Fitting MSMS data to a proposed acid sequence is a bit messy. One has to account for potential ion types as well as data errors. The image above depicts sample MSMS data (top half) alongside the amino acid sequence from which it is derived (bottom half, with multiple lines for different ions). However, when one [...] Read More

Working out MSE algorithm

by Brian | 8th July 2009

The good news is that preliminary performance of my MSE algorithms is FAST… like, doesn’t even add a second to overall performance. If I can get some good correlation between HMMScore and MSE (especially low scoring HMMScore) then it could be a really good filter. Right now there is not good correlation, but I feel [...] Read More

Improved MSMS visualizer

by Brian | 7th July 2009

I modified the code to have more precise acid weights and I added frag17 (which subtracts a nitrogen and three hydrogens) and frag18 (which subtracts an oxygen and two hydrogens). At this scale, frag17 and frag18 values are so very close that they overlap. Still, you can see that for the correct acid sequence there [...] Read More
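For reference, a rough Python sketch of how frag17 and frag18 values could be derived from a y-ion series. It assumes singly charged y-ions and standard monoisotopic masses; the function names and the table are mine, not the visualizer’s actual code, which may use different weights.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # one oxygen and two hydrogens
AMMONIA = 17.026549  # one nitrogen and three hydrogens
PROTON = 1.007276

def y_ion_series(peptide):
    """Singly charged y-ion m/z values, shortest fragment first (assumed charge +1)."""
    series, running = [], WATER + PROTON
    for residue in reversed(peptide):
        running += RESIDUE_MASS[residue]
        series.append(running)
    return series

def with_neutral_losses(peptide):
    """(y, frag17, frag18) triples: the y-ion, then minus ammonia, then minus water."""
    return [(y, y - AMMONIA, y - WATER) for y in y_ion_series(peptide)]

for y, frag17, frag18 in with_neutral_losses("MMDVIQEIER"):
    print(f"{y:10.4f} {frag17:10.4f} {frag18:10.4f}")
```

With these masses frag17 and frag18 always sit a little under one mass unit apart, which is consistent with the two lines overlapping on a plot that spans a whole spectrum.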

Mean Squared Error of predicted MSMS to observed MSMS

by Brian | 7th July 2009

Present thought: As HMMScore is expensive and findLongestCommonSubstring is both expensive and not ideal for eliminating candidates for HMMScore, I am curious how Mean Squared Error (MSE) of a sequence to the observed MSMS data correlates to HMMScore. If there is strong correlation then it is possible that it is a viable (both robust and [...] Read More
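A minimal Python sketch of one way such an MSE could be defined: for each predicted m/z, take the nearest observed peak and average the squared differences. This definition, and the function name, are assumptions for illustration; the excerpt does not say exactly how the error is computed.

```python
from bisect import bisect_left

def mse_to_nearest_peaks(predicted_mz, observed_mz):
    """Mean squared distance from each predicted m/z to its nearest observed m/z.

    (One assumed definition of the error; others are possible.)
    """
    observed = sorted(observed_mz)
    if not observed or not predicted_mz:
        raise ValueError("both peak lists must be non-empty")
    total = 0.0
    for mz in predicted_mz:
        i = bisect_left(observed, mz)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = observed[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda peak: abs(peak - mz))
        total += (nearest - mz) ** 2
    return total / len(predicted_mz)

print(mse_to_nearest_peaks([100.0, 200.0], [99.0, 150.0, 202.0]))
```

Sorting once and using a binary search keeps the cost at roughly n log n per spectrum, which is the kind of cheapness a pre-filter in front of HMMScore would need.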

Top-down performance cross-section of GFS

by Brian | 7th July 2009

The percentages displayed show each method’s share of overall processing. Therefore, if one line says 90%, it takes up 90% of all cycles in the program. The screenshot shows nested methods where lower methods are called by higher methods. A large gap in percentage from one to the next shows that the first one contains a [...] Read More

-ftree-vectorize in XCode (with GCC)

by Brian | 6th July 2009

So Shark is telling you to use the “-ftree-vectorize” flag setting in GCC to squeeze out some floating point performance, yeah? You want to find out how to do that? Yeah… join the club. (First, don’t search on “-ftree-vectorize” in Google as Google will think you *do not* want to use the keyword “ftree-vectorize” because [...] Read More

Points to optimize

by Brian | 6th July 2009

In calcScoresForMSMS, the loop which goes through the masses surrounds the loop which goes through the sequence. If we put the sequence loop on the outside, this will confer some advantages:
- translateNucleotideSequence won’t have to be called over and over.
- The code which produces a probable MSMS spectrum for a given sequence could be called [...] Read More
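The loop swap can be sketched in Python like this. Everything here is a stand-in: the translate, predict and score callables are hypothetical placeholders for the real routines, and building the predicted spectrum once per sequence is my reading of the idea rather than something spelled out in the excerpt.

```python
def calc_scores(sequences, masses, translate, predict, score):
    """Score every (sequence, mass) pair with the sequence loop on the outside.

    All three callables are hypothetical stand-ins, not the GFS functions.
    """
    results = {}
    for seq_id, nucleotides in sequences.items():
        peptide = translate(nucleotides)   # once per sequence, not once per mass
        predicted = predict(peptide)       # likewise reused across all masses
        for mass_id, observed in masses.items():
            results[(seq_id, mass_id)] = score(predicted, observed)
    return results

translate_calls = []

def fake_translate(nucleotides):
    translate_calls.append(nucleotides)    # record each call to show the saving
    return nucleotides.lower()

scores = calc_scores(
    {"s1": "ATG", "s2": "CCC"},
    {"m1": [1.0], "m2": [2.0], "m3": [3.0]},
    fake_translate,
    lambda peptide: [len(peptide)],
    lambda predicted, observed: predicted[0] + observed[0],
)
print(len(scores), "scores from", len(translate_calls), "translations")
```

With the loops the other way round, the same six scores would have cost six translations instead of two.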

FileMerge key commands

by Brian | 2nd July 2009

If you’ve worked with WinMerge, FileMerge is going to take some getting used to because the key commands mean the opposite of what you’ve become accustomed to. The up and down arrow keys still take you to diffs up and down the file. How do you select which diff to use? I couldn’t find documentation [...] Read More
