E values are less reliable with databases of spliced peptides

by Brian | 28th October 2010

When we calculate our E values we are assuming that [...] This is a great example of the type of histogram on which the Fenyo method of calculating E values performs poorly. See that one tall histogram bar? Because it falls in the small section where the least-squares line is calculated, the result is a much steeper [...] Read More
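To make the tall-bar problem concrete, here is a rough sketch of a survival-function line fit in the general spirit of the approach named above. It is not Peppy's code and not a faithful reproduction of the published method: the integer score bins, the fit window (`fit_start`, `fit_end`), the log10 scale, and both example histograms are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def survival_counts(histogram):
    """histogram[i] = number of candidates with score i.
    Returns, for each i, the number of candidates scoring >= i."""
    counts = []
    running = 0
    for count in reversed(histogram):
        running += count
        counts.append(running)
    return counts[::-1]

def e_value(histogram, score, fit_start, fit_end):
    """Fit a line to log10(survival) over a window and extrapolate to score."""
    survival = survival_counts(histogram)
    xs = [i for i in range(fit_start, fit_end + 1) if survival[i] > 0]
    ys = [math.log10(survival[i]) for i in xs]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * score + intercept)

# Hypothetical histograms: identical except for one tall bar at score 5,
# which sits inside the fit window (scores 5 through 8).
smooth = [512, 256, 128, 64, 32, 16, 8, 4, 2, 1]
spiked = [512, 256, 128, 64, 32, 200, 8, 4, 2, 1]

e_smooth = e_value(smooth, 15, 5, 8)
e_spiked = e_value(spiked, 15, 5, 8)
```

In this toy case the tall bar steepens the fitted line, so the same score of 15 extrapolates to a much smaller (better-looking) E value for `spiked` than for `smooth`, even though nothing about the match itself has changed.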

Tandem Mass Spectrometry Peak Calibration Tool

by Brian | 26th August 2010

It hit me today: What if the theoretical mass I’m calculating for ions is consistently off? Even if it is just a little (talking fractions of a proton), that would negatively impact a peptide matching algorithm’s performance. How can we measure this? Relatively easy. The somewhat challenging part is getting a set of spectra from [...] Read More
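One simple way to look for a consistent offset, sketched under assumptions: given pairs of observed peak mass and theoretical ion mass from confident matches, take the mean and median of the signed errors. The excerpt is cut off before it describes the actual tool, so the function name and the example masses here are hypothetical.

```python
from statistics import mean, median

def mass_offset(pairs):
    """pairs: (observed_mass, theoretical_mass) tuples from trusted matches.
    Returns (mean, median) of the signed error observed - theoretical."""
    errors = [observed - theoretical for observed, theoretical in pairs]
    return mean(errors), median(errors)

# Hypothetical matched pairs, each observed mass sitting about 0.01 too high.
pairs = [(175.129, 175.119), (304.171, 304.161), (433.214, 433.204)]
mean_error, median_error = mass_offset(pairs)
```

A mean and median that agree and sit clearly away from zero would suggest a systematic shift rather than random scatter.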

TandemFit animation

by Brian | 20th August 2010

Here’s a kind of nifty animation of how TandemFit walks through each MS/MS peak twice (once forwards, once backwards) to find the matches to the theoretical ions of a peptide. Note that the ion being searched for is displayed. The animation! Read More

Aurum dataset oddity

by Brian | 19th August 2010

It looks like spectrum: T10475_Well_A13_2025.07_16898.mgf..pkl and spectrum: T10475_Well_A13_2025.07_17096.mgf..pkl are the same. This is bad news for me as TandemFit gets both of those “wrong”. The quotes are there because TandemFit’s match of QAGLQLQESLEPAVRLDR has 11 fragment alignments vs. 2 produced by VPAPSIEDICHVLSTVCK, which is the “correct” peptide. Update: other duplicates: T10475_Well_A12_1386.68_16898.mgf..pkl T10475_Well_A12_1386.68_17096.mgf..pkl T10475_Well_A03_1551.77_16898.mgf..pkl T10475_Well_A03_1551.77_17096.mgf..pkl T10475_Well_A11_1386.69_16898.mgf..pkl T10475_Well_A11_1386.69_17096.mgf..pkl T10475_Well_A10_1188.45_17096.mgf..pkl [...] Read More

How to Fake a Great E Value

by Brian | 13th August 2010

To better inspect the code I use to calculate E values, I created a histogram visualizer. Here are two histograms for score distributions for two different spectra: (Note: The small green bars are where values are above zero, but wouldn’t normally be drawn as the bar would normally be less than one [...] Read More

The battle of ORF: Open Reading Frames

by Brian | 28th June 2010

Peppy, up until now, only digested chromosomes inside of open reading frames. This is no more. There is now a choice! In the properties file you can set whether you want to digest the whole chromosome or only the ORFs. Read More

The USP data

by Brian | 28th June 2010

The Chen lab recently gave us a set of spectra that were derived from the USP set of 50 proteins. We were told that there are probably a good number of impurities in this set, so some of the spectra will not correspond to peptides which can be found in the USP list. [...] Read More

MSMSFit now TandemFit

by Brian | 25th June 2010

I recently found that something exists called “MS-Fit”. I found this because it had confused one of our lab members into thinking that it had something to do with MSMSFit. To avoid future confusion I am redubbing that scoring method “TandemFit”. The name works well in that it scores tandem mass spectrometry data, and it does [...] Read More

Spectrum intensity normalization

by Brian | 24th June 2010

Just added a method and a modified constructor to the Spectrum object which allow for normalization of the peak intensities. It finds the maximum intensity for a given spectrum and then walks through each peak, dividing the intensity by that maximum. This can be handy for SpectrumMatch as well as for comparing scores TandemFit [...] Read More
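The normalization described above can be sketched in a few lines. This is an illustration only, not the Spectrum class itself: representing a spectrum as a list of `(mass, intensity)` tuples and the function name are assumptions.

```python
def normalize_intensities(peaks):
    """Return peaks as (mass, intensity / max_intensity) pairs."""
    if not peaks:
        return []
    max_intensity = max(intensity for _, intensity in peaks)
    if max_intensity <= 0:
        # Nothing sensible to divide by; leave the peaks untouched.
        return list(peaks)
    return [(mass, intensity / max_intensity) for mass, intensity in peaks]

# Hypothetical three-peak spectrum.
peaks = [(175.1, 20.0), (304.2, 80.0), (433.2, 40.0)]
normalized = normalize_intensities(peaks)
```

After normalization the tallest peak has intensity 1.0, which makes intensities comparable across spectra recorded at different overall signal levels.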

Peppy test

by Brian | 21st June 2010


Peppy, the open-source proteogenomic mapping software, has been benchmarked at processing over 600,000 spectra per day on a consumer-grade desktop… Read More

precursor mass tolerance

by Brian | 15th January 2010

MASS_DIFF_ALLOWED in Defines.h maxTandemPrecursorDelta precursorMassTolerance Read More

Virtual cores: Use them.

by Brian | 4th January 2010

Newer Mac processors (Quad-Core Intel Xeon “Nehalem”) have “Virtual Cores”, a feature “which allows two threads to run simultaneously on each core.” This raised questions like, “do these virtual cores run at half the speed (or worse because of scheduling overhead)? If I have the option, should I set my application (GFS, an ABM, etc.) [...] Read More

Report on lattice based MSMS hashing

by Brian | 2nd December 2009

To judge the possible effectiveness of comparing spectra based on a hash of their peaks I wrote a program to take a list of spectra and convert them to hashes. From that set of spectra there were two pairs of sibling spectra that should be close matches as I am confident that they are derived [...] Read More

Adding intensity

by Brian | 1st December 2009

Note to Brian: try simply summing the intensities of peaks that come within a certain delta of a frag mass. Then divide that sum by Max(peak intensities). This will often grab more than one peak for a fragment, but this may produce good results as often multiple peaks cluster closely around ions. Read More
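A minimal sketch of the note above, assuming a spectrum is a list of `(mass, intensity)` tuples; the function name, the example peaks, and the choice of an inclusive `delta` window are illustrative assumptions.

```python
def fragment_intensity_score(peaks, fragment_mass, delta):
    """Sum the intensities of all peaks within delta of fragment_mass,
    then divide by the largest peak intensity in the spectrum."""
    if not peaks:
        return 0.0
    max_intensity = max(intensity for _, intensity in peaks)
    if max_intensity <= 0:
        return 0.0
    total = sum(
        intensity
        for mass, intensity in peaks
        if abs(mass - fragment_mass) <= delta
    )
    return total / max_intensity

# Hypothetical spectrum: three peaks cluster around 500, one tall peak elsewhere.
peaks = [(499.8, 10.0), (500.1, 30.0), (500.3, 20.0), (650.0, 100.0)]
score = fragment_intensity_score(peaks, 500.0, 0.5)
```

Here all three clustered peaks contribute (10 + 30 + 20 = 60), giving a score of 0.6 against the maximum intensity of 100. Note that a dense cluster can push the score above 1.0.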

MSMSFit idea: skip ion if fragment already accounted for

by Brian | 19th November 2009

If a certain fragment fault-line has already been accounted for with another ion, we can skip further ions for that location. Read More
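A rough sketch of that idea, assuming each theoretical ion is tagged with the cleavage site (fault-line) it comes from, so that a b-ion and its complementary y-ion share a site number. The tuple layout, the tolerance, and the example masses are hypothetical.

```python
def match_sites(ions, peak_masses, tolerance):
    """ions: (site, ion_type, mass) tuples, in the order they should be tried.
    Returns (matched, searches): which ion type explained each site, and how
    many peak searches were actually performed."""
    matched = {}
    searches = 0
    for site, ion_type, mass in ions:
        if site in matched:
            # This fault-line is already accounted for; skip the ion.
            continue
        searches += 1
        if any(abs(peak - mass) <= tolerance for peak in peak_masses):
            matched[site] = ion_type
    return matched, searches

# Hypothetical ions for two cleavage sites, y-ions tried first.
ions = [(1, "y", 300.0), (1, "b", 200.0), (2, "y", 250.0), (2, "b", 350.0)]
matched, searches = match_sites(ions, [300.1, 350.05], 0.2)
```

Site 1 is explained by its y-ion, so its b-ion is never searched for; site 2 falls through to the b-ion. Three searches are performed instead of four.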

variable differenceThreshold

by Brian | 19th November 2009

In MSMSFit, differenceThreshold could be a function of peak intensity. That is, there can be more leeway for greater intensity peaks. Read More
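One possible shape for such a function, as a sketch: a linear ramp from a base tolerance up to a wider one for the most intense peaks. The linear form and the `base` and `extra` values are assumptions; the post only says that the threshold could depend on intensity.

```python
def difference_threshold(normalized_intensity, base=0.1, extra=0.2):
    """Allowed mass difference for a peak, given its intensity on a 0..1 scale.
    Weak peaks get `base`; the strongest peak gets `base + extra`."""
    clamped = min(max(normalized_intensity, 0.0), 1.0)
    return base + extra * clamped

def is_match(peak_mass, ion_mass, normalized_intensity):
    """True if the peak lies within its intensity-dependent threshold."""
    return abs(peak_mass - ion_mass) <= difference_threshold(normalized_intensity)

# The same 0.25 mass difference is accepted for a strong peak only.
strong_peak_matches = is_match(500.25, 500.0, 1.0)
weak_peak_matches = is_match(500.25, 500.0, 0.1)
```

With these example values a full-intensity peak is allowed a difference of about 0.3, while a peak at 10% intensity is allowed only about 0.12.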

Check out this hilarious peptide!

by Brian | 19th November 2009

There’s something terrifying in chromosome 4… Read More

GSWYSMRK

by Brian | 18th November 2009

I’m experimenting with the MSMSFit scoring and am going down what looks like a very promising road. Interesting: When examining OutputFile2.pkl against chr4.fasta a peptide, GSWYSMRK, came up with a high MSMSFit score that HMM_Score somewhat overlooks (it gets an HMM score of about 60 depending on the peak cleaning method). What is interesting is [...] Read More

Improving MSMSFit

by Brian | 11th November 2009

I suspect that some edge cases of MSMSFit (first, last fragments, peaks, etc.) may not be performing as desired. I especially suspect that searching for b-ions isn’t working nearly as well as it should. This has gone unnoticed until now because the way MSMSFit is written, close y-ion matches can pick up the slack for [...] Read More

Brainstorm: massive “crowdsourcing” of proteomic processing

by Brian | 3rd November 2009

Similar in principle to the various folding@home projects. It may seem that GFS is not conducive to such segmentation, as genomic data is quite large, making download times prohibitive. This is not the case if a node “specializes” in a subset of the genome. A node only needs access to a spectrum file and [...] Read More
