by Brian | 28th October 2010
When we calculate our E values we are assuming that [...] This is a great example of the type of histogram on which the Fenyo method of calculating E values performs poorly. See that one tall histogram bar? Because it falls in the small section where the least-squares line is calculated, the result is a much steeper [...] Read More
by Brian | 26th August 2010
It hit me today: What if the theoretical mass I’m calculating for ions is consistently off? Even if it is just a little (talking fractions of a proton), that would negatively impact a peptide matching algorithm’s performance. How can we measure this? Relatively easy. The somewhat challenging part is getting a set of spectra from [...] Read More
by Brian | 20th August 2010
Here’s a kind of nifty animation of how TandemFit walks through each MS/MS peak twice (once forwards, once backwards) to find the matches to the theoretical ions of a peptide. Note that the ion being searched for is displayed. The animation! Read More
by Brian | 19th August 2010
It looks like spectrum T10475_Well_A13_2025.07_16898.mgf..pkl and spectrum T10475_Well_A13_2025.07_17096.mgf..pkl are the same. This is bad news for me, as TandemFit gets both of those “wrong”. The quotes are there because TandemFit’s match of QAGLQLQESLEPAVRLDR has 11 fragment alignments vs. the 2 produced by VPAPSIEDICHVLSTVCK, which is the “correct” peptide. Update: other duplicates: T10475_Well_A12_1386.68_16898.mgf..pkl, T10475_Well_A12_1386.68_17096.mgf..pkl, T10475_Well_A03_1551.77_16898.mgf..pkl, T10475_Well_A03_1551.77_17096.mgf..pkl, T10475_Well_A11_1386.69_16898.mgf..pkl, T10475_Well_A11_1386.69_17096.mgf..pkl, T10475_Well_A10_1188.45_17096.mgf..pkl [...] Read More
by Brian | 13th August 2010
To better inspect the code I have for calculating E values, I created a histogram visualizer. Here are two histograms of score distributions for two different spectra: (Note: The small green bars are where values are above zero, but wouldn’t normally be drawn as the bar would normally be less than one [...] Read More
by Brian | 28th June 2010
Peppy, up until now, only digested chromosomes inside of open reading frames. This is no more. There is now a choice! In the properties file you can set whether you want to digest the whole chromosome or only the ORFs. Read More
by Brian | 28th June 2010
The Chen lab recently gave us a set of spectra that were derived from the USP set of 50 proteins. We were told that there are probably a fair number of impurities in this set, so some of the spectra will not correspond to peptides which can be found in the USP list. [...] Read More
by Brian | 25th June 2010
I recently found that something exists called “MS-Fit”. I found this because it had confused one of our lab members into thinking that it had something to do with MSMSFit. To avoid future confusion I am redubbing that scoring method “TandemFit”. The name works well in that it scores tandem mass spectrometry data, and it does [...] Read More
by Brian | 24th June 2010
Just added a method and a modified constructor to the Spectrum object which allow for normalization of the peak intensities. The method finds the maximum intensity for a given spectrum and then walks through each peak, dividing its intensity by that maximum. This can be handy for SpectrumMatch as well as for comparing scores TandemFit [...] Read More
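The normalization step described above can be sketched in a few lines. This is only an illustration in Python, not the actual Spectrum code from Peppy: the function name `normalize_intensities` and the representation of peaks as `(mass, intensity)` tuples are assumptions made for the example.

```python
def normalize_intensities(peaks):
    """Scale peak intensities so that the tallest peak becomes 1.0.

    `peaks` is a list of (mass, intensity) tuples (a hypothetical
    representation chosen for this sketch). A new list is returned;
    the input is left unchanged.
    """
    if not peaks:
        return []

    # Step 1: find the maximum intensity in the spectrum.
    max_intensity = max(intensity for _, intensity in peaks)

    # Guard against division by zero for an all-zero spectrum.
    if max_intensity <= 0:
        return list(peaks)

    # Step 2: walk through each peak, dividing by that maximum.
    return [(mass, intensity / max_intensity) for mass, intensity in peaks]


peaks = [(175.12, 20.0), (304.16, 80.0), (417.25, 40.0)]
normalized = normalize_intensities(peaks)
```

After normalization every intensity lies between 0 and 1, which is what makes scores from spectra with very different absolute intensities comparable.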
by Brian | 21st June 2010
Peppy, the open-source proteogenomic mapping software, has been benchmarked at processing over 600,000 spectra per day on a consumer-grade desktop… Read More
by Brian | 15th January 2010
MASS_DIFF_ALLOWED in Defines.h maxTandemPrecursorDelta precursorMassTolerance Read More
by Brian | 4th January 2010
Newer Mac processors (Quad-Core Intel Xeon “Nehalem”) have “Virtual Cores” “which allows two threads to run simultaneously on each core.” This raised questions like, “do these virtual cores run at half the speed (or worse because of scheduling overhead)? If I have the option, should I set my application (GFS, an ABM, etc.) [...] Read More
by Brian | 2nd December 2009
To judge the possible effectiveness of comparing spectra based on a hash of their peaks, I wrote a program to take a list of spectra and convert them to hashes. In that set of spectra there were two pairs of sibling spectra that should be close matches, as I am confident that they are derived [...] Read More
by Brian | 1st December 2009
Note to Brian: try simply summing the intensities of peaks that come within a certain delta of a frag mass. Then divide that sum by Max(peak intensities). This will often grab more than one peak for a fragment, but this may produce good results as often multiple peaks cluster closely around ions. Read More
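A rough Python sketch of the idea in this note follows. It is not code from the project: the function name `summed_intensity_score`, the `(mass, intensity)` peak tuples, and the example masses are all assumptions made for illustration.

```python
def summed_intensity_score(peaks, fragment_masses, delta):
    """Sum the intensities of peaks near any fragment mass, then
    divide by the maximum peak intensity in the spectrum.

    `peaks` is a list of (mass, intensity) tuples and `delta` is the
    allowed mass difference; both are hypothetical for this sketch.
    """
    if not peaks:
        return 0.0

    max_intensity = max(intensity for _, intensity in peaks)
    if max_intensity <= 0:
        return 0.0

    total = 0.0
    for fragment_mass in fragment_masses:
        for mass, intensity in peaks:
            # Every peak inside the window counts, so a cluster of
            # peaks around one ion all contribute to the sum.
            if abs(mass - fragment_mass) <= delta:
                total += intensity

    return total / max_intensity


# Two clustered peaks near 100.1 and one peak at 250.0 are picked up;
# the peak at 400.0 matches no fragment.
peaks = [(100.0, 10.0), (100.2, 30.0), (250.0, 40.0), (400.0, 5.0)]
score = summed_intensity_score(peaks, [100.1, 250.0], delta=0.5)
```

Note that in this sketch a peak lying within `delta` of two different fragment masses would be counted twice; whether that is desirable is left open.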
by Brian | 19th November 2009
If a certain fragment fault-line has already been accounted for with another ion, we can skip further ions for that location. Read More
by Brian | 19th November 2009
In MSMSFit, differenceThreshold could be a function of peak intensity. That is, there can be more leeway for greater intensity peaks. Read More
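One way this might look is a threshold that grows linearly with relative intensity. The sketch below is in Python and is purely illustrative: the linear form, the function name `difference_threshold`, and the `base` and `extra` values are assumptions, not anything taken from MSMSFit.

```python
def difference_threshold(intensity, max_intensity, base=0.2, extra=0.3):
    """Return an allowed mass difference that widens for taller peaks.

    `base` is the leeway for a zero-intensity peak and `extra` is the
    additional leeway granted to the tallest peak. Both defaults are
    made-up numbers for this sketch.
    """
    if max_intensity <= 0:
        return base

    # Clamp relative intensity to [0, 1] before scaling.
    relative = min(max(intensity / max_intensity, 0.0), 1.0)
    return base + extra * relative


weak = difference_threshold(10.0, 100.0)     # small peak, little leeway
strong = difference_threshold(100.0, 100.0)  # tallest peak, most leeway
```

Any monotonically increasing function of intensity would fit the note equally well; the linear one is simply the easiest to reason about.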
by Brian | 19th November 2009
There’s something terrifying in chromosome 4… Read More
by Brian | 18th November 2009
I’m experimenting with the MSMSFit scoring and am going down what looks like a very promising road. Interesting: When examining OutputFile2.pkl against chr4.fasta a peptide, GSWYSMRK, came up with a high MSMSFit score that HMM_Score somewhat overlooks (it gets an HMM score of about 60 depending on the peak cleaning method). What is interesting is [...] Read More
by Brian | 11th November 2009
I suspect that some edge cases of MSMSFit (first, last fragments, peaks, etc.) may not be performing as desired. I especially suspect that searching for b-ions isn’t working nearly as well as it should. This has gone unnoticed until now because of the way MSMSFit is written: close y-ion matches can pick up the slack for [...] Read More
by Brian | 3rd November 2009
Similar in principle to the various folding@home projects. It may seem that GFS is not conducive to such segmentation, as genomic data is quite large, making download times prohibitive. This is not the case if a node “specializes” in a subset of the genome. A node only needs access to a spectrum file and [...] Read More