E values are less reliable with databases of spliced peptides

by Brian | 28th October 2010

When we calculate our E values we are assuming that [...] This is a great example of the type of histogram on which the Fenyo method of calculating E values performs poorly. See that one tall histogram bar? Because it falls in the small section where the least-squares line is calculated, the result is a much steeper [...] Read More

TandemFit animation

by Brian | 20th August 2010

Here’s a kind of nifty animation of how TandemFit walks through each MS/MS peak twice (once forwards, once backwards) to find the matches to the theoretical ions of a peptide. Note that the ion being searched for is displayed. The animation! Read More

Aurum dataset oddity

by Brian | 19th August 2010

It looks like spectrum: T10475_Well_A13_2025.07_16898.mgf..pkl and spectrum: T10475_Well_A13_2025.07_17096.mgf..pkl are the same. This is bad news for me, as TandemFit gets both of those “wrong”. The quotes are there because TandemFit’s match, QAGLQLQESLEPAVRLDR, has 11 fragment alignments versus the 2 produced by VPAPSIEDICHVLSTVCK, which is the “correct” peptide.

Update: other duplicates:
T10475_Well_A12_1386.68_16898.mgf..pkl
T10475_Well_A12_1386.68_17096.mgf..pkl
T10475_Well_A03_1551.77_16898.mgf..pkl
T10475_Well_A03_1551.77_17096.mgf..pkl
T10475_Well_A11_1386.69_16898.mgf..pkl
T10475_Well_A11_1386.69_17096.mgf..pkl
T10475_Well_A10_1188.45_17096.mgf..pkl
[...] Read More

The USP data

by Brian | 28th June 2010

The Chen lab recently gave us a set of spectra derived from the USP set of 50 proteins. We were told that this set probably contains a fair number of impurities, so some of the spectra will not correspond to peptides that can be found in the USP list. [...] Read More

Spectrum intensity normalization

by Brian | 24th June 2010

Just added a method and modified the constructor of the Spectrum object to allow normalization of peak intensities. It finds the maximum intensity for a given spectrum and then walks through each peak, dividing its intensity by that maximum. This can be handy for SpectrumMatch as well as for comparing scores TandemFit [...] Read More
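
The normalization described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Spectrum code: the `Peak` class and the `normalize_intensities` function are hypothetical names, and returning the peaks unchanged when the maximum is not positive is an assumption made here to avoid dividing by zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    """Hypothetical stand-in for one MS/MS peak."""
    mass: float
    intensity: float


def normalize_intensities(peaks: List[Peak]) -> List[Peak]:
    """Divide every intensity by the largest intensity in the spectrum."""
    if not peaks:
        return []
    # First pass: find the maximum intensity.
    max_intensity = max(p.intensity for p in peaks)
    # Assumption: leave the values alone if there is nothing to scale by.
    if max_intensity <= 0:
        return [Peak(p.mass, p.intensity) for p in peaks]
    # Second pass: walk through each peak and scale it.
    return [Peak(p.mass, p.intensity / max_intensity) for p in peaks]
```

After this step the tallest peak has intensity 1.0 and every other peak is a fraction of it, which is what makes values from different spectra comparable.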

Report on lattice based MSMS hashing

by Brian | 2nd December 2009

To judge the possible effectiveness of comparing spectra based on a hash of their peaks I wrote a program to take a list of spectra and convert them to hashes. From that set of spectra there were two pairs of sibling spectra that should be close matches as I am confident that they are derived [...] Read More

Adding intensity

by Brian | 1st December 2009

Note to Brian: try simply summing the intensities of peaks that come within a certain delta of a frag mass. Then divide that sum by Max(peak intensities). This will often grab more than one peak for a fragment, but this may produce good results as often multiple peaks cluster closely around ions. Read More
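
A rough Python sketch of that idea, with peaks represented as `(mass, intensity)` tuples. The function name and the tuple representation are assumptions for illustration only.

```python
from typing import List, Tuple


def fragment_intensity_score(
    peaks: List[Tuple[float, float]],
    fragment_mass: float,
    delta: float,
) -> float:
    """Sum the intensities of peaks within `delta` of `fragment_mass`,
    then divide by the maximum peak intensity in the spectrum."""
    if not peaks:
        return 0.0
    max_intensity = max(intensity for _, intensity in peaks)
    if max_intensity <= 0:
        return 0.0
    # Every peak inside the window contributes, so clustered peaks add up.
    total = sum(
        intensity
        for mass, intensity in peaks
        if abs(mass - fragment_mass) <= delta
    )
    return total / max_intensity
```

Because several peaks can fall inside the window, the score is not capped at 1.0; a tight cluster of strong peaks around one ion can push it higher.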

variable differenceThreshold

by Brian | 19th November 2009

In MSMSFit, differenceThreshold could be a function of peak intensity. That is, there can be more leeway for higher-intensity peaks. Read More
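
One possible shape for such a function, sketched in Python. The linear form and the `base` and `extra` parameters are assumptions chosen purely for illustration; nothing here reflects what MSMSFit actually does.

```python
def difference_threshold(
    intensity: float,
    max_intensity: float,
    base: float = 0.25,   # hypothetical leeway for the weakest peaks
    extra: float = 0.25,  # hypothetical additional leeway for the strongest peak
) -> float:
    """Return a mass tolerance that grows linearly with relative intensity."""
    if max_intensity <= 0:
        return base
    # Clamp the relative intensity to [0, 1] before scaling.
    relative = min(max(intensity / max_intensity, 0.0), 1.0)
    return base + extra * relative
```

With these made-up defaults the weakest peaks get a tolerance of 0.25 and the strongest peak gets 0.5; other monotonic functions, such as a logarithmic or stepped one, would fit the same idea.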

PIE fragments report progress

by Brian | 21st October 2009

Fragments Report, by modification: Null modifications, Carboxymethyl (C), Acetyl (N-term), Carbamidomethyl (C), Oxidation (M), Phospho (ST). Or in bar form… (same modifications). Read More

Tandem Mass data hash

by Brian | 9th October 2009

“Tandem mass data hash,” try to say that five times fast. To get a better feel for how MS/MS data will be represented as a hash I wrote a quick visualizer. How the MSMS data is hashed: an NxM bin matrix is computed where N is the number of bins to store mass and M [...] Read More

XCode not recognizing break points

by Brian | 9th October 2009

This was confusing the hell out of me. I’d have an fprintf call that would say something like “loading spectra…” and a breakpoint before that and in the console it would still print “loading spectra…” and continue on its merry way like the breakpoint didn’t even exist. So what’s the real problem? I don’t know [...] Read More

Morgan’s summary of proteogenomic mapping

by Brian | 18th September 2009

Read More

More on cleanSpectra and cleaning MSMS data

by Brian | 26th August 2009

Jainab and I have discussed this before, but I want to get it down. Taking your peak list, sorting it by intensity, and then selecting the top X peaks can be detrimental for this reason: some locations where fragments are very likely to occur, and with high intensities, can act as intensity hogs. They will be [...] Read More
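
A small Python sketch of the top-X selection being criticized, using made-up peak values. The function name and the data are hypothetical, and the crowding-out effect it shows is one reading of the "intensity hog" problem, since the excerpt is cut off before the full explanation.

```python
from typing import List, Tuple


def top_x_by_intensity(
    peaks: List[Tuple[float, float]], x: int
) -> List[Tuple[float, float]]:
    """Keep the x most intense (mass, intensity) peaks, returned in mass order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:x]
    return sorted(strongest)


# Made-up spectrum: three very intense peaks clustered near mass 500,
# plus three weak peaks spread across the rest of the mass range.
example_peaks = [
    (120.1, 15.0),
    (245.3, 22.0),
    (499.8, 900.0),
    (500.0, 1000.0),
    (500.3, 850.0),
    (731.4, 30.0),
]

kept = top_x_by_intensity(example_peaks, 3)
```

In this made-up case all three surviving peaks sit within half a mass unit of each other, so the weaker peaks at 120.1, 245.3, and 731.4 are discarded even though they might have matched other ions.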

Git first CVS import attempt

by Brian | 18th August 2009

For my reference more than for yours:

frenchbroad:gfs.git risk2$ git cvsimport -v -d :local:/Volumes/LabShare/cvsroot GFS-Vec-V2
Initialized empty Git repository in /Users/risk2/Documents/gfs.git/.git/
Running cvsps...
cvs_direct initialized to CVSROOT /Volumes/LabShare/cvsroot
cvs [rlog aborted]: -t/-f wrappers not supported by this version of CVS
A legacy version of cvs with -t/-f wrapper support is available as: /usr/bin/ocvs.

Read More

download git-cvsimport tool

by Brian | 17th August 2009

You probably don’t need to. If you have Git installed, it is probably one of the tools that comes with it. To check, try this on the command line:

git help -a | grep cvsimport

I got that from this site on how to import CVS into Git. The official Git page on [...] Read More

Distribution of intensities in MS/MS data

by Brian | 28th July 2009

The data is from OutputFile2.pkl. The x-axis is increments of 10,000. The final bar represents all peaks with intensities of 20,000 and above. Read More

Combining matching spectra in peak lists

by Brian | 25th July 2009

The following are the results from a small experiment testing the hypothesis that combining the spectral data for (probably) matching polypeptides yields a higher HMM_Score. Running a certain PKL (known to me as “output2.pkl”) on chr4 with GFS, we see that two of the returned sequences pop up twice with relatively high HMM_Scores. This strongly [...] Read More

The data cleanSpectra “cleans”

by Brian | 16th July 2009

Here’s the opendiff view of a file before (left) and after (right) peaks have been cleaned.

UPDATE: Examples of how cleaning may cut out some ion matches.

Without cleaning:

peaks  ion matches  precursor mass  sequence
208    11           1544.112183     HGTDDGVVWMNWK
324    10           1544.512207     HGTDDGVVWMNWK
324    10           1544.512207     HGTDDGVVWMNWK
267    10           1818.752197     TMTIHNGMFFSTYDR
224    10           2125.862305     HQLYIDETVNSNIPTNLR
[...] Read More

Working out MSE algorithm

by Brian | 8th July 2009

The good news is that preliminary performance of my MSE algorithms is FAST… like, doesn’t even add a second to overall performance. If I can get some good correlation between HMMScore and MSE (especially low scoring HMMScore) then it could be a really good filter. Right now there is not good correlation, but I feel [...] Read More

Improved MSMS visualizer

by Brian | 7th July 2009

I modified the code to have more precise acid weights and I added frag17 (which subtracts a nitrogen and three hydrogens) and frag18 (which subtracts an oxygen and two hydrogens). At this scale the frag17 and frag18 values are so close that they overlap. Still, you can see that for the correct acid sequence there [...] Read More
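
For reference, the two offsets can be worked out from standard monoisotopic atomic masses. The sketch below is in Python and uses hypothetical function names borrowed from the post; it is not the visualizer's own code, and it assumes monoisotopic rather than average masses.

```python
# Standard monoisotopic atomic masses in daltons.
HYDROGEN = 1.00782503
NITROGEN = 14.0030740
OXYGEN = 15.9949146

# frag17: one nitrogen and three hydrogens (an ammonia-sized loss).
AMMONIA_LOSS = NITROGEN + 3 * HYDROGEN

# frag18: one oxygen and two hydrogens (a water-sized loss).
WATER_LOSS = OXYGEN + 2 * HYDROGEN


def frag17(fragment_mass: float) -> float:
    """Fragment mass after subtracting N + 3H."""
    return fragment_mass - AMMONIA_LOSS


def frag18(fragment_mass: float) -> float:
    """Fragment mass after subtracting O + 2H."""
    return fragment_mass - WATER_LOSS
```

The two losses come to about 17.027 and 18.011, so for any given fragment the frag17 and frag18 values sit less than one mass unit apart, which is consistent with them overlapping on a plot that spans the whole spectrum.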
