E values are less reliable with databases of spliced peptides

by Brian | 28th October 2010

When we calculate our E values we are assuming that

This is a great example of the type of histogram on which the Fenyo method of calculating E values performs poorly.

2.1931996962851298E-14

See that one tall histogram bar? Because it falls in the small section where the least-squares line is calculated, the fit comes out with a much steeper slope (and, as a direct result, a much better E value) than it really should.
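To make the effect concrete, here is a rough sketch in Python. It is not our actual implementation: it assumes the E value comes from fitting a least-squares line to the log10 of the survival counts (how many candidates score at or above each bin) over a small window, then extrapolating that line to the top score. The histograms, the fit window (bins 4 to 7) and the top-score bin (20) are all made-up numbers for illustration.

```python
import math

def survival_counts(hist):
    """For each bin i, count the candidates scoring in bin i or higher."""
    running = 0
    out = [0] * len(hist)
    for i in range(len(hist) - 1, -1, -1):
        running += hist[i]
        out[i] = running
    return out

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def extrapolated_e_value(hist, fit_start, fit_end, top_bin):
    """Fit log10(survival) over bins [fit_start, fit_end) and extrapolate to top_bin."""
    surv = survival_counts(hist)
    xs = list(range(fit_start, fit_end))
    ys = [math.log10(surv[x]) for x in xs]  # assumes no empty tail inside the window
    slope, intercept = fit_line(xs, ys)
    return slope, 10 ** (slope * top_bin + intercept)

# Hypothetical score histograms: identical except for one tall bar at bin 5.
smooth = [400, 200, 100, 50, 25, 12, 6, 3, 1, 0, 0, 0]
spiked = [400, 200, 100, 50, 25, 60, 6, 3, 1, 0, 0, 0]

slope_smooth, e_smooth = extrapolated_e_value(smooth, 4, 8, top_bin=20)
slope_spiked, e_spiked = extrapolated_e_value(spiked, 4, 8, top_bin=20)

print("smooth:", slope_smooth, e_smooth)
print("spiked:", slope_spiked, e_spiked)
```

With the tall bar inside the fit window the fitted slope is more negative, so the extrapolated E value for the same top score comes out smaller, even though nothing about the top hit itself has changed.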

These kinds of histograms happen all the time in databases that contain peptides derived from multiple splicing junctions. The reason is that there may be many peptides where the front end or the tail end is correct. For example, let’s say the correct peptide for a spectrum is “WSFFFFCGYN”, but an intron position begins after the C. The database may contain many variations on that peptide which begin with “WSFFFFC”. If those alternate peptides also fall within our precursor tolerance, they will all produce fairly decent scores, which creates the kind of spike seen in the above histogram.
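As a small illustration of how such near-miss candidates get through, the sketch below filters a list of peptides down to those that share the “WSFFFFC” prefix with the correct peptide and also land within the precursor tolerance. The candidate list and the 0.5 Da tolerance are invented for the example, and the masses are standard monoisotopic residue masses; a real search would of course go on to score the fragment ions as well.

```python
# Standard monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER

def shared_prefix_len(a, b):
    """Number of leading residues the two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

correct = "WSFFFFCGYN"
# Hypothetical database entries, e.g. produced by alternative splice junctions.
candidates = ["WSFFFFCGNY", "WSFFFFCYGN", "WSFFFFCAAK", "MKLVDEPTGR"]
tolerance_da = 0.5  # assumed precursor tolerance
min_prefix = 7      # length of "WSFFFFC"

target = peptide_mass(correct)
near_misses = [
    p for p in candidates
    if abs(peptide_mass(p) - target) <= tolerance_da
    and shared_prefix_len(p, correct) >= min_prefix
]
print(near_misses)
```

Here the two candidates that merely rearrange the last three residues have the same mass as the correct peptide and share its first seven residues, so they would match much of the same fragment series and pile up in the same region of the score histogram.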
