How to Fake a Great E Value

by Brian | 13th August 2010

To get a better look inside the code I use to calculate E values, I created a histogram visualizer. Here are histograms of the score distributions for two different spectra. (Note: The small green bars mark where values are above zero but wouldn’t normally be drawn, as the bar would be less than one pixel high.)

Histogram for Spectrum A:

E value: 1.0659386771035567E-26

Histogram for Spectrum B:

E value: 2.64817948989741E-19

These two score distributions typify what you’d expect when a correct match is present. Most scores fall into a large mass on the low end, but then, on the rightmost side, far away from the other scores, you see a tiny blip where the correct score lives.

Now, just by looking at the two, would you say that you are much more confident about the results of the first one than the second one? Maybe so, but would you say that you are more than 10,000,000 times more confident? Well, that’s what those E values imply. Like I said before, E values are highly sensitive to the shape of the right tail of the histogram, and the slight difference in shape between the two skewed the linear regression enough to make the confidence factors so starkly different.
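To make that sensitivity concrete, here is a minimal Python sketch. It is not the E value code used for the histograms above. It assumes one common approach: fit a straight line to the log of the survival counts (how many scores sit at or above each score) over the right tail, then extrapolate that line out to the top score. The function names, the `tail_fraction` parameter, and the synthetic exponential backgrounds are all made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_log10_e(background, top_score, tail_fraction=0.5):
    """Hypothetical estimator: regress log10(survival count) on score over
    the upper tail of the background, then extrapolate to top_score."""
    ordered = sorted(background)
    n = len(ordered)
    start = int(n * (1 - tail_fraction))
    xs = ordered[start:]
    # With distinct scores, ordered[i] has n - i scores at or above it.
    ys = [math.log10(n - i) for i in range(start, n)]
    slope, intercept = fit_line(xs, ys)
    return slope * top_score + intercept

def synthetic_background(rate, n=1000):
    """Evenly spaced quantiles of an exponential distribution (made-up data)."""
    return [-math.log(1 - (i + 0.5) / n) / rate for i in range(n)]

top = 40.0  # an illustrative "correct match" score far to the right
log_e_steep = estimate_log10_e(synthetic_background(1.0), top)
log_e_shallow = estimate_log10_e(synthetic_background(0.8), top)

print(f"steeper tail:   log10(E) = {log_e_steep:.1f}")
print(f"shallower tail: log10(E) = {log_e_shallow:.1f}")
print(f"difference:     {log_e_shallow - log_e_steep:.1f} orders of magnitude")
```

The two backgrounds differ only in how quickly the tail falls off, yet because the fitted line is extrapolated far past the data, that modest difference in slope turns into a gap of orders of magnitude in the estimated E value.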

This dependence on histogram shape can be used to game the E value method: you can build a scoring algorithm that produces results with hugely inflated confidence scores.

What I’m about to do is simple and can be implemented by any scoring method.

IMPORTANT NOTE: I’m not changing the E value code at all. I’m only changing the code in the scoring method. I’ve gone ahead and altered the code in TandemFit to square the final score. Here’s the code:

score = score * score;

Simple enough? Do you think this improves the quality of the scores?

Spoiler: It doesn’t. This will not change the ordering of TandemFit scores (as long as the scores are non-negative). The best score will still be the best score and the worst score will still be the worst score. However, the distribution of the scores will be very different. What were scores of 1, 2, 3, 4 and 5 will now be scores of 1, 4, 9, 16 and 25. Because larger scores are more affected by the squaring process, this will make our best scores (the true matches) appear much further removed from the rest.
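Here is a small Python check of that claim, using the five example scores from the paragraph above. It assumes non-negative scores, since squaring can reorder negative values, and the helper name `ranking` is just for illustration.

```python
scores = [1, 2, 3, 4, 5]
squared = [s * s for s in scores]  # same effect as: score = score * score;

def ranking(values):
    """Indices of the values, sorted from worst (lowest) to best (highest)."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Squaring is monotonic on non-negative numbers, so the ranking is unchanged.
same_order = ranking(scores) == ranking(squared)

# The gap between the best score and the runner-up grows, though.
gap_before = scores[-1] - scores[-2]
gap_after = squared[-1] - squared[-2]

print("ordering preserved:", same_order)
print("gap between top two before squaring:", gap_before)
print("gap between top two after squaring:", gap_after)
```

The ranking comes out identical, so no identification changes, but the distance between the top score and everything below it is stretched, which is exactly what the right tail of the histogram sees.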

With this one change, here are the histograms and E values for the scores for those same two spectra:

Histogram for Spectrum A:


E value: 3.202345672236334E-91

Histogram for Spectrum B:

E value: 1.1669794248500472E-57

Look at those frickin’ E values! The first one is within throwing distance of being one over a googol!

Now let’s say we were comparing the two scoring methods. For spectrum A, is our confidence really more than 10^64 times better with the second scoring method?
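For anyone who wants to check the arithmetic, this short Python snippet works out the gaps in orders of magnitude, using only the four E values quoted in this post. The variable names and the helper `orders_of_magnitude` are made up for illustration.

```python
import math

# E values quoted above: original TandemFit score vs. squared score.
e_a_original = 1.0659386771035567e-26
e_b_original = 2.64817948989741e-19
e_a_squared = 3.202345672236334e-91
e_b_squared = 1.1669794248500472e-57

def orders_of_magnitude(larger, smaller):
    """How many powers of ten separate two E values."""
    return math.log10(larger) - math.log10(smaller)

# Same spectrum, two scoring methods.
a_gain = orders_of_magnitude(e_a_original, e_a_squared)
b_gain = orders_of_magnitude(e_b_original, e_b_squared)

# Spectrum A vs. spectrum B under each scoring method.
ab_gap_original = orders_of_magnitude(e_b_original, e_a_original)
ab_gap_squared = orders_of_magnitude(e_b_squared, e_a_squared)

print(f"spectrum A, squared vs. original: {a_gain:.1f} orders of magnitude")
print(f"spectrum B, squared vs. original: {b_gain:.1f} orders of magnitude")
print(f"A vs. B, original scoring: {ab_gap_original:.1f} orders of magnitude")
print(f"A vs. B, squared scoring:  {ab_gap_squared:.1f} orders of magnitude")
```

Nothing about the spectra or the matches changed between the two runs; only the final line of the scoring method did.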

Will you ever trust anyone’s E values again?

One Response to “How to Fake a Great E Value”

  1. Jainab Khatun

    Aug 13th, 2010 :

    Brian, Morgan or whoever is concerned with E-Values,
    Well, maybe there is no standard for E-Values; as Brian pointed out, we can change the E-Values simply by squaring the scores. But in my understanding the relative values do not change, you just get different values. For example, in Brian’s first scoring method the difference in E-Values is 10^7. That does not mean that the identification for spectrum A is 10,000,000 times more confident than the one for spectrum B, unless someone actually verifies that using the same scoring method and a sufficient amount of standard data. A 10^7 difference may actually mean a 10% increase in confidence, and we have to find that out using standard data. Similarly, for the second scoring method, even though the difference in E-Values is 10^34, that may mean only a 10% increase in confidence, and in this case an E-Value of 10^-20 may actually mean a false positive. Therefore, in my understanding, E-Values are relative and we have to find our own standard. I mean we have to find out whether an E-Value of 10^-6 actually means 1 in a million FP. I do trust E-Values if you define your standard using a sufficient amount of data.
