proteomic database fields
For a basic database which holds peptides from in silico digestion of a genome, I think these fields will be helpful:
- peptide – the amino acid sequence of the peptide chain
- location – probably several fields recording where the peptide occurs (e.g., source genome, protein, and position within that protein)
- mass – theoretical mass
- size – the number of amino acids
- cleavage – number of missed trypsin cleavage sites (0 or 1)
- total – the number of genomes in the database that contain the peptide (explained below)
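The sketch below shows how the mass, size and cleavage fields could be produced. It is a minimal in silico tryptic digest in Python that emits one record per peptide. It rests on these assumptions:

- Cleavage follows the common "cut after K or R, but not before P" rule.
- Masses use standard monoisotopic residue values with no modifications.
- A peptide has at most one missed cleavage.
- `location` is simplified to a single start offset within the protein.

`PeptideRecord`, `trypsin_fragments` and `digest` are hypothetical names.

```python
from dataclasses import dataclass

# Monoisotopic residue masses in daltons. Assumption: only the 20 standard
# residues, with no fixed or variable modifications.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


@dataclass
class PeptideRecord:
    peptide: str   # amino acid sequence
    location: int  # hypothetical simplification: start offset in the protein
    mass: float    # theoretical monoisotopic mass
    size: int      # number of residues
    cleavage: int  # missed trypsin cleavage sites (0 or 1)


def trypsin_fragments(protein: str) -> list[tuple[int, str]]:
    """Cut after K or R unless the next residue is P; return (start, fragment)."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        next_res = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_res != "P":
            fragments.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):  # trailing fragment with no K/R at its end
        fragments.append((start, protein[start:]))
    return fragments


def digest(protein: str, max_missed: int = 1) -> list[PeptideRecord]:
    """Join up to max_missed + 1 adjacent fragments into peptide records."""
    frags = trypsin_fragments(protein)
    records = []
    for i in range(len(frags)):
        for missed in range(max_missed + 1):
            if i + missed >= len(frags):
                break
            seq = "".join(f for _, f in frags[i:i + missed + 1])
            mass = sum(RESIDUE_MASS[r] for r in seq) + WATER
            records.append(
                PeptideRecord(seq, frags[i][0], round(mass, 5), len(seq), missed)
            )
    return records
```

For example, `digest("MKWVTFISLLR")` yields three records: `MK`, `MKWVTFISLLR` (1 missed cleavage) and `WVTFISLLR`.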
Total is worth explaining. Suppose two E. coli genomes are digested and loaded into the database, and a peptide is present in both (very likely). That peptide's “total” value will then be 2. If 50 E. coli proteomes are added and a certain peptide is present in only half of them, its total will be 25.
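This sketch assumes "total" counts genomes, not occurrences. A peptide that appears several times within one genome still counts once, which matches the 50-proteome example. Under that assumption, the value can be computed by taking a set per genome and accumulating the sets in a `Counter`. `peptide_totals` and the genome IDs are hypothetical.

```python
from collections import Counter


def peptide_totals(genome_peptides: dict[str, list[str]]) -> Counter:
    """For each peptide, count how many genomes contain it at least once."""
    totals = Counter()
    for peptides in genome_peptides.values():
        # set(): a peptide repeated within one genome is counted only once
        totals.update(set(peptides))
    return totals
```

Usage: with `{"ecoli_1": ["MK", "WVTFISLLR", "MK"], "ecoli_2": ["MK", "AGHLK"]}`, `MK` gets a total of 2 and the other two peptides get 1.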
The value of this field is, of course, in detecting anomalies such as mutations, errors, etc. GFS can search for a match to some MS/MS data. If the best-matching peptide has a “total” of only 1 when the maximum “total” in the database is 50, we should treat that result with more skepticism.
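One way to express this skepticism is as a support fraction: the matched peptide's total divided by the largest total in the database. The sketch below is a minimal version of that idea. `flag_match` and the 0.1 cut-off are hypothetical choices. They do not come from GFS or any validated method.

```python
def flag_match(peptide: str, totals: dict[str, int],
               threshold: float = 0.1) -> tuple[float, bool]:
    """Return (support, suspicious) for a peptide matched to MS/MS data.

    support = the peptide's total / the largest total in the database.
    threshold is a hypothetical cut-off; tune it for the real data set.
    """
    max_total = max(totals.values())
    support = totals.get(peptide, 0) / max_total
    return support, support < threshold
```

With a maximum total of 50, a match whose total is 1 gets a support of 0.02 and is flagged. A peptide found in 25 genomes gets 0.5 and is not flagged.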
UPDATE: We could have TWO tables. One keeps only unique peptides (perhaps along with the total); the other keeps every peptide from every genome. When we find a peptide of interest in the smaller unique-peptide table, we can then use the larger table to see exactly which genomes contain it.
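The two-table layout could look like this in SQLite, using Python's built-in `sqlite3` module with an in-memory database. The table names, column names and helper functions are hypothetical. In this sketch, the unique-peptide table's total is derived from the full table with `COUNT(DISTINCT genome)`, so the two tables cannot drift apart.

```python
import sqlite3

SCHEMA = """
CREATE TABLE peptide_occurrence (  -- one row per peptide occurrence per genome
    peptide  TEXT NOT NULL,
    genome   TEXT NOT NULL,
    location INTEGER
);
CREATE INDEX idx_occurrence_peptide ON peptide_occurrence(peptide);
CREATE TABLE unique_peptide (      -- one row per distinct sequence
    peptide TEXT PRIMARY KEY,
    total   INTEGER NOT NULL       -- number of genomes containing it
);
"""


def build_db(genome_peptides: dict[str, list[tuple[str, int]]]) -> sqlite3.Connection:
    """Load (peptide, location) pairs per genome, then derive the unique table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [(pep, genome, loc)
            for genome, peps in genome_peptides.items()
            for pep, loc in peps]
    conn.executemany("INSERT INTO peptide_occurrence VALUES (?, ?, ?)", rows)
    conn.execute(
        "INSERT INTO unique_peptide (peptide, total) "
        "SELECT peptide, COUNT(DISTINCT genome) "
        "FROM peptide_occurrence GROUP BY peptide"
    )
    conn.commit()
    return conn


def genomes_containing(conn: sqlite3.Connection, peptide: str) -> list[str]:
    """Look up, in the large table, every genome that contains the peptide."""
    cur = conn.execute(
        "SELECT DISTINCT genome FROM peptide_occurrence "
        "WHERE peptide = ? ORDER BY genome",
        (peptide,),
    )
    return [row[0] for row in cur]
```

The index on `peptide_occurrence(peptide)` keeps the per-peptide lookup fast as the number of genomes grows.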
Any other fields that would be useful?