GFS Performance experiments
With problem sets this sparsely populated with noteworthy information, the trick is to skip as many processing steps as possible without marring the final results by skipping something important.
I have made some modifications to GFS and am testing those now:
- If MSMSFit goes through all of the y-ions and finds less than a 12% match, that sequence is skipped.
- seqTagScore doesn’t send a sequence to HMM_score unless it has at least a 50% match from MSMSFit.
- Sequences weighing less than 1000 Da are skipped; the cutoff was previously 600 Da. This change eliminates 373 additional spectra from our set of 3010*, which should produce a modest improvement of about 13%, so if it is deemed that we should revert to 600 Da, the performance hit won’t be devastating.
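The three cutoffs above can be read as a cascade of cheap checks that run before the expensive scoring step. The following is only a minimal sketch of that idea, not the actual GFS code: the function name `next_step`, its arguments, and the order of the checks are assumptions made for illustration. Only the threshold values (1000 Da, 12%, 50%) come from the list above.

```python
# Hypothetical sketch of the early-exit checks described above.
# Threshold values come from the notes; all names and the check order are assumed.

MIN_MASS_DA = 1000.0         # raised from 600 Da
MIN_Y_ION_MATCH = 0.12       # below this y-ion match fraction, skip the sequence
MIN_MSMSFIT_FOR_HMM = 0.50   # below this MSMSFit match fraction, do not call HMM_score


def next_step(mass_da: float, y_ion_match: float, msmsfit_match: float) -> str:
    """Return the outcome for one candidate, checking the cheapest filter first."""
    if mass_da < MIN_MASS_DA:
        return "skip: mass below cutoff"
    if y_ion_match < MIN_Y_ION_MATCH:
        return "skip: y-ion match too low"
    if msmsfit_match < MIN_MSMSFIT_FOR_HMM:
        return "skip: MSMSFit match too low for HMM_score"
    return "send to HMM_score"


if __name__ == "__main__":
    print(next_step(850.0, 0.40, 0.70))
    print(next_step(1500.0, 0.05, 0.70))
    print(next_step(1500.0, 0.40, 0.30))
    print(next_step(1500.0, 0.40, 0.70))
```

Keeping the thresholds as named constants makes it easy to revert the mass cutoff to 600 Da if that turns out to be necessary.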
UPDATE: The job ran 3475 spectra in 1 hour 23 minutes. With efficient load distribution in our pipeline (assuming 12 machines), we should be able to get about 45,000 SPD (spectra per day).
*3475 is reduced to 3010 when we eliminate spectra with precursor mass less than 600 Da.
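As a quick check on the figures quoted in these notes, the arithmetic can be sketched as below. The variable and function names are made up for illustration. The sketch computes only the share of spectra removed by the 1000 Da cutoff and the raw rate of the measured job. It does not try to reproduce the 45,000 SPD estimate, since the notes do not say how many machines the measured job used or what overhead was assumed.

```python
# Arithmetic on the numbers quoted in the notes; all names are illustrative.

SPECTRA_IN_JOB = 3475       # spectra processed in the timed run
SPECTRA_AFTER_600 = 3010    # spectra left after the 600 Da precursor cutoff
EXTRA_REMOVED_1000 = 373    # additional spectra removed by the 1000 Da cutoff
RUNTIME_MINUTES = 83        # 1 hour 23 minutes


def fraction_removed(removed: int, total: int) -> float:
    """Share of the set eliminated by the stricter mass cutoff."""
    return removed / total


def spectra_per_minute(spectra: int, minutes: float) -> float:
    """Raw throughput of the timed run."""
    return spectra / minutes


if __name__ == "__main__":
    share = fraction_removed(EXTRA_REMOVED_1000, SPECTRA_AFTER_600)
    rate = spectra_per_minute(SPECTRA_IN_JOB, RUNTIME_MINUTES)
    print(f"removed by 1000 Da cutoff: {share:.1%}")
    print(f"spectra remaining: {SPECTRA_AFTER_600 - EXTRA_REMOVED_1000}")
    print(f"observed rate: {rate:.1f} spectra per minute")
```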