Working out the MSE algorithm
The good news is that the preliminary performance of my MSE algorithm is FAST… like, it doesn’t even add a second to the overall run time. If I can get good correlation between HMMScore and MSE (especially for low-scoring HMMScore values), then it could be a really good filter. Right now the correlation is not good, but I suspect that is because my algorithm is not taking certain edge cases into account, or perhaps because I don’t fully understand what makes a sequence a good candidate for a decent HMMScore.
In that vein, I’ve output visualizations of some amino acid sequences compared against MS/MS spectrum data. The first sequence is the “real” sequence (the one predicted by GFS). The rest are candidates.
MMDVIQEIER
VAFGTYCRMR
MTPANDDDVKR
QPKIESDCQR
ASECREQPGVK
GVECQAQIAER
YGAGQQTNLFY.
GNTLLWQEDW.
VTGHGFGCRNR
GKDCRFHGQR
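
As a side note on the HMMScore vs. MSE correlation mentioned above, here is a minimal sketch of how that check could be quantified with a plain Pearson coefficient. This is not my actual pipeline code: the function name `pearson` and the two score lists are hypothetical, and the numbers are made-up placeholders rather than real results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# ASSUMPTION: placeholder values for illustration only, not measured scores.
hmm_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
mse_values = [9.0, 7.5, 8.0, 4.0, 2.5]

r = pearson(hmm_scores, mse_values)
print(f"Pearson r = {r:.3f}")
```

A value of r near +1 or -1 would suggest MSE tracks HMMScore well enough to act as a filter, while a value near 0 would match the weak correlation I am seeing now. Computing r separately on only the low-HMMScore candidates would show whether the filter works where it matters most.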