More on cleanSpectra and cleaning MSMS data
Jainab and I have discussed this before, but I want to get it down: taking your peak list, sorting it by intensity, and keeping only the top X peaks can be detrimental. Some locations where fragments are very likely to occur, and to occur with high intensity, act as intensity hogs. They are over-represented in the high-intensity range and can crowd out valid fragments with lower intensities.
A method Jainab proposed is to divide the peak list into mass groups, each covering a range of, say, 100 Da. Each of these smaller groups is then sorted by intensity and culled on its own.
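Here is a rough sketch of how that windowed culling might look. It assumes peaks are stored as (m/z, intensity) tuples; the function name clean_by_mass_window and the parameters window_da and keep_per_window are made up for illustration and are not part of cleanSpectra.

```python
from collections import defaultdict


def clean_by_mass_window(peaks, window_da=100.0, keep_per_window=2):
    """Keep the most intense peaks within each mass window.

    peaks: iterable of (mz, intensity) tuples (an assumed representation).
    window_da: width of each mass window in Da (hypothetical parameter).
    keep_per_window: peaks to keep per window (hypothetical parameter).
    """
    # Group peaks by the index of the mass window they fall into.
    windows = defaultdict(list)
    for mz, intensity in peaks:
        windows[int(mz // window_da)].append((mz, intensity))

    # Sort each window by intensity and keep only its strongest peaks.
    kept = []
    for index in sorted(windows):
        strongest = sorted(windows[index], key=lambda p: p[1], reverse=True)
        kept.extend(strongest[:keep_per_window])

    # Return the survivors in ascending m/z order.
    return sorted(kept)


peaks = [
    (110.0, 900.0), (120.0, 800.0), (130.0, 700.0),  # crowded, intense region
    (450.0, 50.0), (460.0, 40.0), (470.0, 5.0),      # weaker region
]
cleaned = clean_by_mass_window(peaks, window_da=100.0, keep_per_window=2)
```

With this toy list, a global top 4 would take three peaks from the 100-200 Da region and only one from the 400-500 Da region. The windowed version keeps two from each, so the weaker region is not crowded out.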
Perhaps all that is needed is to reduce the selectivity of cleanSpectra. Instead of returning the top 100 peaks (a cutoff that becomes more selective the larger the list), it could return the top 50%.
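A minimal sketch of the percentage-based cutoff, again assuming (m/z, intensity) tuples. The name top_fraction and its fraction parameter are hypothetical; this is not the actual cleanSpectra code, just one way the idea could be written.

```python
import math


def top_fraction(peaks, fraction=0.5):
    """Return the most intense `fraction` of the peaks, strongest first.

    peaks: sequence of (mz, intensity) tuples (an assumed representation).
    fraction: share of the list to keep, in the range (0, 1].
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")

    # Round up so a short, non-empty list always keeps at least one peak.
    keep = math.ceil(len(peaks) * fraction)
    by_intensity = sorted(peaks, key=lambda p: p[1], reverse=True)
    return by_intensity[:keep]


spectrum = [(100.0, 10.0), (200.0, 50.0), (300.0, 30.0), (400.0, 20.0), (500.0, 40.0)]
kept = top_fraction(spectrum, fraction=0.5)
```

Because the cutoff scales with the length of the list, a spectrum with 40 peaks keeps 20 and one with 400 keeps 200, whereas a fixed top 100 would keep everything in the first case and only a quarter in the second.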