To Bin or not to Bin: Alternative Representations of Mass Spectra
Niek de Jonge, Justin J. J. van der Hooft, Daniel Probst

TL;DR
This paper explores set-based and graph-based representations of mass spectra as alternatives to traditional binning, demonstrating improved performance in machine learning tasks for molecular property prediction.
Contribution
It introduces and compares set-based and graph-based spectral representations, showing they outperform binned data in regression tasks.
Findings
Set and graph representations outperform binned spectra in predictive accuracy.
Both alternative representations improve machine learning performance.
Set transformers and graph neural networks make effective use of the new representations.
Abstract
Mass spectrometry, especially tandem mass spectrometry, is commonly used to assess the chemical diversity of samples. The resulting mass fragmentation spectra represent molecules whose structures may not have been determined. This poses the challenge of experimentally determining or computationally predicting molecular structures from mass spectra. An alternative is to predict molecular properties or molecular similarity directly from spectra. Various methodologies have been proposed to embed mass spectra for further use in machine learning tasks. However, these methodologies require preprocessing of the spectra, which often includes binning or sub-sampling peaks, mainly to create uniform vector sizes and remove noise. Here, we investigate two alternatives to the binning of mass spectra before downstream machine learning tasks,…
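To make the contrast in the abstract concrete, the following is a minimal sketch of binning versus a set-style encoding of the same spectrum. It is not the authors' implementation: the function names, the 1.0 m/z bin width, the 0 to 1000 m/z range, and the `max_peaks` padding length are all assumptions chosen for illustration.

```python
import numpy as np

def bin_spectrum(mz, intensity, mz_min=0.0, mz_max=1000.0, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    All parameter values are hypothetical defaults, not taken from the paper.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    vec = np.zeros(n_bins)
    keep = (mz >= mz_min) & (mz < mz_max)
    idx = ((mz[keep] - mz_min) / bin_width).astype(int)
    # np.add.at accumulates correctly when several peaks share a bin
    np.add.at(vec, idx, intensity[keep])
    return vec

def spectrum_as_set(mz, intensity, max_peaks=8):
    """Keep peaks as exact (m/z, intensity) pairs, zero-padded with a mask.

    The padding length is an assumption made only so spectra can be batched;
    peaks beyond max_peaks are dropped in this sketch.
    """
    peaks = np.stack([np.asarray(mz, dtype=float),
                      np.asarray(intensity, dtype=float)], axis=1)[:max_peaks]
    padded = np.zeros((max_peaks, 2))
    padded[:len(peaks)] = peaks
    mask = np.zeros(max_peaks, dtype=bool)
    mask[:len(peaks)] = True
    return padded, mask

# Toy spectrum: two peaks fall into the same 1.0-wide bin
mz = [100.02, 100.48, 250.11]
intensity = [0.5, 0.3, 1.0]

binned = bin_spectrum(mz, intensity)             # fixed-length vector
peak_set, mask = spectrum_as_set(mz, intensity)  # exact m/z values preserved
```

In this toy case the binned vector merges the peaks at 100.02 and 100.48 into a single entry, while the set encoding keeps both m/z values and marks padding rows through the mask. That loss of resolution is the kind of preprocessing side effect the alternative representations aim to avoid.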
Taxonomy
Topics: Molecular spectroscopy and chirality
Methods: Fragmentation · Set Transformer · Graph Neural Network · Sparse Evolutionary Training
