Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra
Laura Mismetti, Marvin Alberts, Andreas Krause, Mara Graziani

TL;DR
This paper presents a transformer-based end-to-end framework that generates molecular structures directly from MS/MS spectra and uses test-time tuning to adapt to new data, outperforming existing methods in accuracy and chemical plausibility.
Contribution
The authors introduce a novel transformer model with test-time tuning for direct molecular structure generation from spectra, eliminating manual annotations and improving out-of-distribution performance.
Findings
Achieves 3.16% Top-1 accuracy on the MassSpecGym benchmark.
Outperforms baseline methods by 27% and 67% on key datasets.
Significantly improves Tanimoto similarity, indicating high structural plausibility.
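For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how Top-k accuracy and Tanimoto similarity are commonly computed. It is not the authors' evaluation code. The fingerprint bit sets and SMILES strings are made-up illustrations, and matching structures by exact string equality is a simplifying assumption (real evaluations typically compare canonicalized structures).

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


def top_k_accuracy(ranked: list[list[str]], truths: list[str], k: int = 1) -> float:
    """Fraction of spectra whose true structure is among the first k candidates."""
    hits = sum(truth in candidates[:k] for candidates, truth in zip(ranked, truths))
    return hits / len(truths)


# Hypothetical on-bit indices for a generated and a reference molecule.
predicted_bits = {1, 4, 7, 9}
reference_bits = {1, 4, 8, 9, 12}
similarity = tanimoto(predicted_bits, reference_bits)  # 3 shared bits / 6 total bits

# Hypothetical ranked candidates (as SMILES strings) for two spectra.
ranked_candidates = [["CCO", "CCN"], ["CCC", "CCO"]]
true_structures = ["CCO", "CCO"]
top1 = top_k_accuracy(ranked_candidates, true_structures, k=1)
top2 = top_k_accuracy(ranked_candidates, true_structures, k=2)
```

Top-1 accuracy only rewards an exact match in first position, whereas Tanimoto similarity gives partial credit for structurally close candidates, which is why the two are usually reported together.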
Abstract
Tandem Mass Spectrometry is a cornerstone technique for identifying unknown small molecules in fields such as metabolomics, natural product discovery and environmental analysis. However, certain aspects, such as the probabilistic fragmentation process and size of the chemical space, make structure elucidation from such spectra highly challenging, particularly when there is a shift between the deployment and training conditions. Current methods rely on database matching of previously observed spectra of known molecules and multi-step pipelines that require intermediate fingerprint prediction or expensive fragment annotations. We introduce a novel end-to-end framework based on a transformer model that directly generates molecular structures from an input tandem mass spectrum and its corresponding molecular formula, thereby eliminating the need for manual annotations and intermediate…
