De novo molecular structure elucidation from mass spectra via flow matching
Ghaith Mqawass (1,2), Tuan Le (2), Fabian Theis (1,3,4), Djork-Arné Clevert (2) ((1) TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany, (2) Machine Learning, Computational Sciences, Pfizer Research & Development, Berlin, Germany

TL;DR
This paper introduces MSFlow, a two-stage flow-matching generative model that translates mass spectra into molecular structures, with a reported improvement of up to 14-fold over previous methods.
Contribution
MSFlow is the first two-stage flow-matching model for molecular structure elucidation from mass spectra, achieving state-of-the-art performance with up to a 14-fold improvement over previous methods.
Findings
MSFlow accurately predicts structures for up to 45% of spectra.
Use of molecular descriptors enhances encoding quality.
The model outperforms existing methods by a factor of up to 14.
Abstract
Mass spectrometry is a powerful and widely used tool for identifying molecular structures due to its sensitivity and ability to profile complex samples. However, translating spectra into full molecular structures is a difficult, under-defined inverse problem. Overcoming this problem is crucial for enabling biological insight, discovering new metabolites, and advancing chemical research across multiple fields. To this end, we develop MSFlow, a two-stage encoder-decoder flow-matching generative model that achieves state-of-the-art performance on the structure elucidation task for small molecules. In the first stage, we adopt a formula-restricted transformer model for encoding mass spectra into a continuous and chemically informative embedding space, while in the second stage, we train a decoder flow matching model to reconstruct molecules from latent embeddings of mass spectra. We present…
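The second stage described in the abstract trains a decoder with flow matching, conditioned on the spectrum embedding. The following is a minimal sketch of a generic conditional flow-matching objective and sampler, assuming a continuous state and a straight-line path between noise and data. The abstract does not say whether MSFlow's decoder uses this variant, so this should not be read as the authors' method. The names `velocity_fn`, `cond`, `flow_matching_loss`, and `sample` are hypothetical and chosen only for illustration.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path, which is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Single-sample conditional flow-matching loss.

    velocity_fn(x_t, t, cond) is a hypothetical stand-in for the decoder
    network; cond stands in for the spectrum embedding from stage one.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # noise sample
    t = rng.random()                          # time in [0, 1)
    x_t = interpolate(x0, x1, t)
    u = target_velocity(x0, x1)               # regression target
    v = velocity_fn(x_t, t, cond)             # model prediction
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u)) / len(x1)


def sample(velocity_fn, cond, dim, rng, steps=100):
    """Generate by Euler-integrating dx/dt = v(x, t, cond) from t=0 to t=1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In practice `velocity_fn` would be a trained neural network and the loss would be averaged over mini-batches of (molecule representation, spectrum embedding) pairs. A molecule decoder may also operate on discrete tokens or graphs, in which case a discrete flow-matching formulation would replace the interpolation and regression target used here.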
Taxonomy
Topics: Computational Drug Discovery Methods · Metabolomics and Mass Spectrometry Studies · Machine Learning in Materials Science
