MS-BART: Unified Modeling of Mass Spectra and Molecules for Structure Elucidation
Yang Han, Pengyu Wang, Kai Yu, Xin Chen, Lu Chen

TL;DR
MS-BART is a unified deep learning framework that improves molecular structure elucidation from mass spectrometry data by leveraging cross-modal pretraining, multi-task learning, and chemical feedback mechanisms, achieving state-of-the-art results.
Contribution
The paper introduces MS-BART, a model that maps mass spectra and molecular structures into a shared token vocabulary, enabling large-scale cross-modal pretraining and transfer to experimental spectra for structure elucidation.
Findings
Achieves state-of-the-art performance on key metrics.
Faster inference compared to diffusion-based methods.
Robustness to real-world spectral variability.
Abstract
Mass spectrometry (MS) plays a critical role in molecular identification, significantly advancing scientific discovery. However, structure elucidation from MS data remains challenging due to the scarcity of annotated spectra. While large-scale pretraining has proven effective in addressing data scarcity in other domains, applying this paradigm to mass spectrometry is hindered by the complexity and heterogeneity of raw spectral signals. To address this, we propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary, enabling cross-modal learning through large-scale pretraining on reliably computed fingerprint-molecule datasets. Multi-task pretraining objectives further enhance MS-BART's generalization by jointly optimizing denoising and translation tasks. The pretrained model is subsequently transferred to experimental…
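As a rough illustration of the shared-vocabulary idea in the abstract, the sketch below turns a binary fingerprint into tokens and builds one denoising pair and one translation pair from it. It is not the authors' implementation: the token format `<fpNNNN>`, the `<mask>` token, the 30% mask rate, and the function names are all assumptions made for this example, and the molecule tokens are placeholder strings.

```python
import random


def fingerprint_to_tokens(bits):
    """Name each active fingerprint bit as a token (format is an assumption)."""
    return [f"<fp{i:04d}>" for i, b in enumerate(bits) if b]


def mask_tokens(tokens, mask_rate=0.3, mask_token="<mask>", seed=0):
    """Replace a random subset of tokens with a mask token (hypothetical corruption)."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_rate else t for t in tokens]


def build_examples(bits, molecule_tokens):
    """Build (source, target) pairs for the two pretraining tasks.

    denoising:   corrupted fingerprint tokens -> original fingerprint tokens
    translation: fingerprint tokens           -> molecule tokens
    """
    fp_tokens = fingerprint_to_tokens(bits)
    return {
        "denoising": (mask_tokens(fp_tokens), fp_tokens),
        "translation": (fp_tokens, list(molecule_tokens)),
    }


# Toy input: an 8-bit fingerprint and placeholder molecule tokens.
examples = build_examples([0, 1, 0, 1, 1, 0, 0, 1], ["[C]", "[C]", "[O]"])
```

Because both sequences draw on one vocabulary, a single encoder-decoder can be trained on both pairs without modality-specific input layers, which is the property the abstract attributes to the shared token vocabulary.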
Taxonomy
Topics: Computational Drug Discovery Methods · Mass Spectrometry Techniques and Applications · Machine Learning in Materials Science
