MSAlign: Aligning Molecule and Mass Spectra Foundation Models for Metabolite Identification
Paul Krzakala, Gabriel Melo, Camille Lançon, Charlotte Laclau, Rémi Flamary, Etienne Thévenot, Florence d'Alché-Buc

TL;DR
MSAlign introduces a unified, contrastive learning framework that aligns molecule and mass spectrometry foundation models to improve metabolite identification accuracy across benchmarks.
Contribution
The paper proposes MSAlign, a simple, fast, and effective multimodal alignment method that combines frozen foundation models with contrastive learning, and it evaluates data splitting strategies for molecule retrieval.
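To make the idea of contrastive alignment concrete, here is a minimal sketch of a symmetric InfoNCE-style loss over a batch of paired spectrum and molecule embeddings. It is an illustration under stated assumptions, not MSAlign's actual objective: the function names, the temperature value, and the use of plain NumPy are all hypothetical, and the embeddings are assumed to come from frozen encoders followed by whatever trainable projection the method uses.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(spec_emb, mol_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of matched pairs.

    Row i of spec_emb and row i of mol_emb are assumed to describe the
    same compound; every other row in the batch acts as a negative.
    The temperature is an illustrative default, not a value from the paper.
    """
    s = l2_normalize(spec_emb)
    m = l2_normalize(mol_emb)
    logits = s @ m.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(s))
    # Spectrum-to-molecule direction: softmax over molecules (rows).
    loss_s2m = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Molecule-to-spectrum direction: softmax over spectra (columns).
    loss_m2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2m + loss_m2s)
```

In this sketch the loss is small when each spectrum embedding is closest to its own molecule embedding and grows when the pairing is scrambled, which is the property a contrastive alignment objective relies on.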
Findings
MSAlign outperforms existing methods on all benchmarks.
The framework is simple to implement and fast to train.
A new measure of distribution shift is introduced for evaluation.
Abstract
Accurately identifying metabolites, i.e., small molecules, from mass spectrometry data remains a core challenge in metabolomics, with broad applications in drug discovery, environmental analysis, and clinical research. We address the Molecule Retrieval task, which consists of recovering the chemical structure of a metabolite from its MS/MS spectrum given a set of candidate molecules. While the recent release of benchmark datasets such as MassSpecGym and Spectraverse has considerably accelerated the development of novel machine learning approaches, the complexity of data preprocessing pipelines and the lack of unified implementations make methods and results difficult to reproduce and compare. We make three contributions. First, we propose a unified framework encompassing recent approaches based on representation alignment and contrastive learning. Second, we introduce MSAlign, inspired by…
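The Molecule Retrieval task described in the abstract can be sketched as a ranking problem: embed the query spectrum and each candidate molecule in a shared space, then sort candidates by similarity. The snippet below is a minimal illustration that assumes embeddings are already computed and that cosine similarity is the scoring function; `rank_candidates` and `hit_at_k` are hypothetical helper names, not part of any released MSAlign code.

```python
import numpy as np

def rank_candidates(spec_emb, cand_embs):
    """Rank candidate molecules for one spectrum by cosine similarity.

    spec_emb:  (d,) embedding of the query MS/MS spectrum.
    cand_embs: (n, d) embeddings of the n candidate molecules.
    Returns candidate indices sorted from best to worst, plus the scores.
    """
    q = spec_emb / np.linalg.norm(spec_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores, kind="stable")  # descending similarity
    return order, scores

def hit_at_k(order, true_index, k):
    """True if the correct candidate appears among the top-k ranked ones."""
    return bool(true_index in order[:k])

# Toy usage with 2-dimensional embeddings (values are made up).
spectrum = np.array([1.0, 0.0])
candidates = np.array([[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]])
order, scores = rank_candidates(spectrum, candidates)
```

Averaging a metric such as `hit_at_k` over a test set gives a retrieval accuracy figure of the kind commonly reported for this task.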
