Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra
Yiwen Zhang, Keyan Ding, Yihang Wu, Xiang Zhuang, Yi Yang, Qiang Zhang, Huajun Chen

TL;DR
This paper introduces GLMR, a generative modeling framework that improves molecule retrieval from mass spectra by addressing modality misalignment and leveraging a two-stage process for higher accuracy and better generalization.
Contribution
The paper presents a novel two-stage generative retrieval framework, GLMR, that enhances molecular structure prediction from mass spectra and outperforms existing methods.
Findings
Over 40% improvement in top-1 accuracy over existing methods on benchmark datasets
Effective mitigation of modality misalignment issues
Strong generalization demonstrated across datasets
Abstract
Retrieving molecular structures from tandem mass spectra is a crucial step in rapid compound identification. Existing retrieval methods, such as traditional mass spectral library matching, suffer from limited spectral library coverage, while recent cross-modal representation learning frameworks often encounter modality misalignment, resulting in suboptimal retrieval accuracy and generalization. To address these limitations, we propose GLMR, a Generative Language Model-based Retrieval framework that mitigates the cross-modal misalignment through a two-stage process. In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum. In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures,…
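The pre-retrieval stage described in the abstract can be illustrated with a minimal sketch. It assumes the contrastive model has already mapped the spectrum and each library molecule to vectors in a shared embedding space, and it ranks candidates by cosine similarity. The function names (`cosine`, `pre_retrieve`, `build_generator_input`), the toy embeddings, and the dictionary passed to the generative stage are all hypothetical and are not taken from the paper; the generative model itself is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pre_retrieve(spectrum_emb, library, k=2):
    """Return the k library molecules whose embeddings are closest to the spectrum.

    `library` maps a SMILES string to its (hypothetical) molecule embedding.
    """
    scored = [(cosine(spectrum_emb, emb), smiles) for smiles, emb in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [smiles for _, smiles in scored[:k]]

def build_generator_input(spectrum_tokens, candidates):
    """Bundle the spectrum with the retrieved candidates as context.

    The structure of this dictionary is an assumption made for illustration.
    """
    return {"spectrum": spectrum_tokens, "context": list(candidates)}

# Toy embeddings, invented for this example only.
library = {
    "CCO": [1.0, 0.0, 0.0],
    "CC(=O)O": [0.8, 0.6, 0.0],
    "c1ccccc1": [0.0, 1.0, 0.0],
}
query_emb = [1.0, 0.05, 0.0]

candidates = pre_retrieve(query_emb, library, k=2)
generator_input = build_generator_input(["peak_1", "peak_2"], candidates)
```

In this toy setup the two candidates most similar to the query are passed on as context; in the framework described above, a generative model would then combine that context with the input spectrum to produce a refined structure.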
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
