Contrastive Domain Generalization for Cross-Instrument Molecular Identification in Mass Spectrometry
Seunghyun Yoo, Sanghong Kim, Namkyung Yoon, Hwangnam Kim

TL;DR
This paper introduces a contrastive domain generalization framework that maps mass spectrometry (MS) spectra into the structure embedding space of a pretrained chemical language model, improving zero-shot molecular identification on molecular scaffolds not seen during training.
Contribution
Proposes a cross-modal alignment approach that embeds MS spectra directly into the structure embedding space of a pretrained chemical language model, so that identification can generalize to molecules not seen during training.
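As a rough illustration of what contrastive cross-modal alignment can look like, the sketch below computes a symmetric InfoNCE loss between a batch of spectrum embeddings and the matching molecule embeddings. This is an assumption made for illustration: the summary does not state the paper's exact loss, temperature, or encoder design, and the names `symmetric_info_nce`, `spec_emb`, and `mol_emb` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_logits(spec_emb, mol_emb, temperature):
    """Pairwise cosine similarities (spectra x molecules), divided by temperature."""
    s = [l2_normalize(v) for v in spec_emb]
    m = [l2_normalize(v) for v in mol_emb]
    return [[sum(a * b for a, b in zip(si, mj)) / temperature for mj in m] for si in s]


def cross_entropy_diagonal(logits):
    """Mean of -log softmax(row)[i], i.e. row i should pick column i."""
    total = 0.0
    for i, row in enumerate(logits):
        mx = max(row)  # subtract the max for numerical stability
        log_z = mx + math.log(sum(math.exp(x - mx) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(spec_emb, mol_emb, temperature=0.07):
    """Average of the spectrum-to-molecule and molecule-to-spectrum losses.

    Row i of spec_emb and row i of mol_emb are assumed to describe the same
    molecule; every other row in the batch acts as a negative.
    The temperature default is an illustrative value, not one from the paper.
    """
    logits = cosine_logits(spec_emb, mol_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))
```

In a setup like the one described, `mol_emb` would come from the pretrained chemical language model and `spec_emb` from a trainable spectrum encoder; minimizing the loss pulls each spectrum toward its own molecule and away from the other molecules in the batch.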
Findings
Achieves 42.2% Top-1 accuracy in fixed 256-way zero-shot retrieval on a strict scaffold-disjoint benchmark.
Demonstrates 95.4% accuracy in 5-way 5-shot molecular re-identification.
Shows strong chemical coherence in the learned embedding space.
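To make the 5-way 5-shot figure concrete, here is a minimal sketch of one N-way K-shot episode scored with a nearest-prototype rule. The prototype classifier, the cosine metric, and the names `run_episode` and `emb_by_molecule` are assumptions for illustration; the summary does not describe the paper's actual few-shot protocol.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def run_episode(emb_by_molecule, n_way, k_shot, n_query, rng):
    """Sample one episode and return its query accuracy.

    emb_by_molecule maps a molecule id to a list of spectrum embeddings;
    each sampled molecule needs at least k_shot + n_query embeddings.
    """
    classes = rng.sample(sorted(emb_by_molecule), n_way)
    prototypes, queries = {}, []
    for c in classes:
        items = rng.sample(emb_by_molecule[c], k_shot + n_query)
        prototypes[c] = mean_vector(items[:k_shot])    # support set -> prototype
        queries += [(c, q) for q in items[k_shot:]]    # held-out queries
    correct = sum(
        max(classes, key=lambda c: cosine(q, prototypes[c])) == label
        for label, q in queries
    )
    return correct / len(queries)
```

Calling `run_episode(emb_by_molecule, n_way=5, k_shot=5, n_query=..., rng=random.Random(0))` over many sampled episodes and averaging the results gives an episode-level accuracy. Chance level for a 5-way task is 20%.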
Abstract
Identifying molecules from mass spectrometry (MS) data remains a fundamental challenge due to the semantic gap between physical spectral peaks and underlying chemical structures. Existing deep learning approaches often treat spectral matching as a closed-set recognition task, limiting their ability to generalize to unseen molecular scaffolds. To overcome this limitation, we propose a cross-modal alignment framework that directly maps mass spectra into the chemically meaningful molecular structure embedding space of a pretrained chemical language model. On a strict scaffold-disjoint benchmark, our model achieves a Top-1 accuracy of 42.2% in fixed 256-way zero-shot retrieval and demonstrates strong generalization under a global retrieval setting. Moreover, the learned embedding space demonstrates strong chemical coherence, reaching 95.4% accuracy in 5-way 5-shot molecular…
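For intuition about the fixed 256-way retrieval number, the sketch below scores Top-1 accuracy when each query spectrum is ranked against a candidate pool containing the true molecule plus randomly drawn decoys. The pool construction, the cosine ranking, and the names `fixed_way_top1` and `mol_emb` are assumptions for illustration, not the benchmark's documented procedure.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top1_in_pool(query, true_id, pool_ids, mol_emb):
    """True if the most similar candidate in the pool is the correct molecule."""
    best = max(pool_ids, key=lambda m: cosine(query, mol_emb[m]))
    return best == true_id


def fixed_way_top1(queries, mol_emb, n_way, rng):
    """Top-1 accuracy over fixed-size candidate pools.

    queries: list of (spectrum_embedding, true_molecule_id) pairs.
    mol_emb: dict mapping molecule id to its structure embedding.
    Each pool holds the true molecule and n_way - 1 random decoys
    (an assumed sampling scheme).
    """
    all_ids = sorted(mol_emb)
    hits = 0
    for q, true_id in queries:
        decoys = rng.sample([m for m in all_ids if m != true_id], n_way - 1)
        pool = [true_id] + decoys
        rng.shuffle(pool)  # avoid favoring the true molecule when scores tie
        hits += top1_in_pool(q, true_id, pool, mol_emb)
    return hits / len(queries)
```

With `n_way=256`, a random ranker would be right about 1 time in 256 (roughly 0.39%), which is the baseline against which a Top-1 accuracy such as 42.2% should be read.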
Taxonomy
Topics: Mass Spectrometry Techniques and Applications · Computational Drug Discovery Methods · Machine Learning in Materials Science
