When should we trust the annotation? Selective prediction for molecular structure retrieval from mass spectra
Mira Jürgens, Gaetan De Waele, Morteza Rakhshaninejad, Willem Waegeman

TL;DR
This paper presents a selective prediction framework for molecular structure retrieval from mass spectra, allowing models to abstain when uncertainty is high, thereby improving reliability in critical applications.
Contribution
It introduces a risk-coverage based selective prediction method with comprehensive uncertainty quantification strategies evaluated on the MassSpecGym benchmark.
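To make the risk-coverage idea concrete, here is a minimal sketch of how such a curve is commonly computed. It is not taken from the paper: the function name `risk_coverage_curve`, the toy scores, and the choice of 0/1 retrieval error as the risk are all assumptions made for illustration. Predictions are sorted by confidence, and for each coverage level the risk is the error rate among the accepted (most confident) predictions.

```python
import numpy as np

def risk_coverage_curve(confidence, correct):
    """Hypothetical helper: selective risk at every coverage level.

    confidence: one score per spectrum (higher = more trusted).
    correct:    1 if the retrieved structure was right, else 0.
    Returns (coverage, risk), where entry k-1 describes accepting
    only the k most confident predictions.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(-confidence, kind="stable")  # most confident first
    errors = 1.0 - correct[order]                   # 0/1 loss per prediction
    k = np.arange(1, len(errors) + 1)
    coverage = k / len(errors)                      # fraction of spectra annotated
    risk = np.cumsum(errors) / k                    # error rate among accepted
    return coverage, risk

# Toy example with made-up scores (not data from the paper).
toy_confidence = [0.9, 0.8, 0.6, 0.4]
toy_correct = [1, 1, 0, 1]
coverage, risk = risk_coverage_curve(toy_confidence, toy_correct)
```

A good confidence score keeps the risk low at small coverage and lets it rise only as less certain predictions are admitted. Averaging `risk` over all coverage levels gives one common scalar summary, the area under the risk-coverage curve.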
Findings
First-order confidence measures and retrieval-level aleatoric uncertainty perform well as selection scores.
Fingerprint-level uncertainty scores are poor proxies for retrieval success.
Distribution-free risk control enables high-confidence annotation subsets.
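One generic way to obtain a distribution-free guarantee of this kind is to choose the abstention threshold on a held-out calibration set using a concentration bound. The sketch below uses a Hoeffding bound with a union bound over a fixed grid of thresholds. This is a stand-in chosen for simplicity, not necessarily the procedure used in the paper. The function name `select_threshold`, the parameters `alpha`, `delta`, and `n_grid`, and the assumption that confidence scores lie in [0, 1] are all hypothetical.

```python
import math
import numpy as np

def select_threshold(confidence, correct, alpha=0.1, delta=0.05, n_grid=21):
    """Hypothetical helper: pick the lowest threshold whose risk bound <= alpha.

    Assumes calibration examples are i.i.d. and scores lie in [0, 1].
    With probability >= 1 - delta over the calibration draw, the error
    rate among predictions with score >= the returned threshold is at
    most alpha. Returns (threshold, coverage, bound), or None if no
    grid point qualifies.
    """
    confidence = np.asarray(confidence, dtype=float)
    errors = 1.0 - np.asarray(correct, dtype=float)
    # The grid is fixed in advance so the union bound over n_grid points is valid.
    for t in np.linspace(0.0, 1.0, n_grid):  # ascending: first hit has highest coverage
        accepted = confidence >= t
        m = int(accepted.sum())
        if m == 0:
            continue
        empirical_risk = errors[accepted].mean()
        # Hoeffding upper bound, with delta split across the grid points.
        bound = empirical_risk + math.sqrt(math.log(n_grid / delta) / (2 * m))
        if bound <= alpha:
            return float(t), m / len(errors), float(bound)
    return None

# Synthetic calibration set (made up): every score above 0.3 is a correct retrieval.
calibration_scores = np.linspace(0.0, 1.0, 2000)
calibration_correct = calibration_scores > 0.3
result = select_threshold(calibration_scores, calibration_correct)
```

The Hoeffding bound is loose, so this sketch abstains more often than tighter alternatives such as exact binomial bounds would. Returning `None` rather than a weak threshold makes explicit the case where no subset can be certified at the requested risk level.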
Abstract
Machine learning methods for identifying molecular structures from tandem mass spectra (MS/MS) have advanced rapidly, yet current approaches still exhibit significant error rates. In high-stakes applications such as clinical metabolomics and environmental screening, incorrect annotations can have serious consequences, making it essential to determine when a prediction can be trusted. We introduce a selective prediction framework for molecular structure retrieval from MS/MS spectra, enabling models to abstain from predictions when uncertainty is too high. We formulate the problem within the risk-coverage tradeoff framework and comprehensively evaluate uncertainty quantification strategies at two levels of granularity: fingerprint-level uncertainty over predicted molecular fingerprint bits, and retrieval-level uncertainty over candidate rankings. We compare scoring functions including…
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Computational Drug Discovery Methods · Machine Learning in Materials Science
