CoRe-Gen: Robust Spectrum-to-Structure Generation under Imperfect Fingerprint Conditions
Tianbo Liu, Chixiang Lu, Jing Hao, Hengyu Zhang, Lifei Wang, Haibo Jiang, Xiaojuan Qi

TL;DR
CoRe-Gen advances molecular structure prediction from MS/MS spectra by robustly handling noisy fingerprint predictions, achieving state-of-the-art accuracy while maintaining efficiency.
Contribution
CoRe-Gen combines synthetic-spectrum pretraining, frequency-aware fingerprint corruption, and structure-aware autoregressive decoding to make generation robust to imperfect predicted fingerprints.
Findings
Achieves 19.54% Top-1 accuracy on the NPLIB1 benchmark.
Outperforms previous methods on standard benchmarks.
Maintains efficiency of autoregressive decoding.
Abstract
Molecular structure elucidation from tandem mass spectra (MS/MS) remains challenging, particularly for de novo generation beyond database coverage. A common approach decomposes the task into spectrum-to-fingerprint prediction followed by fingerprint-to-structure decoding, enabling the use of large-scale molecular corpora. However, at deployment, the decoder relies on predicted rather than oracle fingerprints, introducing structured errors that propagate into generation. This results in a fundamental condition mismatch, where models trained on clean inputs must operate under noisy, biased predictions, especially for long-tail substructures. We present CoRe-Gen, which explicitly addresses this gap. CoRe-Gen improves the intermediate condition via synthetic-spectrum pretraining of the encoder, matches deployment-time noise through frequency-aware fingerprint corruption during decoder…
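The abstract as shown here does not say how the frequency-aware corruption is carried out, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes binary fingerprints, and it assumes that rare "on" bits should be dropped more often (to mimic an encoder that misses long-tail substructures) while "off" bits are switched on in proportion to how common they are. The function names and the `max_drop` and `max_add` parameters are hypothetical.

```python
import random


def bit_frequencies(fingerprints):
    """Fraction of training fingerprints in which each bit is set."""
    n = len(fingerprints)
    width = len(fingerprints[0])
    return [sum(fp[i] for fp in fingerprints) / n for i in range(width)]


def corrupt_fingerprint(fp, freqs, rng, max_drop=0.3, max_add=0.05):
    """Return a noisy copy of a binary fingerprint.

    Assumption (not from the paper): the flip probability of each bit
    depends on how often that bit is set in the training corpus.
    """
    noisy = []
    for bit, f in zip(fp, freqs):
        if bit == 1:
            # Rarer bits are more likely to be missed by a predictor.
            p_flip = max_drop * (1.0 - f)
        else:
            # Common bits are more likely to be predicted spuriously.
            p_flip = max_add * f
        noisy.append(1 - bit if rng.random() < p_flip else bit)
    return noisy


# Toy corpus of four 4-bit fingerprints (illustrative data only).
train = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0]]
freqs = bit_frequencies(train)
rng = random.Random(0)
noisy = corrupt_fingerprint(train[2], freqs, rng)
```

In a training loop, a decoder would be conditioned on `noisy` rather than on the clean fingerprint, so that the inputs it sees during training resemble the imperfect predictions it receives at deployment.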
