NMRPeak: a ready-to-use intelligent system for molecular structure elucidation enabled by synergistic cross-modal learning
Fanjie Xu, Jinyuan Hu, Jingxiang Zou, Junjie Wang, Boying Huang, Zhifeng Gao, Xiaohong Ji, Weinan E, Zhong-Qun Tian, Fujie Tang, Jun Cheng

TL;DR
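NMRPeak is an integrated machine learning system for molecular structure elucidation from NMR spectra. It unifies spectrum prediction, library retrieval, and structure generation in a design grounded in experimental data, reporting over 95% top-1 accuracy in retrieval and approximately 75% top-1 accuracy in stereochemistry-aware structure generation while narrowing the simulation-experiment gap.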
Contribution
It introduces a unified, experimentally grounded cross-modal learning framework with novel tokenization and similarity metrics, enabling accurate, automated molecular structure elucidation.
Findings
Achieves over 95% top-1 accuracy in molecular retrieval.
Attains approximately 75% top-1 accuracy in stereochemistry-aware structure generation.
Constructs the largest benchmark for NMR-based structure elucidation, with approximately 1.8 million experimental and simulated spectra.
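For readers unfamiliar with the metric, top-1 accuracy in retrieval is the fraction of query spectra whose highest-scoring library candidate is the correct molecule. The sketch below shows that calculation only. The function name, the precomputed score matrix, and the toy numbers are illustrative assumptions, not NMRPeak's code or data, and the paper's own similarity metric is not reproduced here.

```python
from typing import Sequence


def top1_accuracy(
    similarity: Sequence[Sequence[float]],
    true_index: Sequence[int],
) -> float:
    """Fraction of queries whose best-scoring candidate is the correct one.

    similarity[i][j] is a hypothetical score between query spectrum i and
    candidate molecule j (higher means more similar).
    true_index[i] is the column of the correct molecule for query i.
    """
    if len(similarity) != len(true_index) or not true_index:
        raise ValueError("need one non-empty score row per labelled query")
    hits = 0
    for scores, target in zip(similarity, true_index):
        # Index of the highest-scoring candidate for this query.
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == target)
    return hits / len(true_index)


# Toy example with made-up scores: 3 query spectra, 3 candidate molecules.
scores = [
    [0.9, 0.1, 0.3],  # best candidate is 0, correct
    [0.2, 0.4, 0.8],  # best candidate is 2, correct
    [0.5, 0.6, 0.1],  # best candidate is 1, but the label is 0
]
labels = [0, 2, 0]
accuracy = top1_accuracy(scores, labels)
print(f"top-1 accuracy: {accuracy:.3f}")
```

In this toy case two of the three queries rank the correct molecule first. The same definition applies to generation if a "hit" is taken to mean that the top-ranked generated structure matches the reference molecule.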
Abstract
One-dimensional nuclear magnetic resonance (NMR) spectroscopy is essential for molecular structure elucidation in organic synthesis, drug discovery, natural product characterization, and metabolomics, yet its interpretation remains heavily dependent on expert knowledge and difficult to scale. Although machine learning has been applied to NMR spectrum prediction, library retrieval, and structure generation, these tasks have evolved in isolation using simulated data and incompatible spectral representations, limiting their utility under real experimental scenarios. Here we present NMRPeak, a unified cross-modal learning system that integrates these three tasks through experimentally grounded design. We curate approximately 1.8 million experimental and simulated spectra to construct the largest benchmark for NMR-based structure elucidation and systematically quantify the distribution shift…
