NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers
Liujia Yang, Zhuo Yang, Jiaqing Xie, Yubin Wang, Ben Gao, Tianfan Fu, Xingjian Wei, Jiaxing Sun, Jiang Wu, Conghui He, Yuqiang Li, Qinying Gu

TL;DR
NMRTrans is a transformer-based model trained exclusively on large-scale experimental NMR spectra. It substantially improves molecular structure elucidation accuracy (+17.82% Top-10 accuracy over the baseline) without relying on computed spectra.
Contribution
It introduces NMRSpec, a large corpus of experimental NMR spectra, and NMRTrans, a set transformer whose inductive bias matches the physical nature of NMR spectra as unordered sets of peaks. NMRTrans achieves state-of-the-art results on experimental benchmarks.
Findings
Achieves +17.82% Top-10 accuracy over the baseline
Trained solely on experimental spectra, not computed ones
Highlights the importance of structure-aware architectures
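In its usual definition for structure elucidation, Top-10 accuracy counts a prediction as correct when the true structure appears among the model's ten highest-ranked candidates. The following is a minimal sketch of that metric. It assumes structures are compared as already-canonicalized SMILES strings (in practice a cheminformatics toolkit would canonicalize them first). The function name `top_k_accuracy` and the toy molecules are illustrative and are not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, targets, k=10):
    """Fraction of spectra whose true structure is among the first k ranked candidates.

    Assumes each structure is an already-canonicalized SMILES string, so that
    plain string equality means "same molecule".
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("one ranked candidate list per target is required")
    hits = sum(
        1 for candidates, target in zip(ranked_predictions, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


# Toy example (hypothetical data): one candidate list per spectrum, best first.
preds = [
    ["CCO", "COC"],            # true structure ranked 1st
    ["CC(=O)O", "CCOC=O"],     # true structure ranked 2nd
    ["c1ccccc1", "C1CCCCC1"],  # true structure not proposed
]
targets = ["CCO", "CCOC=O", "Cc1ccccc1"]

print(top_k_accuracy(preds, targets, k=1))   # 1 of 3 spectra solved at rank 1
print(top_k_accuracy(preds, targets, k=10))  # 2 of 3 spectra solved within the top 10
```

Widening k can only keep or increase the score, which is why Top-1 and Top-10 accuracy are typically reported together.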
Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is fundamental for molecular structure elucidation, yet interpreting spectra at scale remains time-consuming and heavily dependent on expert knowledge. While recent spectrum-as-language modeling and retrieval-based methods have shown promise, they rely heavily on large corpora of computed spectra and exhibit notable performance drops when applied to experimental measurements. To address these issues, we build NMRSpec, a large-scale corpus of experimental ¹H and ¹³C spectra mined from chemical literature, and propose NMRTrans, which models spectra as unordered peak sets and aligns the model's inductive bias with the physical nature of NMR. To the best of our knowledge, NMRTrans is the first NMR Transformer trained solely on large-scale experimental spectra, and it achieves state-of-the-art performance on experimental benchmarks, improving Top-10 Accuracy…
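Modeling a spectrum as an unordered peak set means its encoding should not change when the same peaks are listed in a different order. The following plain-Python sketch illustrates that property under deliberately simplified assumptions. Each peak is a hypothetical (chemical shift in ppm, relative intensity) pair, featurized by the hypothetical `embed_peak`. Attention uses identity projections instead of learned query/key/value matrices, and `seed` stands in for a learned pooling vector. The sketch follows the general Set Transformer pattern (self-attention across set elements, then pooling by attention with a seed query). It is not NMRTrans's actual architecture or peak representation.

```python
import math


def embed_peak(shift_ppm, intensity):
    """Hypothetical peak featurization: scaled shift, intensity, sinusoidal features."""
    s = shift_ppm / 10.0
    return [s, intensity, math.sin(s), math.cos(s)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a set of key/value vectors."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def self_attention(xs):
    """Every peak attends to the whole set (permutation-equivariant), plus a residual."""
    return [[a + b for a, b in zip(x, attend(x, xs, xs))] for x in xs]


def encode_peak_set(peaks, seed):
    """Permutation-invariant set encoding: self-attention, then attention pooling."""
    xs = [embed_peak(shift, inten) for shift, inten in peaks]
    hs = self_attention(xs)
    return attend(seed, hs, hs)  # the seed query summarizes the set into one vector


# Usage with made-up 1H peaks; the same set listed in two different orders.
peaks = [(7.26, 1.0), (3.71, 0.6), (2.10, 0.9), (1.25, 0.4)]
shuffled = [peaks[2], peaks[0], peaks[3], peaks[1]]
seed = [0.5, -0.2, 0.1, 0.3]  # hypothetical stand-in for a learned seed vector

z1 = encode_peak_set(peaks, seed)
z2 = encode_peak_set(shuffled, seed)
print(all(abs(a - b) < 1e-9 for a, b in zip(z1, z2)))  # True
```

Every step either treats each peak identically or sums over the whole set, so reordering the peaks leaves the encoding unchanged up to floating-point rounding. A sequence model with positional encodings would not have this property by construction.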
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Molecular spectroscopy and chirality
