NMRGym: A Comprehensive Benchmark for Nuclear Magnetic Resonance Based Molecular Structure Elucidation
Zheng Fang, Chen Yang, Hai-tao Yu, Haoming Luo, Haitao He, Jiaqing Xie, Zhuo Yang, Jun Xia

TL;DR
NMRGym provides the largest standardized experimental NMR dataset and benchmark suite, enabling fair evaluation and fostering progress in machine learning applications for molecular structure elucidation and related tasks.
Contribution
This work introduces NMRGym, a comprehensive, high-quality experimental NMR dataset with standardized protocols, a benchmark suite, and an open leaderboard to advance research in NMR-based molecular analysis.
Findings
Established a large, high-quality experimental NMR dataset with 269,999 molecules.
Developed a standardized evaluation framework with data splitting and annotations.
Benchmarked state-of-the-art methods across multiple NMR-related tasks.
Abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is the cornerstone of small-molecule structure elucidation. While deep learning has demonstrated significant potential in automating structure elucidation and spectral simulation, current progress is severely impeded by the reliance on synthetic datasets, which introduces significant domain shifts when applied to real-world experimental spectra. Furthermore, the lack of standardized evaluation protocols and rigorous data splitting strategies frequently leads to unfair comparisons and data leakage. To address these challenges, we introduce NMRGym, the largest and most comprehensive standardized dataset and benchmark derived from high-quality experimental NMR data to date. Comprising 269,999 unique molecules paired with high-fidelity ¹H and ¹³C spectra, NMRGym bridges the critical gap between synthetic approximations…
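The data-leakage concern raised in the abstract can be illustrated with a minimal sketch of a group-aware train/test split, in which closely related molecules are kept on the same side of the split. This is not NMRGym's actual splitting protocol, which the text above does not specify. The function name `group_split`, the toy records, and the group labels (standing in for something like a molecular scaffold) are all hypothetical.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Hypothetical helper for illustration only; not part of NMRGym.
    """
    # Bucket records by their group label.
    groups = defaultdict(list)
    for rec in records:
        groups[group_key(rec)].append(rec)

    # Shuffle group labels deterministically, then assign whole groups.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(records)
    train, test = [], []
    for key in keys:
        # Fill the test set first, one whole group at a time.
        bucket = test if len(test) < target else train
        bucket.extend(groups[key])
    return train, test


# Toy records: (molecule id, assumed group label such as a scaffold).
toy = [("m1", "A"), ("m2", "A"), ("m3", "B"),
       ("m4", "C"), ("m5", "C"), ("m6", "D")]
train, test = group_split(toy, group_key=lambda r: r[1], test_fraction=0.3)
```

Because whole groups are assigned at once, the test set size only approximates `test_fraction`. In exchange, no group label is shared between the two sets, which a purely random per-molecule split does not guarantee.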
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Molecular spectroscopy and chirality
