A Benchmark for Quantum Chemistry Relaxations via Machine Learning Interatomic Potentials
Cong Fu, Yuchao Lin, Zachary Krueger, Wendi Yu, Xiaoning Qian, Byung-Jun Yoon, Raymundo Arróyave, Xiaofeng Qian, Toshiyuki Maeda, Maho Nakata, Shuiwang Ji

TL;DR
This paper introduces PubChemQCR, a large-scale dataset of DFT-based molecular relaxation trajectories for small organic molecules, enabling the development and benchmarking of machine learning interatomic potentials for quantum chemistry applications.
Contribution
The paper provides the largest publicly available dataset of DFT relaxation trajectories for small organic molecules, with energy and force labels, and benchmarks nine MLIP models for quantum chemistry simulations.
Findings
PubChemQCR contains approximately 3.5 million trajectories and 300 million conformations.
Benchmark results highlight the performance of nine MLIP models on the dataset.
The dataset facilitates the development of transferable MLIPs for molecular relaxation tasks.
Abstract
Computational quantum chemistry plays a critical role in drug discovery, chemical synthesis, and materials science. While first-principles methods, such as density functional theory (DFT), provide high accuracy in modeling electronic structures and predicting molecular properties, they are computationally expensive. Machine learning interatomic potentials (MLIPs) have emerged as promising surrogate models that aim to achieve DFT-level accuracy while enabling efficient large-scale atomistic simulations. The development of accurate and transferable MLIPs requires large-scale, high-quality datasets with both energy and force labels. Critically, MLIPs must generalize not only to stable geometries but also to intermediate, non-equilibrium conformations encountered during atomistic simulations. In this work, we introduce PubChemQCR, a large-scale dataset of molecular relaxation trajectories…
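To make the terms "relaxation trajectory" and "energy and force labels" concrete, here is a minimal sketch in Python. It is not the paper's method or the dataset's format: the harmonic two-atom potential stands in for DFT, and the names `energy_and_forces` and `relax`, the step size, and the convergence threshold are all hypothetical choices made for illustration. It shows only that forces are the negative gradient of the energy with respect to atomic positions, and that a relaxation records every intermediate, non-equilibrium geometry along with its labels.

```python
import numpy as np


def energy_and_forces(pos, k=1.0, r0=1.0):
    """Toy harmonic bond between two atoms (an assumption, NOT DFT).

    E = 0.5 * k * (r - r0)^2, where r is the interatomic distance.
    """
    d = pos[1] - pos[0]
    r = np.linalg.norm(d)
    energy = 0.5 * k * (r - r0) ** 2
    # Gradient of E with respect to the position of atom 1.
    grad_atom1 = k * (r - r0) * d / r
    # Forces are the negative gradient; the gradient for atom 0 is -grad_atom1.
    forces = np.stack([grad_atom1, -grad_atom1])
    return energy, forces


def relax(pos, step=0.1, fmax=1e-3, max_steps=500):
    """Steepest-descent relaxation that keeps every intermediate frame."""
    pos = np.asarray(pos, dtype=float).copy()
    trajectory = []
    for _ in range(max_steps):
        energy, forces = energy_and_forces(pos)
        # Each frame carries the labels an MLIP would be trained on.
        trajectory.append(
            {"positions": pos.copy(), "energy": energy, "forces": forces}
        )
        if np.abs(forces).max() < fmax:
            break  # converged: largest force component is below the threshold
        pos = pos + step * forces  # move atoms downhill along the forces
    return trajectory


if __name__ == "__main__":
    start = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]  # stretched bond
    traj = relax(start)
    print(len(traj), traj[0]["energy"], traj[-1]["energy"])
```

In this toy setting only the last frame is near equilibrium; all earlier frames are the kind of intermediate conformations the abstract says an MLIP must also handle.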
