The MD17 Datasets from the Perspective of Datasets for Gas-Phase "Small" Molecule Potentials
Joel M. Bowman, Chen Qu, Riccardo Conte, Apurba Nandi, Paul L. Houston, and Qi Yu

TL;DR
This paper critically evaluates the MD17 datasets for small molecules in the context of potential energy surface modeling, highlighting their limitations and introducing a new, more comprehensive database called QM-22.
Contribution
The authors compare existing MD17 datasets with targeted datasets for specific molecular properties and present QM-22, a new extensive dataset for small molecules.
Findings
MD17 datasets are too limited for rigorous zero-point energy and tunneling-splitting calculations, and for describing all eight low-lying conformers of glycine.
Existing datasets do not adequately cover high-energy configurations.
QM-22 offers broader coverage for small molecule PES modeling.
Abstract
There has been great progress in developing methods for machine-learned potential energy surfaces. There have also been important assessments of these methods by comparing so-called learning curves on datasets of electronic energies and forces, notably the MD17 database. The dataset for each molecule in this database generally consists of tens of thousands of energies and forces obtained from DFT direct dynamics at 500 K. We contrast the datasets from this database for three "small" molecules, ethanol, malonaldehyde, and glycine, with datasets we have generated with specific targets for the PESs in mind: a rigorous calculation of the zero-point energy and wavefunction, the tunneling splitting in malonaldehyde, and, in the case of glycine, a description of all eight low-lying conformers. We found that the MD17 datasets are too limited for these targets. We also examine recent datasets for…
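As a rough illustration of the energy range that sampling at 500 K implies, the sketch below estimates the mean classical potential energy of each molecule under a harmonic equipartition assumption (kT/2 per vibrational mode). This is a back-of-the-envelope estimate, not a calculation from the paper; the function names and the harmonic model are assumptions made here for illustration. Comparing the result with a molecule's zero-point energy (not computed here) gives a sense of how far thermal sampling reaches relative to the region a quantum calculation needs.

```python
# Back-of-the-envelope sketch (an assumption, not the authors' analysis):
# harmonic equipartition estimate of the mean potential energy sampled
# by classical dynamics at a given temperature.

# Boltzmann constant in wavenumbers per kelvin (rounded CODATA value).
K_B_CM1_PER_K = 0.695035

# Atom counts for the three molecules named in the abstract.
MOLECULES = {
    "ethanol": 9,        # C2H6O
    "malonaldehyde": 9,  # C3H4O2
    "glycine": 10,       # C2H5NO2
}


def vibrational_modes(n_atoms: int) -> int:
    """Number of vibrational modes of a nonlinear molecule: 3N - 6."""
    return 3 * n_atoms - 6


def classical_mean_potential_cm1(n_atoms: int, temperature_k: float) -> float:
    """Mean potential energy in cm^-1, assuming kT/2 per harmonic mode."""
    return 0.5 * vibrational_modes(n_atoms) * K_B_CM1_PER_K * temperature_k


if __name__ == "__main__":
    for name, n_atoms in MOLECULES.items():
        energy = classical_mean_potential_cm1(n_atoms, 500.0)
        print(f"{name}: {vibrational_modes(n_atoms)} modes, "
              f"~{energy:.0f} cm^-1 mean potential energy at 500 K")
```

Because the estimate ignores anharmonicity and the spread of the thermal distribution, it only indicates the typical energy scale of a 500 K trajectory, not the exact coverage of any dataset.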
