How Accurate Are DFT Forces? Unexpectedly Large Uncertainties in Molecular Datasets
Domantas Kuryla, Fabian Berger, Gábor Csányi, Angelos Michaelides

TL;DR
This paper investigates the accuracy of DFT forces in molecular datasets used for training machine learning interatomic potentials, revealing large uncertainties and nonzero net forces that impact the reliability of these datasets.
Contribution
It systematically quantifies DFT force component errors across multiple datasets, highlighting the need for well converged DFT data for accurate MLIP training.
Findings
Several datasets show significant nonzero DFT net forces
Average force component errors range from 1.7 meV/Å (SPICE) to 33.2 meV/Å (ANI-1x)
Well-converged DFT data are a prerequisite for accurate MLIPs
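A force component error of the kind quoted in the findings can be illustrated with a minimal sketch. It assumes two arrays of shape (n_atoms, 3) for the same geometry: the forces stored in a dataset and forces recomputed with tighter settings. The function name `force_component_errors` is hypothetical, and because this summary does not state which averaging the authors use, the sketch returns both the mean absolute error and the root-mean-square error.

```python
import numpy as np

def force_component_errors(dataset_forces, reference_forces):
    """Compare two force arrays of shape (n_atoms, 3) component by component.

    Returns (mae, rmse) in the same units as the inputs.
    Hypothetical helper for illustration; not the authors' code.
    """
    a = np.asarray(dataset_forces, dtype=float)
    b = np.asarray(reference_forces, dtype=float)
    if a.shape != b.shape:
        raise ValueError("force arrays must have the same shape")
    diff = a - b
    mae = float(np.mean(np.abs(diff)))      # mean absolute error over all components
    rmse = float(np.sqrt(np.mean(diff**2)))  # root-mean-square error over all components
    return mae, rmse

# Illustrative numbers only (eV/Å), not taken from any dataset.
dataset = [[0.010, -0.020, 0.000], [-0.010, 0.020, 0.003]]
reference = [[0.012, -0.020, 0.000], [-0.012, 0.020, 0.000]]
mae, rmse = force_component_errors(dataset, reference)
print(f"MAE = {1000 * mae:.2f} meV/Å, RMSE = {1000 * rmse:.2f} meV/Å")
```

Multiplying by 1000 converts eV/Å to meV/Å, the unit used in the figures above.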
Abstract
Training of general-purpose machine learning interatomic potentials (MLIPs) relies on large datasets with properties usually computed with density functional theory (DFT). A prerequisite for accurate MLIPs is that the DFT data are well converged to minimize numerical errors. A possible symptom of errors in DFT force components is nonzero net force. Here, we consider net forces in datasets including SPICE, Transition1x, ANI-1x, ANI-1xbb, AIMNet2, QCML, and OMol25. Several of these datasets suffer from significant nonzero DFT net forces. We also quantify individual force component errors by comparison to recomputed forces using more reliable DFT settings at the same level of theory, and we find significant discrepancies in force components averaging from 1.7 meV/Å in the SPICE dataset to 33.2 meV/Å in the ANI-1x dataset. These findings underscore the importance of well converged…
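The net force check mentioned in the abstract rests on a simple fact: for an isolated molecule with no external field, the forces on all atoms should sum to zero, so a residual sum points to numerical error. The sketch below shows one way such a check could look. The function names, the per-atom normalization, and the example numbers are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def net_force(forces):
    """Sum the per-atom forces of one configuration; input shape (n_atoms, 3)."""
    f = np.asarray(forces, dtype=float)
    if f.ndim != 2 or f.shape[1] != 3:
        raise ValueError("forces must have shape (n_atoms, 3)")
    return f.sum(axis=0)

def net_force_per_atom(forces):
    """Magnitude of the net force divided by the number of atoms.

    The per-atom normalization is an assumption for this sketch.
    """
    f = np.asarray(forces, dtype=float)
    return float(np.linalg.norm(net_force(f)) / f.shape[0])

# Illustrative forces (eV/Å) for a three-atom molecule; not real dataset values.
forces = [
    [0.100, 0.000, 0.000],
    [-0.060, 0.030, 0.000],
    [-0.037, -0.030, 0.000],
]
print("net force vector:", net_force(forces))
print("net force per atom (meV/Å):", 1000 * net_force_per_atom(forces))
```

A zero net force does not prove that individual force components are accurate, since errors on different atoms can cancel; this is why the abstract describes nonzero net force as a possible symptom and separately compares against recomputed forces.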
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Physical and Chemical Molecular Interactions · Protein Structure and Dynamics
