ANI-1: A data set of 20M off-equilibrium DFT calculations for organic molecules
Justin S. Smith, Olexandr Isayev, and Adrian E. Roitberg

TL;DR
This paper introduces ANI-1, a dataset of 20 million off-equilibrium DFT calculations covering 57,454 small organic molecules, intended to support the development and benchmarking of machine learning potentials in chemistry.
Contribution
The authors provide a large, publicly available DFT dataset of 20 million conformations, facilitating the training and evaluation of ML models in computational chemistry.
Findings
The dataset covers 20M conformations of 57,454 small organic molecules.
The authors intend it to become a standard benchmark for ML potential development.
It is meant to support ML approximations that expedite ab initio methods without loss of accuracy.
Abstract
One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods, in particular neural networks, are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. They have been successfully applied in a variety of applications in chemistry, biology, catalysis, and solid-state physics. However, these models are heavily dependent on the quality and quantity of data used in their fitting. Fitting highly flexible ML potentials comes at a cost: a vast amount of reference data is required to properly train these models. We address this need by providing access to a large computational DFT database, which consists of 20M conformations for 57,454 small organic molecules. We believe it will become a new standard benchmark for comparison of…
