TL;DR
This review surveys the quantum chemical data sets and databases used to train machine learning (ML) potentials in computational chemistry, highlighting their characteristics, current challenges, and future needs.
Contribution
It provides a comprehensive overview of existing quantum chemical data resources and argues that standardization, accessibility, and long-term sustainability are essential for their future development.
Findings
Key data sets differ in chemical diversity and in the electronic structure methods used to generate them.
Challenges include data growth, standardization, and long-term accessibility.
Recommendations focus on developing sustainable, interoperable, and user-friendly data platforms.
Abstract
The field of computational chemistry is increasingly leveraging machine learning (ML) potentials to predict molecular properties with high accuracy and efficiency, providing a viable alternative to traditional quantum mechanical (QM) methods, which are often computationally intensive. Central to the success of ML models is the quality and comprehensiveness of the data sets on which they are trained. Quantum chemistry data sets and databases, comprising extensive information on molecular structures, energies, forces, and other properties derived from QM calculations, are crucial for developing robust and generalizable ML potentials. In this review, we provide an overview of the current landscape of quantum chemical data sets and databases. We examine key characteristics and functionalities of prominent resources, including the types of information they store, the level of electronic…
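The abstract names the core contents of these resources: molecular structures, energies, forces, and the level of electronic structure theory used to compute them. As a rough illustration only, the sketch below models one such entry as a Python dataclass. The class name `QMRecord`, its field names, and the unit choices are hypothetical assumptions, not the schema of any database covered in the review; the length check hints at the kind of consistency rule a standardized format could enforce.

```python
from dataclasses import dataclass


# Hypothetical record for one entry of a quantum chemical data set.
# Field names and units are illustrative assumptions, not a real database schema.
@dataclass
class QMRecord:
    elements: list[str]                          # atomic symbols, e.g. ["O", "H", "H"]
    positions: list[tuple[float, float, float]]  # Cartesian coordinates (assumed angstrom)
    energy: float                                # total energy (assumed eV)
    forces: list[tuple[float, float, float]]     # per-atom forces, one vector per atom
    method: str                                  # electronic structure method label
    basis: str                                   # basis set label

    def __post_init__(self) -> None:
        # Reject entries whose per-atom arrays disagree in length.
        n_atoms = len(self.elements)
        if len(self.positions) != n_atoms or len(self.forces) != n_atoms:
            raise ValueError("elements, positions and forces must have the same length")

    @property
    def level_of_theory(self) -> str:
        # Combined "method/basis" label, the usual way a level of theory is reported.
        return f"{self.method}/{self.basis}"


# Usage with placeholder numbers (not computed values).
water = QMRecord(
    elements=["O", "H", "H"],
    positions=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
    energy=0.0,                    # placeholder, not a real QM energy
    forces=[(0.0, 0.0, 0.0)] * 3,  # placeholder forces
    method="wB97X",
    basis="6-31G(d)",
)
print(water.level_of_theory)
```

Storing the method and basis alongside each structure matters because records computed at different levels of theory cannot be mixed naively when training a single ML potential.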
