High-quality, high-information datasets for universal atomistic machine learning
Cesare Malosso, Filippo Bigi, Paolo Pegolo, Joseph W. Abbott, Philip Loche, Mariana Rossi, Michele Ceriotti, Arslan Mazitov

TL;DR
This paper introduces MAD-1.5, a high-quality, consistent, and comprehensive dataset for training universal atomistic machine learning models across the periodic table, enabling high-accuracy simulations.
Contribution
The creation of MAD-1.5, a curated, standardized dataset covering 102 elements with high-level DFT calculations, designed specifically for broad atomistic modeling.
Findings
PET-MAD-1.5 achieves benchmark-level accuracy.
Dataset improves chemical space coverage and consistency.
Outlier removal enhances data quality.
Abstract
The quality, consistency, and information content of training data are often what determine the practical value of machine-learning models for atomistic simulations. Yet many widely used electronic-structure databases are assembled with materials screening rather than robust force-field learning as their primary goal, are limited in scope to a specific class of chemical compounds, and/or employ inconsistent DFT functionals and settings. Here we introduce MAD-1.5, a highly curated dataset designed explicitly for training broadly applicable atomistic models across the periodic table at high levels of theory. MAD-1.5 extends the MAD dataset with targeted enrichment strategies that improve the coverage of chemical space to 102 elements while keeping the total number of configurations compact. All structures are computed with a single, standardized all-electron DFT workflow using the…
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Chemical Physics Studies · Crystallography and molecular interactions
