Optimal data generation for machine learned interatomic potentials
Connor Allen, Albert P. Bartók

TL;DR
This paper introduces an efficient, automated method that uses non-diagonal supercells to generate optimal databases of atomic configurations for training machine learning interatomic potentials, reducing the computational and human effort of data generation while closely reproducing ab initio phonon and elastic properties.
Contribution
The authors present a novel, automated protocol employing non-diagonal supercells for optimal data generation in MLIP training, applicable across various materials.
Findings
MLIPs accurately reproduce phonon and elastic properties of Al, W, Mg, and Si.
The protocol significantly reduces data generation effort.
The method is adaptable to other materials and can be inserted into any MLIP generation workflow.
Abstract
Machine learning interatomic potentials (MLIPs) are routinely used in atomic simulations, but generating the databases of atomic configurations used in fitting these models is a laborious process, requiring significant computational and human effort. A computationally efficient method is presented to generate databases of atomic configurations that contain optimal information on the small-displacement regime of the potential energy surface of bulk crystalline matter. Utilising non-diagonal supercells (NDSCs), an automatic process is suggested for ab initio data generation. MLIPs were fitted for Al, W, Mg and Si, which very closely reproduce the ab initio phonon and elastic properties. The protocol can be easily adapted to other materials and can be inserted into the workflow of any flavour of MLIP generation.
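The abstract does not spell out how non-diagonal supercells save effort, so the following is only a minimal sketch of the general idea, not the authors' protocol. It assumes the standard NDSC result that the smallest supercell commensurate with a q-point whose reduced fractional coordinates are (m1/n1, m2/n2, m3/n3) contains lcm(n1, n2, n3) primitive cells, whereas a diagonal supercell covering a uniform n x n x n grid contains n^3. The function names ndsc_size and grid_q_points are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from itertools import product
from math import lcm  # variadic lcm requires Python 3.9+


def ndsc_size(q):
    """Primitive cells in the smallest supercell commensurate with q.

    Assumption (standard NDSC result, not stated in the abstract):
    the size is the least common multiple of the denominators of the
    reduced fractional coordinates of q.
    """
    return lcm(*(Fraction(c).denominator for c in q))


def grid_q_points(n):
    """All q-points of a uniform n x n x n grid, as exact fractions."""
    return [
        tuple(Fraction(i, n) for i in idx)
        for idx in product(range(n), repeat=3)
    ]


n = 4
sizes = [ndsc_size(q) for q in grid_q_points(n)]
largest_ndsc = max(sizes)   # largest non-diagonal supercell needed
diagonal_size = n ** 3      # single diagonal supercell for the same grid

print(f"largest NDSC: {largest_ndsc} primitive cells")
print(f"diagonal supercell: {diagonal_size} primitive cells")
```

Under these assumptions, each q-point of the grid is sampled with a supercell of at most n primitive cells instead of n^3, which is where the saving in ab initio cost would come from. Coordinates are kept as exact fractions because floating-point inputs would give meaningless denominators.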
Taxonomy
Topics: Machine Learning in Materials Science · Electron and X-Ray Spectroscopy Techniques · X-ray Diffraction in Crystallography
