Active Learning Strategies for Efficient Machine-Learned Interatomic Potentials Across Diverse Material Systems
Mohammed Azeez Khan, Aaron D'Souza, Vijay Choyal

TL;DR
This paper introduces an active learning framework for training machine-learned interatomic potentials (MLIPs) efficiently across four material systems, reducing the need for costly first-principles calculations and providing practical guidelines for data-efficient model development.
Contribution
It develops an active learning approach that combines diversity sampling with ensemble-based uncertainty quantification, and demonstrates improved data efficiency over a random-sampling baseline across four material systems (C, Si, Fe, and TiO2).
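The uncertainty quantification in this contribution is described in the abstract as Query-by-Committee over a neural network ensemble. The following is a minimal sketch of that idea, not the authors' implementation. It assumes disagreement is measured as the standard deviation of the committee's predictions for each candidate, which the summary does not specify. The function names and the toy predictions are hypothetical.

```python
from statistics import pstdev


def committee_disagreement(predictions):
    """Per-candidate disagreement across committee members.

    predictions[m][i] is the prediction of committee member m for
    candidate structure i. Disagreement is taken to be the population
    standard deviation across members (an assumption for this sketch).
    """
    n_candidates = len(predictions[0])
    return [
        pstdev([member[i] for member in predictions])
        for i in range(n_candidates)
    ]


def select_most_uncertain(predictions, batch_size):
    """Return indices of the batch_size candidates with highest disagreement."""
    scores = committee_disagreement(predictions)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical toy data: 3 committee members, 4 candidate structures.
committee_preds = [
    [1.00, 2.00, 3.00, 4.00],
    [1.01, 2.50, 3.00, 3.20],
    [0.99, 1.50, 3.00, 4.80],
]
picked = select_most_uncertain(committee_preds, batch_size=2)
print(picked)
```

In an active learning loop, the picked candidates would be sent for labeling with first-principles calculations, added to the training set, and the ensemble retrained before the next round.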
Findings
Diversity sampling matches or outperforms random sampling across the four material systems, with a 10.9% improvement on TiO2.
The approach reaches equivalent accuracy while reducing labeled data requirements by 5-13%.
The pipeline is fast, resource-efficient, and accessible for resource-limited researchers.
Abstract
Efficient materials discovery requires reducing costly first-principles calculations for training machine-learned interatomic potentials (MLIPs). We develop an active learning (AL) framework that iteratively selects informative structures from the Materials Project and Open Quantum Materials Database (OQMD) using compositional and property-based descriptors with a neural network ensemble model. Query-by-Committee enables real-time uncertainty quantification. We compare four strategies: random sampling (baseline), uncertainty-based sampling, diversity-based sampling (k-means clustering with farthest-point refinement), and a hybrid approach. Experiments across four material systems (C, Si, Fe, and TiO2) with 5 random seeds demonstrate that diversity sampling achieves competitive or superior performance, with 10.9% improvement on TiO2. Our approach achieves equivalent accuracy with 5-13%…
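The diversity-based strategy named in the abstract (k-means clustering with farthest-point refinement) can be illustrated with a short sketch. The summary does not say how the refinement step works, so this sketch assumes one plausible reading: cluster the candidate descriptors, then pick one candidate per cluster, choosing the member farthest from everything already selected. The deterministic farthest-point seeding of k-means is also an assumption made here for reproducibility, and all function names and the toy 2D descriptors are hypothetical.

```python
import math


def farthest_point_seeds(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point whose distance to its nearest existing seed is largest."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iter=20):
    """Plain Lloyd iterations; an empty cluster keeps its old centroid."""
    centroids = farthest_point_seeds(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign(points, centroids), centroids


def diversity_select(points, k):
    """Pick one candidate index per cluster.

    The first pick is the member nearest its centroid; each later pick is
    the cluster member farthest from all previously selected candidates
    (the assumed 'farthest-point refinement').
    """
    labels, centroids = kmeans(points, k)
    selected = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        if not selected:
            best = min(members, key=lambda i: math.dist(points[i], centroids[c]))
        else:
            best = max(
                members,
                key=lambda i: min(math.dist(points[i], points[j]) for j in selected),
            )
        selected.append(best)
    return selected


# Hypothetical 2D descriptors forming three well-separated groups.
descriptors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
selected = diversity_select(descriptors, k=3)
print(selected)
```

On this toy pool the selection draws one candidate from each group, which is the behavior diversity sampling aims for: coverage of descriptor space rather than repeated sampling of similar structures. A hybrid strategy could combine this with an uncertainty score, though the summary does not describe how the authors weight the two.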
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Inorganic Chemistry and Materials
