Dataset-aware entropy-maximized active learning for machine-learned interatomic potentials
Meiyan Wang, Rishi Rao, and Li Zhu

TL;DR
This paper introduces an entropy-maximized active learning framework that efficiently generates high-quality training data for machine-learned interatomic potentials across diverse materials and phases.
Contribution
The method combines local entropy-driven molecular dynamics with global dataset-aware filtering, enabling broad configuration space coverage and improved learning efficiency.
Findings
Achieves 3 to 10 times lower energy mean absolute error (MAE) than random sampling at comparable training-set sizes.
Effective across systems with covalent, metallic, and ionic bonding.
Requires significantly fewer configurations to reach high accuracy.
Abstract
We present an active learning framework for efficiently generating training data for machine-learned interatomic potentials (MLIPs). The method combines local entropy-driven molecular dynamics with global dataset-aware filtering: a per-configuration entropy term biases MD trajectories toward structurally diverse snapshots, while a global entropy measure, the log-determinant of the fingerprint covariance matrix of the entire dataset, selects only those configurations that provide genuinely new information. We employ dual covariance modes (per-atom for disordered structures and per-config for ordered phases) to achieve broad coverage of configuration space. Combined with a pre-trained foundation model (Allegro-OAM-L) and analytical fingerprint gradients from Gaussian overlap matrix eigenvalues, the framework produces high-quality domain-specific potentials with near- or sub-meV/atom…
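The global filtering step can be illustrated with a minimal sketch. It assumes a greedy scheme in which a candidate configuration is kept only if it raises the log-determinant of the dataset's fingerprint covariance by more than a threshold. This is a hypothetical illustration, not the authors' code. The function names (`dataset_entropy`, `rows_for`, `filter_candidates`), the diagonal ridge `eps`, the acceptance threshold `min_gain`, and the choice to mean-pool per-atom fingerprints in per-config mode are all assumptions. Toy 2-D vectors stand in for the Gaussian-overlap-matrix eigenvalue fingerprints used in the paper, and the entropy-biased MD that generates candidates is not shown.

```python
import numpy as np


def dataset_entropy(features: np.ndarray, eps: float = 1e-6) -> float:
    """Log-determinant of the regularized fingerprint covariance (entropy proxy).

    features: (n_samples, d) array, one fingerprint vector per row.
    eps: hypothetical ridge on the diagonal so the log-det stays finite
         when there are fewer samples than dimensions.
    """
    d = features.shape[1]
    if len(features) > 1:
        cov = np.atleast_2d(np.cov(features, rowvar=False))  # (d, d), ddof=1
    else:
        cov = np.zeros((d, d))  # covariance undefined for fewer than 2 samples
    _, logdet = np.linalg.slogdet(cov + eps * np.eye(d))
    return float(logdet)


def rows_for(config_fp: np.ndarray, mode: str) -> np.ndarray:
    """Map one configuration's (n_atoms, d) per-atom fingerprints to covariance samples."""
    if mode == "per_atom":
        return config_fp  # every atomic environment counts as a sample
    if mode == "per_config":
        return config_fp.mean(axis=0, keepdims=True)  # assumption: mean-pool to one vector
    raise ValueError(f"unknown mode: {mode!r}")


def filter_candidates(dataset_rows, candidates, mode, min_gain=1e-3, eps=1e-6):
    """Greedily keep candidates whose rows raise the dataset log-det by more than min_gain.

    Returns the indices of accepted candidates and the updated sample pool.
    """
    pool = dataset_rows
    current = dataset_entropy(pool, eps)
    accepted = []
    for idx, fp in enumerate(candidates):
        trial = np.vstack([pool, rows_for(fp, mode)])
        new = dataset_entropy(trial, eps)
        if new - current > min_gain:  # keep only configurations that add new information
            pool, current = trial, new
            accepted.append(idx)
    return accepted, pool


# Toy 2-D "fingerprints" (illustrative numbers, not data from the paper).
# Existing pool rows, read as config-level or atom-level samples depending on mode.
base = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
candidates = [
    np.array([[1.0, 1.0], [-1.0, -1.0]]),  # mean-pools to the dataset mean
    np.array([[5.0, 5.0], [5.0, 5.0]]),    # far from everything sampled so far
]
accepted_cfg, _ = filter_candidates(base, candidates, mode="per_config")
accepted_atom, _ = filter_candidates(base, candidates, mode="per_atom")
print(accepted_cfg)   # [1]
print(accepted_atom)  # [0, 1]
```

The two modes can disagree on the same candidate. The first toy configuration mean-pools to the dataset mean, so in `per_config` mode it only shrinks the covariance and is rejected. In `per_atom` mode, however, its two atomic environments lie off the axes already sampled and raise the log-determinant. This is one plausible reading of why the abstract pairs per-atom covariance with disordered structures and per-config covariance with ordered phases; the paper's actual criteria are not reproduced here.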
