Data-Efficient Machine Learning Potentials via Difference Vectors Based on Local Atomic Environments
Xuqiang Shao, Yuqi Zhang, Di Zhang, Zhaoyan Dong, Tianxiang Gao, Mingzhe Li, Xinyuan Liu, Zhiran Gan, Fanshun Meng, Lingcai Kong, Zhengyang Gao, Hao Li, Weijie Yang

TL;DR
This paper introduces DV-LAE, a method that uses difference vectors based on local atomic environments to optimize datasets for machine learning potentials, reducing redundancy and training time while maintaining accuracy.
Contribution
The paper presents a novel histogram-based descriptor method, DV-LAE, for dataset optimization and out-of-distribution detection in atomistic machine learning potentials.
Findings
Reduces dataset size by up to 56% while maintaining accuracy.
Cuts training time per iteration by over 50%.
Enables visualization for out-of-distribution detection.
Abstract
Constructing efficient and diverse datasets is essential for the development of accurate machine learning potentials (MLPs) in atomistic simulations. However, existing approaches often suffer from data redundancy and high computational costs. Herein, we propose a new method, Difference Vectors based on Local Atomic Environments (DV-LAE), that encodes structural differences via histogram-based descriptors and enables visual analysis through t-SNE dimensionality reduction. This approach facilitates redundancy detection and dataset optimization while preserving structural diversity. We demonstrate that DV-LAE significantly reduces dataset size and training time across various materials systems, including high-pressure hydrogen, iron-hydrogen binaries, magnesium hydrides, and carbon allotropes, with minimal compromise in prediction accuracy. For instance, in the -Fe/H system,…
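To make the idea of histogram-based descriptors and difference vectors concrete, here is a minimal sketch in plain Python. It is not the authors' DV-LAE implementation: the abstract does not specify the descriptor, so the radial-distance histogram, the per-structure averaging, the Euclidean norm, the greedy filter, and every name and parameter below (`radial_histogram`, `structure_descriptor`, `difference_vector`, `select_diverse`, `cutoff`, `n_bins`, `threshold`) are assumptions made for illustration. Periodic boundary conditions and the t-SNE projection are left out.

```python
import math

def radial_histogram(positions, center, cutoff=3.0, n_bins=6):
    """Histogram of neighbor distances around one atom (assumed descriptor, no periodic images)."""
    hist = [0] * n_bins
    width = cutoff / n_bins
    for j, pos in enumerate(positions):
        if j == center:
            continue
        d = math.dist(positions[center], pos)
        if d < cutoff:
            hist[min(int(d / width), n_bins - 1)] += 1
    return hist

def structure_descriptor(positions, cutoff=3.0, n_bins=6):
    """Average the per-atom histograms into one fixed-length vector per structure."""
    total = [0.0] * n_bins
    for i in range(len(positions)):
        for b, count in enumerate(radial_histogram(positions, i, cutoff, n_bins)):
            total[b] += count
    return [t / len(positions) for t in total]

def difference_vector(desc_a, desc_b):
    """Element-wise difference between two structure descriptors."""
    return [a - b for a, b in zip(desc_a, desc_b)]

def select_diverse(structures, threshold=0.1, cutoff=3.0, n_bins=6):
    """Greedy filter (an assumption): keep a structure only if its difference
    vector to every already-kept structure has a norm above `threshold`."""
    kept, kept_desc = [], []
    for idx, positions in enumerate(structures):
        desc = structure_descriptor(positions, cutoff, n_bins)
        if all(math.hypot(*difference_vector(desc, k)) > threshold for k in kept_desc):
            kept.append(idx)
            kept_desc.append(desc)
    return kept

# Hypothetical toy data: two identical dimers (one translated) and a stretched dimer.
dimer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
dimer_shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]
stretched = [(0.0, 0.0, 0.0), (2.2, 0.0, 0.0)]
kept_indices = select_diverse([dimer, dimer_shifted, stretched])
```

In this toy case the translated dimer has the same descriptor as the first one, so its difference vector is zero and it is dropped as redundant, while the stretched dimer is kept. To visualize a real dataset, the per-structure descriptors could be passed to a t-SNE implementation, which this sketch does not attempt.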
Taxonomy
Topics: Machine Learning in Materials Science
