Training Data Set Refinement for the Machine Learning Potential of Li-Si Alloys via Structural Similarity Analysis
Nan Xu, Chen Li, Mandi Fang, Qing Shao, Yingying Lu, Yao Shi, and Yi He

TL;DR
This study demonstrates that a reduced, structurally diverse training data set of 400 configurations can produce a Li-Si machine learning potential with accuracy comparable to a much larger set, reducing computational costs.
Contribution
The paper introduces a structural similarity analysis combined with farthest point sampling to efficiently reduce training data redundancy for Li-Si machine learning potentials.
Findings
A training set of 400 configurations matches the accuracy of 6183 configurations.
The redundancy reduction method outperforms stochastic sampling.
The reduced data set maintains high accuracy in energy, force, and structural predictions.
Abstract
Machine learning potentials enable molecular dynamics simulations of systems beyond the capability of classical force fields. The traditional approach to developing structural sets for training machine learning potentials typically generates a large number of redundant configurations, which results in unnecessary computational cost. This work investigates the possibility of reducing redundancy in an initial data set containing 6183 configurations for a Li-Si machine learning potential. Starting from the initial data set, we constructed a series of subsets ranging from 25 to 1500 configurations by combining a structural similarity analysis algorithm and the farthest point sampling method. Results show that the machine learning potential trained from a data set containing 400 configurations can achieve an accuracy comparable to the one developed from the initial data set of 6183…
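The farthest point sampling step named in the abstract can be illustrated with a minimal sketch. This summary does not state which structural descriptor or similarity metric the authors use, so the example assumes each configuration is already represented by a fixed-length descriptor vector and that dissimilarity is plain Euclidean distance. The function name `farthest_point_sampling`, the `start` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import math

def farthest_point_sampling(descriptors, k, start=0):
    """Greedily pick k indices; each new pick is the point farthest
    from the set already chosen (assumed Euclidean distance)."""
    n = len(descriptors)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of configurations")
    selected = [start]
    # min_dist[i] = distance from configuration i to its nearest selected one
    min_dist = [math.dist(descriptors[i], descriptors[start]) for i in range(n)]
    while len(selected) < k:
        # The next pick maximizes the distance to the current selection.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Refresh nearest-selected distances against the new pick only.
        for i in range(n):
            d = math.dist(descriptors[i], descriptors[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

# Toy 2-D "descriptors" (hypothetical): two tight clusters plus one outlier.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
picked = farthest_point_sampling(toy, k=3)
print(picked)
```

On the toy data the selection takes one point from each cluster plus the outlier and skips the near-duplicates, which is the redundancy-reducing behavior the abstract describes. A stochastic sample of the same size could easily draw two points from the same cluster.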
Taxonomy
Topics: Machine Learning in Materials Science · Nuclear Materials and Properties · Hydrogen embrittlement and corrosion behaviors in metals
