When More Data Hurts: Optimizing Data Coverage While Mitigating Diversity Induced Underfitting in an Ultra-Fast Machine-Learned Potential
Jason B. Gibson, Tesia D. Janicki, Ajinkya C. Hire, Chris Bishop, J. Matthew D. Lane, Richard G. Hennig

TL;DR
This paper explores how the diversity of training data impacts the performance of machine-learned interatomic potentials, revealing a critical balance needed for optimal accuracy in materials modeling.
Contribution
It demonstrates the importance of application-specific training data and identifies the optimal diversity level for accurate MLIP performance in modeling amorphous silicon nitride.
Findings
Balanced training data diversity improves MLIP generalization.
Removing nitrogen-rich structures enhances prediction accuracy.
Excessive diversity reduces simulation precision.
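The pruning idea in the findings above can be illustrated with a minimal sketch. It assumes each training structure is represented simply as a list of element symbols, and it uses a hypothetical cutoff (`max_n_fraction`) on the nitrogen atomic fraction. Neither the representation, the function names, nor the cutoff value comes from the paper; they are placeholders for illustration only.

```python
from collections import Counter


def nitrogen_fraction(symbols):
    """Return the atomic fraction of nitrogen in a list of element symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return counts.get("N", 0) / total if total else 0.0


def prune_nitrogen_rich(structures, max_n_fraction):
    """Keep only structures whose nitrogen fraction is at or below the cutoff.

    max_n_fraction is a hypothetical, user-chosen threshold (not from the paper).
    """
    return [s for s in structures if nitrogen_fraction(s) <= max_n_fraction]


# Toy data set: each structure is just its list of element symbols.
structures = [
    ["Si"] * 3 + ["N"] * 4,  # stoichiometric Si3N4, N fraction 4/7
    ["Si"] * 1 + ["N"] * 4,  # nitrogen-rich, N fraction 0.8
    ["Si"] * 4,              # pure silicon, N fraction 0
]

# Illustrative cutoff that keeps Si3N4 but drops the nitrogen-rich structure.
kept = prune_nitrogen_rich(structures, max_n_fraction=0.6)
```

In practice the cutoff would be chosen for the target application, for example by comparing force-field variants fit to different subsets against held-out reference data.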
Abstract
Machine-learned interatomic potentials (MLIPs) are becoming an essential tool in materials modeling. However, optimizing the generation of training data used to parameterize the MLIPs remains a significant challenge. This is because MLIPs can fail when encountering local environments too different from those present in the training data. The difficulty of determining a priori the environments that will be encountered during molecular dynamics (MD) simulation necessitates diverse, high-quality training data. This study investigates how training data diversity affects the performance of MLIPs using the Ultra-Fast Force Field (UF3) to model amorphous silicon nitride. We employ expert and autonomously generated data to create the training data and fit four force-field variants to subsets of the data. Our findings reveal a critical balance in training data diversity: insufficient…
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
