Optimal design of experiments in the context of machine-learning inter-atomic potentials: improving the efficiency and transferability of kernel based methods
Bartosz Barzdajn, Christopher P. Race

TL;DR
This paper explores how classical statistical design of experiments can optimize training data selection for machine learning inter-atomic potentials, enhancing their accuracy, transferability, and efficiency without requiring complex computational frameworks.
Contribution
It demonstrates that optimal experimental design methods can effectively improve training data selection for kernel-based ML potentials, reducing computational costs and increasing transferability.
Findings
Optimal design improves model accuracy with fewer data points
Off-line assessment of data informativeness is feasible
Classical statistical methods mitigate sampling bias in training data
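The idea of assessing informativeness off-line can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared-exponential (RBF) kernel on plain feature vectors and uses a greedy D-optimal criterion, which at each step picks the candidate with the largest conditional variance given the points already chosen (equivalent to greedily maximising the log-determinant of the selected kernel sub-matrix). The names `rbf_kernel`, `greedy_d_optimal`, `length_scale`, and `jitter` are hypothetical and introduced only for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, length_scale=1.0):
    """Squared-exponential kernel between rows of X and Y (assumed kernel)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def greedy_d_optimal(X, n_select, length_scale=1.0, jitter=1e-10):
    """Greedily pick n_select rows of X that maximise log det of K[S, S].

    Implemented as a pivoted partial Cholesky factorisation: `residual`
    holds the variance of each candidate conditioned on the selected set.
    No labels (energies, forces) are needed, so this runs before any
    expensive reference calculation.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, length_scale)
    residual = np.diag(K).copy()
    L = np.zeros((n, n_select))
    selected = []
    for j in range(n_select):
        i = int(np.argmax(residual))          # most informative candidate
        selected.append(i)
        pivot = np.sqrt(residual[i] + jitter)
        # New Cholesky column: covariance with point i, minus what the
        # previously selected points already explain.
        L[:, j] = (K[:, i] - L[:, :j] @ L[i, :j]) / pivot
        residual = np.clip(residual - L[:, j] ** 2, 0.0, None)
        residual[selected] = -np.inf          # never pick a point twice
    return selected


if __name__ == "__main__":
    # Fifty near-duplicate candidates plus two distinct ones.
    cluster = np.linspace(0.0, 0.01, 50)
    X = np.concatenate([cluster, [5.0, 10.0]])[:, None]
    print(sorted(greedy_d_optimal(X, n_select=3)))
```

In this toy pool the selection takes one representative of the dense cluster and both outlying points, whereas uniform random sampling would draw almost exclusively from the cluster. That is the kind of sampling bias the findings above refer to.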
Abstract
Data-driven, machine learning (ML) models of atomistic interactions are often based on flexible and non-physical functions that can translate nuanced aspects of atomic arrangements into predictions of energies and forces. As a result, these potentials are only as good as their training data (usually results of so-called ab initio simulations), and we need to make sure that we have enough information for a model to become sufficiently accurate, reliable and transferable. The main challenge stems from the fact that descriptors of chemical environments are often sparse, high-dimensional objects without a well-defined continuous metric. Therefore, it is rather unlikely that any ad hoc method of choosing training examples will be indiscriminate, and it will be easy to fall into the trap of confirmation bias, where the same narrow and biased sampling is used to generate both the training and test sets. We will…
Taxonomy
Topics: Machine Learning in Materials Science · Electron and X-Ray Spectroscopy Techniques · Nuclear Physics and Applications
Methods: Sparse Evolutionary Training · High-Order Consensuses
