Improving Molecular Force Fields Across Configurational Space by Combining Supervised and Unsupervised Machine Learning
Gregory Fonseca, Igor Poltavsky, Valentin Vassilev-Galindo, Alexandre Tkatchenko

TL;DR
This paper combines supervised and unsupervised machine learning to improve the accuracy and applicability of machine learning force fields (MLFFs) across configurational space, especially for less common configurations.
Contribution
The authors introduce a clustering-based iterative training method that reduces bias in reference datasets, enhancing MLFF performance across broader configurational spaces.
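A minimal sketch of what a clustering-based iterative training loop could look like is given below. It is a toy illustration under stated assumptions, not the authors' implementation: the data are a hypothetical 1-D function rather than molecular energies and forces, the clustering is plain k-means on the raw coordinate, the model is a small Gaussian kernel ridge regression, and the selection rule (add the highest-error points from the cluster with the largest mean error) is an assumed stand-in for whatever criterion the paper uses. All function names and parameters are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def gaussian_kernel(A, B, sigma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

def fit_krr(X, y, sigma=0.5, lam=1e-4):
    """Kernel ridge regression (assumed toy model); returns a predict function."""
    K = gaussian_kernel(X, X, sigma) + lam * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return lambda Xq: gaussian_kernel(Xq, X, sigma) @ alpha

def cluster_errors(model, X, y, labels):
    """Absolute error per point and mean absolute error per cluster."""
    err = np.abs(model(X) - y)
    clusters = np.unique(labels)
    return err, clusters, np.array([err[labels == c].mean() for c in clusters])

def iterative_selection(X, y, labels, n_init=20, n_add=5, n_steps=10, seed=0):
    """Grow the training set from the worst-predicted cluster (assumed rule)."""
    rng = np.random.default_rng(seed)
    train = rng.choice(len(X), size=n_init, replace=False).tolist()
    for _ in range(n_steps):
        model = fit_krr(X[train], y[train])
        err, clusters, mean_err = cluster_errors(model, X, y, labels)
        worst = clusters[mean_err.argmax()]
        # Only points of the worst cluster that are not yet in the training set.
        candidates = np.setdiff1d(np.flatnonzero(labels == worst), train)
        picked = candidates[np.argsort(err[candidates])[-n_add:]]
        train.extend(picked.tolist())
    return train

# Hypothetical inhomogeneous dataset: 95% of samples near x = 0, 5% spread out.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.3, 1900), rng.uniform(-3.0, 3.0, 100)])
X, y = x[:, None], np.sin(3.0 * x)

labels = kmeans(X, k=8)
train_idx = iterative_selection(X, y, labels)
random_idx = np.random.default_rng(0).choice(len(X), size=len(train_idx), replace=False)

for name, idx in [("cluster-guided", train_idx), ("random", random_idx)]:
    _, _, mean_err = cluster_errors(fit_krr(X[idx], y[idx]), X, y, labels)
    print(f"{name:>14}: worst-cluster MAE = {mean_err.max():.3f}")
```

The final loop compares the worst per-cluster error of the cluster-guided training set with a random training set of the same size, which is the kind of comparison the summary's claim about broader configurational coverage refers to. On this toy problem the numbers say nothing about real molecular systems.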
Findings
Up to two-fold reduction in force prediction errors.
Effective for kernel-based and neural network models.
Improves both energy and force predictions simultaneously.
Abstract
The training set of atomic configurations is key to the performance of any Machine Learning Force Field (MLFF) and, as such, the training set selection determines the applicability of the MLFF model for predictive molecular simulations. However, most atomistic reference datasets are inhomogeneously distributed across configurational space (CS); thus, choosing the training set randomly or according to the probability distribution of the data leads to models whose accuracy is mainly defined by the most common close-to-equilibrium configurations in the reference data. In this work, we combine unsupervised and supervised ML methods to bypass the inherent bias of the data for common configurations, effectively widening the applicability range of MLFF to the fullest capabilities of the dataset. To achieve this goal, we first cluster the CS into subregions similar in terms of geometry and…
