Random sampling versus active learning algorithms for machine learning potentials of quantum liquid water
Nore Stolte, János Daru, Harald Forbert, Dominik Marx, Jörg Behler

TL;DR
This study compares random sampling and active learning for training neural network potentials of quantum liquid water, finding that random sampling often yields smaller test errors and that the training process is robust to the data selection method.
Contribution
The paper demonstrates that, at a fixed data set size, random sampling can yield smaller test errors than query-by-committee active learning when training high-dimensional neural network potentials for quantum liquid water.
Findings
Random sampling leads to smaller test errors than active learning at the same data set size.
All trained models accurately reproduce structural properties of quantum liquid water.
A small initial data set is crucial for effective active learning to avoid exploring irrelevant configurations.
Abstract
Training accurate machine learning potentials requires electronic structure data comprehensively covering the configurational space of the system of interest. As the construction of this data is computationally demanding, many schemes for identifying the most important structures have been proposed. Here, we compare the performance of high-dimensional neural network potentials (HDNNPs) for quantum liquid water at ambient conditions trained to data sets constructed using random sampling as well as various flavors of active learning based on query by committee. Contrary to the common understanding of active learning, we find that for a given data set size, random sampling leads to smaller test errors for structures not included in the training process. In our analysis we show that this can be related to small energy offsets caused by a bias in structures added in active learning, which…
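The two selection strategies compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_random` and `select_by_committee` are hypothetical, the toy "energies" are synthetic, and using the standard deviation of committee energy predictions as the disagreement measure is an assumption (the paper compares several flavors of query by committee whose details are not given here).

```python
import numpy as np

def select_random(n_pool, n_select, rng):
    """Random sampling: draw n_select distinct pool indices uniformly."""
    return rng.choice(n_pool, size=n_select, replace=False)

def select_by_committee(committee_energies, n_select):
    """Query by committee: pick the structures the committee disagrees on most.

    committee_energies has shape (n_models, n_pool); the disagreement
    measure (standard deviation across models) is an assumption.
    """
    disagreement = committee_energies.std(axis=0)
    # Sort ascending, reverse, and keep the n_select largest disagreements.
    return np.argsort(disagreement)[::-1][:n_select]

rng = np.random.default_rng(0)

# Synthetic toy data: 4 committee members predicting energies for 10 structures.
base = rng.normal(size=10)
committee = np.tile(base, (4, 1))
committee[:, 3] += np.array([-0.5, 0.5, -0.5, 0.5])  # strong disagreement
committee[:, 7] += np.array([-0.2, 0.2, -0.2, 0.2])  # weaker disagreement

random_idx = select_random(10, 2, rng)
qbc_idx = select_by_committee(committee, 2)
```

In this toy setup, query by committee always returns the structures with the largest spread in predicted energy, whereas random sampling draws from the pool without regard to the model. That targeted choice is the kind of selection bias the abstract links to energy offsets in the active learning data sets.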
Taxonomy
Topics: Spectroscopy and Quantum Chemical Studies · Machine Learning in Materials Science
