TL;DR
This paper introduces an active learning approach based on regression-tree ensembles that models potential energy surfaces (PESs) with few electronic structure calculations, converging to a given accuracy with roughly half the data of the methods it is compared against.
Contribution
The authors propose a regression version of stochastic query by forest, a decision-tree ensemble method for active learning of PESs that reduces data requirements and computational cost without prior knowledge of the PES.
Findings
Requires about half the data of the compared methods to converge to the same accuracy.
Achieves a generalization error of 16 cm⁻¹ with fewer than 15,000 configurations.
The final model, trained on 50,000 configurations, reaches an error of 11 cm⁻¹.
Abstract
Several pool-based active learning (AL) algorithms were employed to model potential energy surfaces (PESs) with a minimum number of electronic structure calculations. Theoretical and empirical results suggest that superior strategies can be obtained by sampling molecular structures whose predictions carry large uncertainties while not deviating much from the true distribution of the data. To model PESs in an AL framework, we propose to use a regression version of stochastic query by forest, a hybrid method that samples points corresponding to large uncertainties while avoiding collecting too many points from sparse regions of space. The algorithm is implemented with decision trees, which have relatively small computational costs. We empirically show that this algorithm requires around half the data to converge to the same accuracy in comparison to the…
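The pool-based loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D Morse-like `morse` oracle, the tiny hand-written regression tree, every function name, and the choice to draw queries with probability proportional to the ensemble variance are all assumptions made for illustration. Because the pool itself follows the data distribution, drawing stochastically from it (rather than always taking the single most uncertain point) is one plausible way to favor uncertain points without over-sampling sparse regions.

```python
import math
import random

def morse(x):
    """Toy 1-D stand-in for an electronic structure calculation (assumption)."""
    return (1.0 - math.exp(-(x - 1.0))) ** 2

def fit_tree(points, depth=0, max_depth=4, min_leaf=2):
    """Fit a small 1-D regression tree to (x, y) pairs by minimizing squared error."""
    mean = sum(y for _, y in points) / len(points)
    if depth >= max_depth or len(points) < 2 * min_leaf:
        return ("leaf", mean)
    pts = sorted(points)
    best = None
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical x values
        left, right = pts[:i], pts[i:]
        ml = sum(y for _, y in left) / len(left)
        mr = sum(y for _, y in right) / len(right)
        sse = (sum((y - ml) ** 2 for _, y in left)
               + sum((y - mr) ** 2 for _, y in right))
        if best is None or sse < best[0]:
            best = (sse, (pts[i - 1][0] + pts[i][0]) / 2, left, right)
    if best is None:
        return ("leaf", mean)
    _, threshold, left, right = best
    return ("node", threshold,
            fit_tree(left, depth + 1, max_depth, min_leaf),
            fit_tree(right, depth + 1, max_depth, min_leaf))

def predict_tree(tree, x):
    """Walk the tree down to a leaf and return its mean value."""
    while tree[0] == "node":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]

def fit_forest(labeled, n_trees, rng):
    """Train each tree on a bootstrap resample of the labeled set."""
    return [fit_tree([rng.choice(labeled) for _ in labeled])
            for _ in range(n_trees)]

def forest_variance(forest, x):
    """Disagreement between trees, used here as the uncertainty estimate."""
    preds = [predict_tree(t, x) for t in forest]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

def active_learn(pool, oracle, n_init=10, batch=5, rounds=8, n_trees=20, seed=0):
    """Pool-based loop: fit forest, score pool, draw queries stochastically."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    labeled = [(x, oracle(x)) for x in pool[:n_init]]
    pool = pool[n_init:]
    for _ in range(rounds):
        forest = fit_forest(labeled, n_trees, rng)
        # Small offset keeps every weight positive so sampling is always valid.
        weights = [forest_variance(forest, x) + 1e-12 for x in pool]
        for _ in range(batch):
            idx = rng.choices(range(len(pool)), weights=weights)[0]
            x = pool.pop(idx)
            weights.pop(idx)
            labeled.append((x, oracle(x)))  # the expensive call in practice
    return labeled, fit_forest(labeled, n_trees, rng)
```

With a pool such as `[random.gauss(1.0, 0.3) for _ in range(300)]`, calling `active_learn(pool, morse)` labels 10 initial points plus 8 rounds of 5 queries. In a real PES setting the oracle would be an electronic structure calculation and the inputs would be molecular descriptors rather than a single coordinate.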
