Reducing Training Data Needs with Minimal Multilevel Machine Learning (M3L)
Stefan Heinen, Danish Khan, Guido Falk von Rudorff, Konstantin Karandashev, Daniel Jose Arismendi Arrieta, Alastair J. A. Price, Surajit Nandi, Arghya Bhowmik, Kersti Hermansson, and O. Anatole von Lilienfeld

TL;DR
This paper introduces M3L, a multilevel machine learning approach that optimizes training data sizes to significantly reduce computational costs in quantum chemistry predictions, achieving near-chemical accuracy efficiently.
Contribution
The paper presents a novel M3L method that adaptively minimizes training data and computational costs across multiple reference levels, outperforming heuristic approaches in quantum chemistry tasks.
Findings
M3L reduces computational costs by factors of up to 25.8 compared with heuristic multilevel methods.
M3L achieves chemical accuracy with substantially less training data.
An analysis of density functionals identifies the top-performing GGA and hybrid levels for predicting atomization energies.
Abstract
For many machine learning applications in science, data acquisition, not training, is the bottleneck even when avoiding experiments and relying on computation and simulation. Correspondingly, and in order to reduce cost and carbon footprint, training data efficiency is key. We introduce minimal multilevel machine learning (M3L) which optimizes training data set sizes using a loss function at multiple levels of reference data in order to minimize a combination of prediction error with overall training data acquisition costs (as measured by computational wall-times). Numerical evidence has been obtained for calculated atomization energies and electron affinities of thousands of organic molecules at various levels of theory including HF, MP2, DLPNO-CCSD(T), DFHFCABS, PNOMP2F12, and PNOCCSD(T)F12, and treating tens with basis sets TZ, cc-pVTZ, and AVTZ-F12. Our M3L benchmarks for reaching…
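The trade-off described in the abstract, prediction error against data acquisition cost across several levels of reference data, can be illustrated with a small toy model. The sketch below is not the authors' implementation. It assumes a hypothetical power-law learning curve per level, made-up per-sample wall-time costs, and a hypothetical weight `cost_weight` that balances the two terms, and it uses a plain grid search in place of whatever optimizer the paper uses.

```python
import itertools

# Hypothetical per-sample wall-time cost (arbitrary units) for three levels of
# theory, cheapest first. Illustrative numbers only, not taken from the paper.
COST_PER_SAMPLE = [1.0, 10.0, 100.0]

# Hypothetical learning-curve parameters (prefactor a, exponent b) per level.
# Assumed error contribution of a level trained on N samples: a * N**(-b).
CURVE = [(20.0, 0.5), (8.0, 0.5), (3.0, 0.5)]


def prediction_error(sizes):
    """Toy multilevel error model: sum of per-level power-law terms."""
    return sum(a * n ** (-b) for (a, b), n in zip(CURVE, sizes))


def acquisition_cost(sizes):
    """Total cost of generating the training data at every level."""
    return sum(c * n for c, n in zip(COST_PER_SAMPLE, sizes))


def combined_loss(sizes, cost_weight):
    """Prediction error plus weighted data acquisition cost."""
    return prediction_error(sizes) + cost_weight * acquisition_cost(sizes)


def optimize_sizes(grid, cost_weight):
    """Grid search over training set sizes, one size per level."""
    candidates = itertools.product(grid, repeat=len(CURVE))
    return min(candidates, key=lambda sizes: combined_loss(sizes, cost_weight))


if __name__ == "__main__":
    grid = [2 ** k for k in range(4, 13)]  # candidate sizes 16 ... 4096
    sizes = optimize_sizes(grid, cost_weight=1e-4)
    print("sizes per level (cheap to expensive):", sizes)
    print("toy error:", round(prediction_error(sizes), 3))
    print("toy cost:", acquisition_cost(sizes))
```

In this toy setting the optimum assigns many samples to the cheapest level and few to the most expensive one, which is the qualitative behaviour a multilevel scheme aims for. The numbers themselves carry no physical meaning.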
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Chemical Physics Studies
