Selected Machine Learning of HOMO-LUMO gaps with Improved Data-Efficiency
Bernard Mazouin, Alexandre Alain Schöpfer, O. Anatole von Lilienfeld

TL;DR
This paper demonstrates that partitioning training data based on molecular features significantly improves the data-efficiency of quantum machine learning models predicting HOMO-LUMO gaps, reducing the required training set size for accurate predictions.
Contribution
The study introduces a classification-based data partitioning approach that enhances the data-efficiency of QML models for molecular band-gap prediction, outperforming conventionally trained models.
Findings
Selected QML models achieve ~0.1 eV MAE with up to an order of magnitude fewer training molecules than conventionally trained models.
Partitioning improves learning rates compared to conventional models.
Selected QML outperforms Δ-QML in data-efficiency.
Abstract
Quantum Machine Learning (QML) models of molecular HOMO-LUMO gaps often struggle to achieve satisfactory data-efficiency, as measured by decreasing prediction errors for increasing training set sizes. Partitioning training sets of organic molecules (QM7 and QM9 data sets) into three classes [systems containing either aromatic rings and carbonyl groups, or single unsaturated bonds, or saturated bonds] prior to training results in independently trained QML models with improved learning rates. The selected QML models of band-gaps (at GW, B3LYP, and ZINDO level of theory) reach mean absolute prediction errors of ~0.1 eV for up to an order of magnitude fewer training molecules than for conventionally trained models. Direct comparison to Δ-QML models of band-gaps suggests that selected QML possesses substantially more data-efficiency. The findings suggest that selected QML, e.g. based…
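The partition-then-train idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class label strings, the use of Gaussian-kernel ridge regression as the regressor, the hyperparameters sigma and lam, and the name SelectedKRR. Class labels are assumed to be assigned beforehand (for example by a substructure check), and X stands for any fixed-length molecular representation.

```python
import numpy as np

# Hypothetical label names for the three classes described in the abstract.
CLASSES = ("aromatic_or_carbonyl", "unsaturated", "saturated")


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


class SelectedKRR:
    """One independent kernel ridge regression model per molecular class.

    Illustrative only: the regressor and hyperparameters are assumptions.
    """

    def __init__(self, sigma=1.0, lam=1e-8):
        self.sigma = sigma
        self.lam = lam
        self.models = {}  # class label -> (training features, weights)

    def fit(self, X, y, labels):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        labels = np.asarray(labels)
        for c in np.unique(labels):
            mask = labels == c
            Xc = X[mask]
            K = gaussian_kernel(Xc, Xc, self.sigma)
            # Solve (K + lam * I) alpha = y using this class's molecules only.
            alpha = np.linalg.solve(K + self.lam * np.eye(len(Xc)), y[mask])
            self.models[c] = (Xc, alpha)
        return self

    def predict(self, X, labels):
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        out = np.empty(len(X))
        for c in np.unique(labels):
            mask = labels == c
            Xc, alpha = self.models[c]  # route each query to its class model
            out[mask] = gaussian_kernel(X[mask], Xc, self.sigma) @ alpha
        return out
```

A conventionally trained baseline would correspond to giving every molecule the same label, so that a single model sees the whole training set. Comparing the two as the training set grows is what a learning-curve comparison of the kind described in the abstract would measure.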
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Various Chemistry Research Topics
