Improved accuracy and transferability of molecular-orbital-based machine learning: Organics, transition-metal complexes, non-covalent interactions, and transition states
Tamara Husch, Jiace Sun, Lixue Cheng, Sebastian J. R. Lee, and Thomas F. Miller III

TL;DR
This paper improves molecular-orbital-based machine learning (MOB-ML) for predicting molecular energies by preserving physical constraints in the model input, and demonstrates high accuracy and transferability across diverse chemical datasets, including transition states and non-covalent interactions.
Contribution
The study introduces MOB-ML models whose inputs preserve physical constraints (invariance conditions and size consistency); these models need little training data and remain accurate across diverse chemical systems and extrapolation tasks.
Findings
MOB-ML achieves sub-kcal/mol accuracy on the QM7b-T dataset when trained on only 1% of it (70 molecules).
The method maintains accuracy when transferred to larger molecules and different datasets.
Active learning with Gaussian process variance effectively extends MOB-ML to new chemical regions.
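Variance-driven active learning can be sketched as follows: a Gaussian process (GP) reports its own predictive variance, so the next training points can be drawn from wherever that variance is largest. The sketch below assumes a squared-exponential kernel on generic feature vectors and greedy one-at-a-time selection; the paper's actual kernel, features, and acquisition protocol may differ, and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def posterior_variance(x_train, x_pool, noise=1e-6, length_scale=1.0):
    # GP predictive variance: k(x, x) - k_*^T (K + noise * I)^{-1} k_*.
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_tp = rbf_kernel(x_train, x_pool, length_scale)
    chol = np.linalg.cholesky(k_tt)
    v = np.linalg.solve(chol, k_tp)  # v = L^{-1} k_*
    return 1.0 - np.sum(v**2, axis=0)  # prior variance of this kernel is 1

def select_by_variance(x_train, x_pool, n_select):
    # Greedily pick the most uncertain pool point, add it to the training inputs,
    # and repeat. The GP variance does not depend on labels, so no reference
    # calculation is needed until the batch is chosen.
    chosen = []
    train = x_train.copy()
    for _ in range(n_select):
        var = posterior_variance(train, x_pool)
        if chosen:
            var[chosen] = -np.inf  # never pick the same point twice
        idx = int(np.argmax(var))
        chosen.append(idx)
        train = np.vstack([train, x_pool[idx:idx + 1]])
    return chosen
```

In this toy picture, a pool point far from all training inputs (a "new chemical region") has variance near the prior value and is selected first; points close to existing training data are selected last.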
Abstract
Molecular-orbital-based machine learning (MOB-ML) provides a general framework for the prediction of accurate correlation energies at the cost of obtaining molecular orbitals. We demonstrate the importance of preserving physical constraints, including invariance conditions and size consistency, when generating the input for the machine learning model. Numerical improvements are demonstrated for different data sets covering total and relative energies for thermally accessible organic and transition-metal-containing molecules, non-covalent interactions, and transition-state energies. MOB-ML requires training data from only 1% of the QM7b-T data set (i.e., only 70 organic molecules with seven or fewer heavy atoms) to predict the total energy of the remaining 99% of this data set with sub-kcal/mol accuracy. This MOB-ML model is significantly more accurate than other methods when…
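Why size consistency constrains the model input can be illustrated with a toy example. MOB-ML writes the correlation energy as a sum of pair contributions over occupied orbitals, so for two non-interacting fragments the inter-fragment pairs must contribute nothing. The sketch below assumes a hypothetical scalar pair feature that decays with the distance between localized-orbital centres and a hypothetical regressor that maps a zero feature to zero energy; the real MOB-ML features are derived from matrix elements over localized molecular orbitals, not from this distance proxy.

```python
import math

def pair_energy(feature):
    # Hypothetical stand-in for a trained regressor (feature -> pair energy).
    # It returns exactly zero for a zero feature, which size consistency requires.
    return -0.01 * feature**2

def pair_feature(ri, rj):
    # Hypothetical pair feature for orbitals centred at ri and rj:
    # decays with separation and is negligible for far-apart orbitals.
    return math.exp(-math.dist(ri, rj))  # diagonal pairs (ri == rj) give 1.0

def correlation_energy(centers):
    # Pair-additive total over diagonal (i == j) and off-diagonal (i != j) pairs.
    total = 0.0
    for ri in centers:
        for rj in centers:
            total += pair_energy(pair_feature(ri, rj))
    return total

fragment_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fragment_b = [(100.0, 0.0, 0.0), (100.0, 1.0, 0.0)]  # far away from fragment_a
combined = correlation_energy(fragment_a + fragment_b)
separate = correlation_energy(fragment_a) + correlation_energy(fragment_b)
print(abs(combined - separate) < 1e-12)
```

If the regressor instead returned a nonzero energy for a zero feature, every inter-fragment pair would add a spurious contribution, and the error would grow with the number of such pairs as the system gets larger.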
