Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space
Lixue Cheng, Jiace Sun, Thomas F. Miller III

TL;DR
This paper combines unsupervised Gaussian mixture model (GMM) clustering with Gaussian process regression (GPR) to improve the accuracy and training efficiency of molecular energy predictions in molecular-orbital-based machine learning (MOB-ML), outperforming previous methods.
Contribution
Introduces an automatic, unsupervised GMM clustering method for MOB-ML that improves training efficiency and accuracy without requiring user-specified parameters or the training of an additional classifier.
Findings
GMM/GPR achieves the best accuracy among tested methods.
GMM/GPR significantly reduces training time.
The approach improves transferability and prediction quality.
Abstract
We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantage of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and having improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies…
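The cluster-then-regress idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, uses synthetic two-dimensional features in place of real MOB-ML features, picks the number of GMM components by the Bayesian information criterion (an assumption made here to keep the choice automatic), and fits an ordinary GPR per cluster rather than a scalable one. The helper names fit_gmm_by_bic, fit_cluster_regressors, and predict_clustered are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def fit_gmm_by_bic(X, max_components=5, seed=0):
    """Fit GMMs with 1..max_components components; keep the lowest-BIC model.

    BIC-based selection is an assumption of this sketch, used so that the
    number of clusters does not have to be supplied by the user.
    """
    best, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best


def fit_cluster_regressors(X, y, gmm):
    """Train one GPR per cluster, using hard GMM assignments of the training set."""
    labels = gmm.predict(X)
    models = {}
    for c in np.unique(labels):
        mask = labels == c
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
        models[int(c)] = GaussianProcessRegressor(
            kernel=kernel, normalize_y=True).fit(X[mask], y[mask])
    return models


def predict_clustered(X, gmm, models):
    """Route each point to its most probable cluster that has a trained regressor."""
    trained = np.array(sorted(models))
    proba = gmm.predict_proba(X)[:, trained]
    labels = trained[np.argmax(proba, axis=1)]
    y_pred = np.empty(len(X))
    for c in np.unique(labels):
        mask = labels == c
        y_pred[mask] = models[int(c)].predict(X[mask])
    return y_pred


# Synthetic stand-in data: two well-separated groups with different targets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.5, size=(60, 2)),
               rng.normal(3.0, 0.5, size=(60, 2))])
y = np.where(X[:, 0] < 0, X.sum(axis=1), -2.0 * X[:, 0] + 1.0)

gmm = fit_gmm_by_bic(X)
models = fit_cluster_regressors(X, y, gmm)
y_pred = predict_clustered(X, gmm, models)
mae = float(np.mean(np.abs(y_pred - y)))
```

Because the clustering step needs no labels, the same fitted GMM assigns test points to clusters directly, so no separate classifier has to be trained. Each regressor also sees only its own cluster's points, which keeps the cubic-cost GPR fits small.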
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Various Chemistry Research Topics
Methods: Gaussian Process · Linear Regression
