Efficient Chemical Space Exploration Using Active Learning Based on Marginalized Graph Kernel: an Application for Predicting the Thermodynamic Properties of Alkanes with Molecular Simulation
Yan Xiang, Yu-Hang Tang, Zheng Gong, Hongyi Liu, Liang Wu, Guang Lin, Huai Sun

TL;DR
This paper presents an active learning framework combining Gaussian process regression and marginalized graph kernels to efficiently explore chemical space and accurately predict thermodynamic properties of alkanes with minimal data.
Contribution
The study introduces a novel active learning algorithm that effectively reduces data requirements for molecular property prediction using high-throughput simulations and graph neural networks.
Findings
Only 313 molecules needed for high-accuracy GNN models
Achieved R^2 > 0.99 on computational test sets
Demonstrated reliable uncertainty quantification
Abstract
We introduce an explorative active learning (AL) algorithm based on Gaussian process regression and marginalized graph kernel (GPR-MGK) to explore chemical space with minimum cost. Using high-throughput molecular dynamics simulation to generate data and a graph neural network (GNN) to predict, we constructed an active learning molecular simulation framework for thermodynamic property prediction. Specifically, targeting 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties (densities, heat capacities, and vaporization enthalpies), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 (0.124% of the total) molecules were sufficient to train an accurate GNN model with R^2 > 0.99 for computational test sets and …
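The explorative selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain RBF kernel on random 2-D feature vectors as a stand-in for the marginalized graph kernel on molecular graphs, and the function names, length scale, noise level, and pool size are all hypothetical. What it shows is the general mechanism of uncertainty-driven selection: the predictive variance of a Gaussian process depends only on the inputs already chosen, so the next candidate can be the pool point whose variance is largest, before any simulation is run for it.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Stand-in similarity kernel on feature vectors (NOT a marginalized graph kernel)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def posterior_variance(train_x, pool_x, noise=1e-6):
    """GPR predictive variance at pool points; it depends on inputs only, not labels."""
    k_tt = rbf_kernel(train_x, train_x) + noise * np.eye(len(train_x))
    k_tp = rbf_kernel(train_x, pool_x)
    solved = np.linalg.solve(k_tt, k_tp)
    # The RBF prior variance is 1 at every point; subtract the explained part.
    return 1.0 - np.einsum("ij,ij->j", k_tp, solved)

def explorative_selection(pool_x, n_select, seed_index=0):
    """Greedily add the pool point with the largest predictive variance."""
    selected = [seed_index]
    while len(selected) < n_select:
        var = posterior_variance(pool_x[selected], pool_x)
        var[selected] = -np.inf  # never re-select an already chosen point
        selected.append(int(np.argmax(var)))
    return selected

rng = np.random.default_rng(0)
pool = rng.uniform(-3.0, 3.0, size=(200, 2))  # hypothetical descriptors, not real molecules
chosen = explorative_selection(pool, n_select=10)

# Fraction of the chemical space selected, using the counts quoted in the abstract.
fraction_percent = 100.0 * 313 / 251728
```

In this toy setting the chosen points spread out across the pool, which is the intended effect of an explorative criterion: each new point is the one least similar to everything already selected. The last line reproduces the 0.124% figure from the two counts given in the abstract.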
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Various Chemistry Research Topics
Methods: Graph Neural Network · Test · Gaussian Process
