Active Learning Enables Extrapolation in Molecular Generative Models
Evan R. Antoniuk, Peggy Li, Nathan Keilbart, Stephen Weitzner, Bhavya Kailkhura, Anna M. Hiszpanski

TL;DR
This paper introduces an active learning pipeline for molecular generative models that improves their ability to generate molecules whose properties extrapolate beyond the training data and that are more often thermodynamically stable, addressing a key limitation in molecular discovery.
Contribution
It presents an active-learning, closed-loop framework in which molecular generative models are iteratively refined on feedback from quantum chemical simulations, targeting the poor generalization of molecular property predictors to new chemical space.
Findings
Generated molecules exceed training data property ranges by up to 0.44 standard deviations.
Out-of-distribution molecule classification accuracy improves by 79%.
Proportion of thermodynamically stable molecules is 3.5 times higher than previous models.
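One way to read the 0.44 figure is as the distance of the best generated property value beyond the training maximum, expressed in units of the training-set standard deviation. That reading is an assumption, since this summary does not give the paper's exact definition, and the function name below is hypothetical. A minimal sketch:

```python
import statistics

def extrapolation_in_std(train_values, generated_values):
    """Hypothetical metric (not taken from the paper): how far the best
    generated value lies beyond the training maximum, in units of the
    training-set (population) standard deviation."""
    train_max = max(train_values)
    train_std = statistics.pstdev(train_values)
    if train_std == 0:
        raise ValueError("training values have zero spread")
    return (max(generated_values) - train_max) / train_std

# Toy numbers, not data from the paper.
score = extrapolation_in_std([0.0, 4.0], [3.0, 5.0])
```

A positive score means at least one generated value lies outside the training range; a score at or below zero means no extrapolation under this definition.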
Abstract
Although generative models hold promise for discovering molecules with optimized desired properties, they often fail to suggest synthesizable molecules that improve upon the known molecules seen in training. We find that a key limitation is not in the molecule generation process itself, but in the poor generalization capabilities of molecular property predictors. We tackle this challenge by creating an active-learning, closed-loop molecule generation pipeline, whereby molecular generative models are iteratively refined on feedback from quantum chemical simulations to improve generalization to new chemical space. Compared against other generative model approaches, only our active learning approach generates molecules with properties that extrapolate beyond the training data (reaching up to 0.44 standard deviations beyond the training data range) and out-of-distribution molecule…
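The closed loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: "molecules" are single numbers, `oracle` stands in for a quantum chemical simulation, a least-squares line stands in for the property predictor, and random perturbation stands in for the generative model. It is not the authors' pipeline, only the general generate, predict, simulate, retrain pattern.

```python
import random

def oracle(x):
    # Stand-in for an expensive quantum chemical simulation (toy function).
    return -(x - 3.0) ** 2

def fit_linear(xs, ys):
    # Least-squares line y = a*x + b, a deliberately simple surrogate predictor.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def active_learning_loop(rounds=5, batch=4, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(8)]  # initial "training molecules"
    ys = [oracle(x) for x in xs]
    history = [max(ys)]  # best labeled property value so far
    for _ in range(rounds):
        a, b = fit_linear(xs, ys)  # retrain the predictor on all labeled data
        # Toy generator: propose candidates by perturbing known points.
        candidates = [x + rng.gauss(0.0, 0.5) for x in xs]
        # Keep the candidates the surrogate predicts to be best.
        candidates.sort(key=lambda x: a * x + b, reverse=True)
        chosen = candidates[:batch]
        # Label the chosen candidates with the oracle and grow the training set.
        xs.extend(chosen)
        ys.extend(oracle(x) for x in chosen)
        history.append(max(ys))
    return history

history = active_learning_loop()
```

Because each round adds oracle-labeled points outside the original sampling range, the predictor is retrained on data from newly explored regions, which is the mechanism the abstract credits for better generalization.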
Taxonomy
Topics: Machine Learning and Algorithms
