Generating $\pi$-Functional Molecules Using STGG+ with Active Learning
Alexia Jolicoeur-Martineau, Yan Zhang, Boris Knyazev, Aristide Baratin, Cheng-Hao Liu

TL;DR
This paper introduces STGG+AL, an active learning framework that enhances the generation of novel, out-of-distribution $\pi$-functional molecules with high oscillator strength, outperforming reinforcement learning methods in molecular design.
Contribution
The paper presents a novel active learning approach that integrates STGG+ for designing $\pi$-functional molecules, enabling broader exploration of chemical space and the generation of molecules with out-of-distribution properties.
Findings
Successfully generated molecules with high oscillator strength and NIR absorption.
Outperformed existing reinforcement learning methods in molecular generation tasks.
Provided open-source datasets and tools for future research.
Abstract
Generating novel molecules with out-of-distribution properties is a major challenge in molecular discovery. While supervised learning methods generate high-quality molecules similar to those in a dataset, they struggle to generalize to out-of-distribution properties. Reinforcement learning can explore new chemical spaces but often conducts 'reward-hacking' and generates non-synthesizable molecules. In this work, we address this problem by integrating a state-of-the-art supervised learning method, STGG+, in an active learning loop. Our approach iteratively generates, evaluates, and fine-tunes STGG+ to continuously expand its knowledge. We denote this approach STGG+AL. We apply STGG+AL to the design of organic $\pi$-functional materials, specifically two challenging tasks: 1) generating highly absorptive molecules characterized by high oscillator strength and 2) designing absorptive…
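The generate, evaluate, fine-tune cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not use STGG+: `ToyGenerator`, `toy_oracle`, and `active_learning_loop` are hypothetical names, the "molecule" is a single number, the oracle is a made-up scoring function standing in for an expensive property evaluation, and "fine-tuning" is reduced to re-centering a Gaussian on the best candidates seen so far.

```python
import random
from statistics import mean


def toy_oracle(x):
    """Hypothetical stand-in for a costly property evaluation; peaks at x = 5."""
    return -(x - 5.0) ** 2


class ToyGenerator:
    """Hypothetical stand-in for a generative model: a 1-D Gaussian sampler."""

    def __init__(self, center=0.0, spread=1.0):
        self.center = center
        self.spread = spread

    def sample(self, n, rng):
        # "Generate" n candidates around the current center.
        return [rng.gauss(self.center, self.spread) for _ in range(n)]

    def fine_tune(self, examples):
        # "Fine-tune" by moving the center to the mean of the chosen examples.
        self.center = mean(examples)


def active_learning_loop(generator, oracle, rounds=20, n_samples=200, keep=20, seed=0):
    rng = random.Random(seed)
    dataset = []  # all (score, candidate) pairs evaluated so far
    for _ in range(rounds):
        candidates = generator.sample(n_samples, rng)       # generate
        dataset.extend((oracle(x), x) for x in candidates)  # evaluate
        dataset.sort(reverse=True)                          # best scores first
        best = [x for _, x in dataset[:keep]]
        generator.fine_tune(best)                           # fine-tune
    return generator, dataset


if __name__ == "__main__":
    gen, data = active_learning_loop(ToyGenerator(), toy_oracle)
    print(f"final center: {gen.center:.2f}, evaluated candidates: {len(data)}")
```

The generator starts far from the oracle's optimum and drifts toward it because each round is trained on the best candidates evaluated so far, which is the sense in which the loop expands what the model knows beyond its starting distribution.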
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research · Fractal and DNA sequence analysis · Computability, Logic, AI Algorithms
