ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation
Gregory W. Kyro, Anton Morgunov, Rafael I. Brent, Victor S. Batista

TL;DR
ChemSpaceAL introduces an efficient active learning approach for protein-specific molecular generation, enabling targeted drug discovery with minimal data evaluation and demonstrating success in generating molecules for proteins with and without known inhibitors.
Contribution
The paper presents a novel, computationally efficient active learning methodology for targeted molecular generation, applicable to proteins with or without known inhibitors.
Findings
Fine-tuned a GPT-based molecular generator toward c-Abl kinase, a protein with FDA-approved small-molecule inhibitors.
Generated molecules similar to those inhibitors without prior knowledge of them.
Effective for proteins lacking existing small-molecule inhibitors.
Abstract
The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. Within this domain, the vastness of chemical space motivates the development of more efficient methods for identifying regions with molecules that exhibit desired characteristics. In this work, we present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data in the constructed sample space to successfully align a generative model with respect to a specified objective. We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase. Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of…
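The core idea in the abstract, scoring only a subset of the generated samples and using those scores to steer the generator, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the use of k-means to partition the sample space, the softmax weighting of clusters, and the names `kmeans`, `select_for_finetuning`, `score_fn`, `frac`, and `n_select` are all hypothetical, and the feature vectors stand in for whatever molecular representation is actually used.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (illustrative choice); returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels


def select_for_finetuning(features, score_fn, k=10, frac=0.1, n_select=200, seed=0):
    """Score a fraction of each cluster, then sample a fine-tuning set
    that favors clusters whose evaluated members scored well.

    Returns (indices of selected samples, number of samples actually scored).
    """
    rng = np.random.default_rng(seed)
    labels = kmeans(features, k, seed=seed)
    sizes = np.bincount(labels, minlength=k)

    cluster_score = np.full(k, -np.inf)  # empty clusters keep -inf (zero weight)
    n_scored = 0
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if len(members) == 0:
            continue
        # Only this subset is passed to the (expensive) scoring function.
        n_eval = max(1, int(frac * len(members)))
        picked = rng.choice(members, size=n_eval, replace=False)
        cluster_score[c] = np.mean([score_fn(features[i]) for i in picked])
        n_scored += n_eval

    # Softmax over cluster scores (assumed weighting scheme).
    weights = np.exp(cluster_score - cluster_score[np.isfinite(cluster_score)].max())
    weights /= weights.sum()

    # Spread each cluster's weight uniformly over its members.
    probs = weights[labels] / sizes[labels]
    probs /= probs.sum()
    selected = rng.choice(len(features), size=n_select, replace=True, p=probs)
    return selected, n_scored
```

In a full loop, the selected samples would be used to fine-tune the generator, which would then produce a new batch for the next round. The sketch covers only the selection step, since that is where the saving in evaluations comes from.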
Taxonomy
Topics: Innovative Microfluidic and Catalytic Techniques Innovation · CRISPR and Genetic Engineering · Computational Drug Discovery Methods
Methods: ALIGN
