Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets
Ulrich A. Mbou Sob, Qiulin Li, Miguel Arbesú, Oliver Bent, Andries P. Smit, Arnu Pretorius

TL;DR
This paper introduces a novel transformer-based generative model for small molecules using a modified SAFE representation, achieving high validity and improved target-specific molecule generation through reinforcement learning fine-tuning.
Contribution
The authors develop a new latent-variable transformer model with a modified SAFE representation and demonstrate its effectiveness in generating valid molecules and optimizing for protein targets via RL fine-tuning.
Findings
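Generated molecules reach a validity rate > 90% and a fragmentation rate < 1%
RL fine-tuning increases the number of hit candidates for five protein targets, nearly doubling it for some of them
Top 5% docking scores are comparable or superior to the state of the art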
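The hit-candidate and top-5% docking findings can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the convention that lower docking scores are better, and the definition of a hit as a score below a single threshold are all assumptions made for illustration. The paper may apply further criteria.

```python
import math
from typing import Sequence


def top_fraction_mean(scores: Sequence[float], fraction: float = 0.05) -> float:
    """Mean of the best `fraction` of docking scores.

    Assumption: lower (more negative) scores indicate stronger predicted binding.
    """
    if not scores:
        raise ValueError("need at least one docking score")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, math.ceil(len(scores) * fraction))
    best = sorted(scores)[:k]  # ascending order, so the best scores come first
    return sum(best) / k


def count_hits(scores: Sequence[float], threshold: float) -> int:
    """Count molecules whose docking score is strictly below `threshold`.

    Assumption: a hit is defined by the docking score alone (hypothetical rule).
    """
    return sum(1 for s in scores if s < threshold)


# Illustrative numbers only, not results from the paper.
pretrained = [-7.0, -9.0, -8.0, -10.0]
finetuned = [-9.0, -9.5, -10.0, -11.0]
hits_before = count_hits(pretrained, threshold=-8.5)
hits_after = count_hits(finetuned, threshold=-8.5)
```

Comparing `hits_before` with `hits_after` for each target is one way to express how much fine-tuning changes the hit count under these assumptions.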
Abstract
A specific challenge with deep learning approaches for molecule generation is generating both syntactically valid and chemically plausible molecular string representations. To address this, we propose a novel generative latent-variable transformer model for small molecules that leverages a recently proposed molecular string representation called SAFE. We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training and use this to train our model. Our experiments show that our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space. By fine-tuning the model using reinforcement learning to improve molecular docking, we significantly increase the number of hit candidates for five specific protein targets compared to the pre-trained model, nearly doubling this number for…
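The validity and fragmentation rates quoted in the abstract can be computed along the following lines. This is a sketch under stated assumptions, not the authors' code: it assumes each sample has already been decoded to a SMILES string (or `None` if parsing failed), treats a dot in the SMILES as a sign of disconnected components, counts fragmented molecules as not valid, and divides both rates by the total number of samples. The function name `generation_rates` is hypothetical.

```python
from typing import Dict, Optional, Sequence


def generation_rates(decoded: Sequence[Optional[str]]) -> Dict[str, float]:
    """Validity and fragmentation rates over a batch of samples.

    `decoded` holds one entry per generated sample: a SMILES string if the
    sample parsed, or None if it did not (assumed upstream decoding step).
    """
    total = len(decoded)
    if total == 0:
        raise ValueError("need at least one sample")
    parsed = [s for s in decoded if s is not None]
    # In SMILES, "." separates disconnected components of a structure.
    fragmented = [s for s in parsed if "." in s]
    valid = len(parsed) - len(fragmented)  # assumption: fragmented is not valid
    return {
        "validity_rate": valid / total,
        "fragmentation_rate": len(fragmented) / total,
    }


# Illustrative batch: two connected molecules, one parse failure, one fragmented.
rates = generation_rates(["CCO", None, "CC.O", "c1ccccc1"])
```

In practice the decoding and parsing step would be handled by a cheminformatics toolkit. It is left out here to keep the sketch self-contained.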
Taxonomy
Topics: Computational Drug Discovery Methods · Genetics, Bioinformatics, and Biomedical Research
Methods: Fragmentation
