Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks
Xinyu Wang, Jinbo Bi, Minghu Song

TL;DR
This paper introduces PSV-PPO, a reinforcement learning algorithm that uses real-time partial SMILES validation to improve molecule validity and exploration in drug design, addressing catastrophic forgetting in LLM-based molecule generation.
Contribution
The paper presents PSV-PPO, a novel RL method that performs stepwise partial SMILES validation to prevent forgetting and enhance exploration during molecule generation.
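To make "stepwise partial SMILES validation" concrete, here is a minimal sketch of the idea: before a token is appended, check whether the resulting prefix could still be completed into a well-formed string, and exclude tokens that cannot. This is not the authors' implementation. The function names `is_viable_prefix` and `valid_next_tokens` are hypothetical, and the checker covers only parenthesis and bracket-atom structure, ignoring ring closures, valence, and aromaticity.

```python
from typing import List


def is_viable_prefix(prefix: str) -> bool:
    """Return False if no continuation can repair `prefix`.

    Simplified assumption: only branch parentheses and bracket atoms
    are checked; ring closures and valence are deliberately ignored.
    """
    depth = 0           # number of currently open branches
    in_bracket = False  # True while inside a bracket atom such as [NH3+]
    prev = ""           # previous character, "" at the start
    for ch in prefix:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            elif ch in "[()":
                return False  # no nesting or branches inside a bracket atom
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            if prev in ("", "("):
                return False  # a branch must follow an atom
            depth += 1
        elif ch == ")":
            if depth == 0 or prev == "(":
                return False  # unmatched close, or empty branch
            depth -= 1
        prev = ch
    return True


def valid_next_tokens(prefix: str, vocab: List[str]) -> List[str]:
    """Keep only the tokens that leave the prefix viable."""
    return [tok for tok in vocab if is_viable_prefix(prefix + tok)]
```

In an RL loop, a filter like `valid_next_tokens` could be used to mask or penalize tokens at each generation step, so that invalid strings are caught while they are being built rather than only after the full sequence is complete.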
Findings
Reduces invalid molecule generation significantly.
Maintains high validity rates during exploration.
Performs well on benchmark datasets.
Abstract
SMILES-based molecule generation has emerged as a powerful approach in drug discovery. Deep reinforcement learning (RL) using large language models (LLMs) has been incorporated into the molecule generation process to achieve high matching scores in terms of the likelihood of desired molecule candidates. However, a critical challenge in this approach is catastrophic forgetting during the RL phase, where knowledge such as molecule validity, which often exceeds 99% during pretraining, significantly deteriorates. Current RL algorithms applied in drug discovery, such as REINVENT, use prior models as anchors to retain pretraining knowledge, but these methods lack robust exploration mechanisms. To address these issues, we propose Partial SMILES Validation-PPO (PSV-PPO), a novel RL algorithm that incorporates real-time partial SMILES validation to prevent catastrophic forgetting while encouraging…
