REINFORCE-ING Chemical Language Models for Drug Discovery
Morgan Thomas, Albert Bou, Jose Carlos Gómez-Tamayo, Gary Tresadern, Mazen Ahmad, Gianni De Fabritiis

TL;DR
This paper investigates the application of reinforcement learning, specifically REINFORCE, to chemical language models for drug discovery, proposing new methods and best practices to improve efficiency and effectiveness.
Contribution
It introduces a new regularization method aligned with REINFORCE, explores hyperparameter tuning, and demonstrates improved drug discovery performance using RL in chemical language models.
Findings
Enhanced learning efficiency with frontier binding affinity models (Boltz2) as the reward
Proposed regularization method improves RL training stability
RL hyperparameter tuning boosts drug discovery effectiveness
Abstract
Chemical language models, combined with reinforcement learning (RL), have shown significant promise for efficiently traversing large chemical spaces in drug discovery. However, how various RL algorithms compare, and which best practices apply in practical drug discovery, remain unclear. Here, starting from the principles of the REINFORCE algorithm, we investigate the effect of different components from RL theory, including experience replay, hill-climbing, baselines to reduce variance, and alternative reward shaping. We propose a new regularization method more closely aligned with REINFORCE than current standard practices, and demonstrate how RL hyperparameters can be fine-tuned for effectiveness and efficiency. Lastly, we apply our learnings to practical drug discovery by demonstrating enhanced learning efficiency on frontier binding affinity models, using Boltz2 as a reward model. We share…
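The abstract names several standard RL components. The following plain-Python code is a minimal sketch under stated assumptions. It is not the authors' implementation, and it omits their proposed regularizer, which the abstract does not specify. It shows how three of those components can fit together: a REINFORCE loss with a batch-mean baseline, a hill-climbing top-k filter, and a reward-ranked experience-replay buffer. All names (`Sample`, `reinforce_loss`, `hill_climb`, `ReplayBuffer`) and the toy log-probabilities and rewards are hypothetical. Log-probabilities here are plain floats. A real chemical language model would produce differentiable tensors (e.g. in PyTorch) so the loss could be backpropagated.

```python
# Hypothetical sketch of REINFORCE components; not the paper's implementation.
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    smiles: str       # generated molecule as a SMILES string
    log_prob: float   # summed token log-probability under the sampling policy
    reward: float     # score from a reward model (toy values below)


def reinforce_loss(batch: List[Sample], use_baseline: bool = True) -> float:
    """Monte Carlo REINFORCE loss: -mean((R - b) * log pi(x)).

    With use_baseline=True, b is the batch-mean reward, a common
    variance-reduction choice; with False, b = 0 (vanilla REINFORCE).
    """
    baseline = sum(s.reward for s in batch) / len(batch) if use_baseline else 0.0
    return -sum((s.reward - baseline) * s.log_prob for s in batch) / len(batch)


def hill_climb(batch: List[Sample], top_k: int) -> List[Sample]:
    """Hill-climbing filter: keep only the top_k highest-reward samples."""
    return sorted(batch, key=lambda s: s.reward, reverse=True)[:top_k]


class ReplayBuffer:
    """Experience replay: keeps the highest-reward unique molecules seen so far."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: Dict[str, Sample] = {}  # keyed by SMILES to deduplicate

    def add(self, samples: List[Sample]) -> None:
        for s in samples:
            old = self.items.get(s.smiles)
            if old is None or s.reward > old.reward:
                self.items[s.smiles] = s
        # Trim to capacity, keeping the best-scoring molecules.
        ranked = sorted(self.items.values(), key=lambda s: s.reward, reverse=True)
        self.items = {s.smiles: s for s in ranked[: self.capacity]}

    def sample(self, n: int, rng: random.Random) -> List[Sample]:
        # Stored log_probs are stale; a real implementation would re-score
        # replayed sequences under the current policy before the update.
        pool = list(self.items.values())
        return rng.sample(pool, min(n, len(pool)))


# Toy usage with made-up log-probabilities and rewards.
batch = [
    Sample("CCO", log_prob=-2.0, reward=0.9),
    Sample("c1ccccc1", log_prob=-3.0, reward=0.5),
    Sample("CC(=O)O", log_prob=-2.5, reward=0.1),
]
buffer = ReplayBuffer(capacity=2)
buffer.add(batch)
replayed = buffer.sample(2, random.Random(0))
update_batch = hill_climb(batch, top_k=2) + replayed
print([s.smiles for s in hill_climb(batch, top_k=2)])  # ['CCO', 'c1ccccc1']
print(reinforce_loss(update_batch))
```

Minimizing this loss raises the log-probability of molecules scoring above the baseline and lowers it for those below. How many replayed samples to mix into each update, and how aggressive a top-k cutoff to use, are the kind of hyperparameters the abstract says can be tuned. The specific settings the authors recommend are not stated here.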
Taxonomy
Topics: Computational Drug Discovery Methods
Methods: REINFORCE
