Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization

Tao Li; Kaiyuan Hou; Tuan Vinh; Monika Raj; Zhichun Guo; Carl Yang

arXiv:2604.07669 · cs.LG · May 4, 2026

TL;DR

MolReAct introduces a reinforcement learning framework, guided by large language models, that optimizes drug lead molecules within a chemically valid, synthesis-constrained action space and achieves high property scores efficiently.

Contribution

This work presents MolReAct, a novel method combining LLM-guided action spaces with reaction templates and policy optimization for synthesizable lead optimization.

Findings

1. Achieved the highest average Top-10 score, 0.571, across multiple tasks.
2. Reduced optimization time by approximately 43% using SMILES caching.
3. Outperformed all baselines on 13 of 14 property optimization tasks.
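The caching and Top-10 findings can be illustrated with a small sketch. It assumes that SMILES caching means memoizing property-oracle calls keyed by the SMILES string, so a repeated molecule is not rescored. It also assumes that the Top-10 score is the mean of the ten highest scores found for a task, averaged across tasks. Both readings are assumptions, as are the hypothetical names `CachedOracle`, `top_k_mean`, and `average_top_k`. The toy oracle is only a stand-in for a real property predictor.

```python
from typing import Callable, Dict, Iterable, List


class CachedOracle:
    """Memoizes an expensive property oracle so each SMILES string is scored once.

    Assumption: the raw SMILES string is the cache key. A real system would
    likely canonicalize SMILES first so equivalent strings share one entry.
    """

    def __init__(self, oracle: Callable[[str], float]) -> None:
        self._oracle = oracle
        self._cache: Dict[str, float] = {}
        self.calls = 0  # number of real (uncached) oracle evaluations

    def __call__(self, smiles: str) -> float:
        if smiles not in self._cache:
            self.calls += 1
            self._cache[smiles] = self._oracle(smiles)
        return self._cache[smiles]

    def scores(self) -> Dict[str, float]:
        """Scores of all unique molecules evaluated so far."""
        return dict(self._cache)


def top_k_mean(scores: Iterable[float], k: int = 10) -> float:
    """Mean of the k highest scores (or of all scores if fewer than k)."""
    best = sorted(scores, reverse=True)[:k]
    return sum(best) / len(best) if best else 0.0


def average_top_k(per_task_scores: List[Iterable[float]], k: int = 10) -> float:
    """Average of the per-task Top-k means across tasks."""
    values = [top_k_mean(s, k) for s in per_task_scores]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    def toy_oracle(smiles: str) -> float:
        return len(smiles) / 100  # hypothetical stand-in for a property oracle

    oracle = CachedOracle(toy_oracle)
    for smi in ["CCO", "CCO", "c1ccccc1", "CCO"]:
        oracle(smi)
    print(oracle.calls)  # 2: "CCO" is scored only once
    print(top_k_mean(oracle.scores().values(), k=10))
```

A cache like this saves time only when the search revisits molecules, so its benefit depends on the revisit rate; the sketch does not reproduce the reported ~43% figure.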

Abstract

Lead optimization in drug discovery requires improving therapeutic properties while ensuring that molecular modifications correspond to feasible synthetic routes. Existing approaches either prioritize property scores without enforcing synthesizability, or rely on expensive enumeration over large reaction networks, while direct application of Large Language Models (LLMs) to molecular generation frequently produces chemically invalid structures. We introduce MolReAct, a framework that formulates lead optimization as a Markov Decision Process over a synthesis-constrained action space defined by validated reaction templates. A tool-augmented LLM agent serves as a dynamic reaction environment, invoking specialized chemical analysis tools to identify reactive sites and functional groups and proposing a compact set of chemically grounded transformations from matched templates. A dedicated…
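The MDP formulation can be sketched as a toy loop. The state is the current molecule, the action set at each step contains only templates that match it, and a policy samples among those actions. Everything here is hypothetical: the `Template` class rewrites SMILES substrings instead of applying real graph-based reaction templates, and exhaustive matching replaces the tool-augmented LLM agent that proposes candidate transformations. REINFORCE with a softmax policy over per-template preferences is an illustrative stand-in, not the policy optimization method used by MolReAct.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Template:
    """Hypothetical toy template: rewrites the first occurrence of a substring.

    Real reaction templates act on molecular graphs; this only mimics the
    idea that a template either matches a site on the molecule or does not.
    """
    name: str
    pattern: str
    replacement: str

    def matches(self, mol: str) -> bool:
        return self.pattern in mol

    def apply(self, mol: str) -> str:
        return mol.replace(self.pattern, self.replacement, 1)


def valid_actions(mol: str, templates: List[Template]) -> List[Template]:
    """Synthesis-constrained action space: only templates that match `mol`."""
    return [t for t in templates if t.matches(mol)]


def softmax(prefs: List[float]) -> List[float]:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


Step = Tuple[List[str], int, List[float]]  # (candidate names, chosen index, probs)


def run_episode(start: str, templates: List[Template], prefs: Dict[str, float],
                horizon: int, rng: random.Random) -> Tuple[List[Step], str]:
    """Roll out one episode from `start`; stop early if no template matches."""
    mol, traj = start, []
    for _ in range(horizon):
        actions = valid_actions(mol, templates)
        if not actions:
            break  # terminal state: no applicable transformation
        probs = softmax([prefs[t.name] for t in actions])
        i = rng.choices(range(len(actions)), weights=probs)[0]
        traj.append(([t.name for t in actions], i, probs))
        mol = actions[i].apply(mol)
    return traj, mol


def reinforce(start: str, templates: List[Template], score: Callable[[str], float],
              episodes: int = 200, horizon: int = 3, lr: float = 0.1,
              seed: int = 0) -> Dict[str, float]:
    """REINFORCE with a running-mean baseline; reward is the final molecule's score."""
    rng = random.Random(seed)
    prefs = {t.name: 0.0 for t in templates}
    baseline = 0.0
    for _ in range(episodes):
        traj, final = run_episode(start, templates, prefs, horizon, rng)
        reward = score(final)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        for names, chosen, probs in traj:
            for j, name in enumerate(names):
                # d log pi(chosen) / d pref[name] for a softmax policy
                grad = (1.0 if j == chosen else 0.0) - probs[j]
                prefs[name] += lr * advantage * grad
    return prefs


if __name__ == "__main__":
    templates = [Template("add_O", "C", "CO"), Template("add_N", "C", "CN"),
                 Template("swap_Br", "Br", "Cl")]  # swap_Br never matches here
    prefs = reinforce("C", templates, score=lambda m: m.count("O") / 3)
    print(prefs)  # add_O should end up preferred over add_N
```

Restricting the policy to matched templates is what keeps every step inside the synthesis-constrained space: a template that does not match the current molecule is never offered, so it can never be sampled, and its preference is never updated.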
