Retrosynthetic Planning with Dual Value Networks
Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, Tie-Yan Liu

TL;DR
This paper introduces PDVN, a reinforcement learning-based method that improves retrosynthetic planning by optimizing complete routes with dual value networks, significantly increasing success rates and reducing route lengths.
Contribution
The paper presents a novel online training algorithm, PDVN, that uses dual value networks to optimize retrosynthetic routes, integrating route-level rewards with single-step prediction accuracy.
Findings
Increases success rate from 85.79% to 98.95% for Retro*.
Reduces average route length from 5.76 to 4.83 for Retro*.
Halves the number of model calls needed for RetroGraph.
Abstract
Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step accuracy, without considering complete routes. Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes. Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between the planning phase and updating phase. In PDVN, we construct two separate value networks to predict the synthesizability and cost of molecules, respectively. To maintain the…
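The two quantities the abstract assigns to the value networks, synthesizability and cost, can be illustrated on a toy AND-OR search tree. The following is a minimal sketch under stated assumptions, not the paper's implementation: it computes both quantities exactly by recursion over a small hand-written reaction table, whereas PDVN would learn networks to predict them. The reaction table, the unit reaction costs, the depth limit, and all function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy reaction table: product -> list of (reaction_cost, reactants).
# Choosing a reaction is an OR decision; its reactants form an AND node,
# because every reactant must itself be synthesized.
Reactions = Dict[str, List[Tuple[float, List[str]]]]


def synthesizability(mol: str, reactions: Reactions, building_blocks: Set[str],
                     depth: int = 0, max_depth: int = 5) -> float:
    """Return 1.0 if mol can be reduced to building blocks within max_depth, else 0.0."""
    if mol in building_blocks:
        return 1.0
    if depth >= max_depth:
        return 0.0
    best = 0.0
    for _cost, reactants in reactions.get(mol, []):
        # AND node: the route succeeds only if every reactant succeeds.
        prod = 1.0
        for r in reactants:
            prod *= synthesizability(r, reactions, building_blocks, depth + 1, max_depth)
        # OR node: keep the best reaction.
        best = max(best, prod)
    return best


def min_cost(mol: str, reactions: Reactions, building_blocks: Set[str],
             depth: int = 0, max_depth: int = 5) -> float:
    """Return the cheapest total reaction cost to make mol, or inf if no route exists."""
    if mol in building_blocks:
        return 0.0
    if depth >= max_depth:
        return math.inf
    best = math.inf
    for cost, reactants in reactions.get(mol, []):
        total = cost + sum(
            min_cost(r, reactions, building_blocks, depth + 1, max_depth)
            for r in reactants
        )
        best = min(best, total)
    return best


# Toy data (assumed for illustration): "T" has one viable route and one dead end.
reactions: Reactions = {
    "T": [(1.0, ["A", "B"]), (1.0, ["C"])],
    "A": [(1.0, ["s1", "s2"])],
    "C": [(1.0, ["D"])],  # "D" is neither purchasable nor decomposable
}
building_blocks: Set[str] = {"B", "s1", "s2"}

print(synthesizability("T", reactions, building_blocks))  # 1.0
print(min_cost("T", reactions, building_blocks))          # 2.0
```

In this toy case the target "T" is synthesizable through "A" and "B" with two reactions in total, while the branch through "C" is a dead end with infinite cost. Keeping the two quantities separate lets a planner first ask whether a molecule can be made at all and only then compare the cost of the routes that succeed.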
Taxonomy
Topics: AI-based Problem Solving and Planning
