Reinforced Reasoning for End-to-End Retrosynthetic Planning
Chenyang Zuo, Siqi Fan, Yizhen Luo, Zaiqing Nie

TL;DR
ReTriP is an end-to-end generative framework that reformulates retrosynthetic planning as a Chain-of-Thought reasoning task, improving global planning coherence and robustness.
Contribution
The paper introduces ReTriP, which integrates Chain-of-Thought reasoning with reinforcement learning to produce more coherent and effective retrosynthetic plans.
Findings
ReTriP achieves state-of-the-art performance on RetroBench.
ReTriP demonstrates superior robustness in long-horizon planning.
The framework effectively aligns stepwise generation with route utility.
Abstract
Retrosynthetic planning is a fundamental task in organic chemistry, yet remains challenging due to its combinatorial complexity. To address this, conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, inevitably fracturing the logical coherence between local molecular transformations and global planning objectives. To bridge this gap and embed sophisticated strategic foresight directly into the model's chemical reasoning, we introduce ReTriP, an end-to-end generative framework that reformulates retrosynthesis as a direct Chain-of-Thought reasoning task. We establish a path-coherent molecular representation and employ a progressive training curriculum that transitions from reasoning distillation to reinforcement learning with verifiable rewards, effectively aligning stepwise generation with practical route…
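To make the idea of a "verifiable reward" for a generated route concrete, here is a minimal sketch. It is not the reward used by ReTriP, which this excerpt does not specify. It assumes a hypothetical binary reward that checks two things: every step expands a molecule that is still unresolved, and every remaining leaf is in a stock of purchasable building blocks. The names `route_reward`, `Step`, and `stock`, and the placeholder molecule labels, are illustrative assumptions.

```python
from typing import List, Set, Tuple

# One retrosynthetic step: a product and the reactants proposed for it.
# Molecules are treated as opaque strings here (assumption for illustration).
Step = Tuple[str, List[str]]


def route_reward(target: str, route: List[Step], stock: Set[str]) -> float:
    """Hypothetical binary reward: 1.0 if the route resolves the target, else 0.0."""
    if not route:
        # An empty route only "solves" a target that can be bought directly.
        return 1.0 if target in stock else 0.0

    open_mols = {target}  # molecules that still have to be made or bought
    for product, reactants in route:
        if product not in open_mols or not reactants:
            return 0.0  # step is not connected to the rest of the route
        open_mols.remove(product)
        # Reactants outside the stock become new sub-targets.
        open_mols.update(r for r in reactants if r not in stock)

    # The route is complete only when nothing is left unresolved.
    return 1.0 if not open_mols else 0.0


# Usage with placeholder molecule labels (not real chemistry).
route = [("D", ["B", "C"]), ("C", ["A", "E"])]
solved = route_reward("D", route, stock={"A", "B", "E"})
unsolved = route_reward("D", route, stock={"A", "B"})
```

Because a check of this kind is computed from the whole route rather than from any single step, it gives one simple picture of how a reinforcement signal could tie stepwise generation to route-level success.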
