Reinforced Reasoning for End-to-End Retrosynthetic Planning

Chenyang Zuo; Siqi Fan; Yizhen Luo; Zaiqing Nie

arXiv:2603.29723·cs.AI·April 1, 2026

TL;DR

ReTriP is an end-to-end generative framework that reformulates retrosynthetic planning as a Chain-of-Thought reasoning task, improving global planning coherence and robustness.

Contribution

The paper introduces ReTriP, an approach that combines Chain-of-Thought reasoning with reinforcement learning to make retrosynthetic planning more coherent and effective.

Findings

01 ReTriP achieves state-of-the-art performance on RetroBench.

02 ReTriP demonstrates superior robustness in long-horizon planning.

03 The framework effectively aligns stepwise generation with route utility.

Abstract

Retrosynthetic planning is a fundamental task in organic chemistry, yet remains challenging due to its combinatorial complexity. To address this, conventional approaches typically rely on hybrid frameworks that combine single-step predictions with external search heuristics, inevitably fracturing the logical coherence between local molecular transformations and global planning objectives. To bridge this gap and embed sophisticated strategic foresight directly into the model's chemical reasoning, we introduce ReTriP, an end-to-end generative framework that reformulates retrosynthesis as a direct Chain-of-Thought reasoning task. We establish a path-coherent molecular representation and employ a progressive training curriculum that transitions from reasoning distillation to reinforcement learning with verifiable rewards, effectively aligning stepwise generation with practical route…
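
To make the phrase "verifiable rewards" concrete, here is a minimal sketch of what a route-level check could look like. The abstract does not describe ReTriP's actual reward, so everything here is an assumption made for illustration: the `Step` type, the `route_reward` function, the binary 0/1 scoring, and the use of exact SMILES string matching against a set of purchasable building blocks are all hypothetical. A real system would at least canonicalize molecules and check reaction validity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One retrosynthetic disconnection (hypothetical structure, not from the paper)."""
    product: str      # SMILES of the molecule being disconnected
    reactants: tuple  # SMILES strings of the proposed precursors


def route_reward(target, steps, building_blocks):
    """Return 1.0 if the route is connected and ends in building blocks, else 0.0.

    Assumption: molecules are compared by exact string equality, which only
    works if all SMILES are already written in one consistent form.
    """
    if not steps or steps[0].product != target:
        return 0.0  # the route must begin by disconnecting the target

    open_molecules = {target}  # molecules that still need a synthesis step
    for step in steps:
        if step.product not in open_molecules:
            return 0.0  # step does not follow from anything earlier in the route
        open_molecules.remove(step.product)
        for reactant in step.reactants:
            if reactant not in building_blocks:
                open_molecules.add(reactant)  # must be resolved by a later step

    # Solved only if no intermediate is left without a route to building blocks.
    return 1.0 if not open_molecules else 0.0


# Toy one-step example: aspirin from salicylic acid and acetic anhydride.
ASPIRIN = "CC(=O)Oc1ccccc1C(=O)O"
SALICYLIC_ACID = "O=C(O)c1ccccc1O"
ACETIC_ANHYDRIDE = "CC(=O)OC(C)=O"

toy_route = [Step(ASPIRIN, (SALICYLIC_ACID, ACETIC_ANHYDRIDE))]
solved = route_reward(ASPIRIN, toy_route, {SALICYLIC_ACID, ACETIC_ANHYDRIDE})
unsolved = route_reward(ASPIRIN, toy_route, {SALICYLIC_ACID})
```

The point of the sketch is that such a reward is computed on the whole route rather than on a single step, so a reinforcement learning signal of this kind can tie each generated step to whether the overall plan succeeds.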
