TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization
Peiji Li, Linyang Li, Handa Sun, Wenjin Mai, Yongkang Chen, Xiaozhe Li, Yue Shen, Yichuan Ma, Yiliu Sun, Jiaxi Cao, Zhishu He, Bo Wang, Xiaoqing Zheng, Zhaori Bi, Xipeng Qiu, Qipeng Guo, Kai Chen, Dahua Lin

TL;DR
TL-GRPO is a turn-level reinforcement learning algorithm for iterative reasoning tasks. It outperforms existing methods on complex scientific optimization problems such as analog circuit sizing.
Contribution
We introduce TL-GRPO, a lightweight turn-level RL method that enables fine-grained optimization in reasoning tasks with shared environment states, addressing limitations of previous trajectory-level approaches.
Findings
TL-GRPO outperforms standard GRPO and Bayesian optimization in analog circuit sizing.
A 30B model trained with TL-GRPO achieves state-of-the-art results.
TL-GRPO demonstrates strong generalization and practical utility in scientific optimization.
Abstract
Large language models have demonstrated strong reasoning capabilities in complex tasks through tool integration, which is typically framed as a Markov Decision Process and optimized with trajectory-level RL algorithms such as GRPO. However, a common class of reasoning tasks, iterative optimization, presents distinct challenges: the agent interacts with the same underlying environment state across turns, and the value of a trajectory is determined by the best turn-level reward rather than cumulative returns. Existing GRPO-based methods cannot perform fine-grained, turn-level optimization in such settings, while black-box optimization methods discard prior knowledge and reasoning capabilities. To address this gap, we propose Turn-Level GRPO (TL-GRPO), a lightweight RL algorithm that performs turn-level group sampling for fine-grained optimization. We evaluate TL-GRPO on analog circuit…
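The two ideas in the abstract, scoring a trajectory by its best turn rather than its cumulative return, and normalizing rewards within a group sampled at a single turn, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy reward values, and the use of the standard GRPO-style mean/standard-deviation normalization at the turn level are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def trajectory_value_cumulative(turn_rewards):
    """Trajectory value as the sum of per-turn rewards (the usual MDP view)."""
    return sum(turn_rewards)


def trajectory_value_best_turn(turn_rewards):
    """Trajectory value as the best single-turn reward (the iterative-optimization view)."""
    return max(turn_rewards)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: (r - group mean) / (group std + eps).

    Assumption: `rewards` holds the rewards of several candidate responses
    sampled for the SAME turn from the same environment state, so the
    normalization is applied per turn rather than per trajectory.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical per-turn rewards for two trajectories on the same task.
trajectory_a = [0.1, 0.9, 0.2]  # one very good turn
trajectory_b = [0.5, 0.5, 0.5]  # consistently mediocre

# Cumulative return prefers B, while the best-turn criterion prefers A.
prefers_b_cumulative = (
    trajectory_value_cumulative(trajectory_b) > trajectory_value_cumulative(trajectory_a)
)
prefers_a_best_turn = (
    trajectory_value_best_turn(trajectory_a) > trajectory_value_best_turn(trajectory_b)
)

# Hypothetical rewards of four candidates sampled at one turn.
turn_group_rewards = [0.2, 0.4, 0.9, 0.5]
turn_advantages = group_relative_advantages(turn_group_rewards)
```

In this toy setup the two trajectory criteria rank the trajectories differently, which is the mismatch the abstract describes. The per-turn advantages sum to roughly zero, so candidates are credited only relative to the other samples drawn at the same turn.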
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Multimodal Machine Learning Applications
