A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

Dingwei Chen; Zefang Zong; Zhipeng Ma; Leo Luo; Yang Li; Chengming Li; Peng Chen; Jie Jiang

arXiv:2605.06200·cs.CL·May 8, 2026

TL;DR

This paper introduces A$^2$TGPO, a reinforcement learning method for agentic large language models that adaptively normalizes, accumulates, and clips turn-level Information Gain (IG) signals to improve training on multi-turn interactions.

Contribution

It uses Information Gain (IG) as an intrinsic process signal and applies adaptive normalization, accumulation, and clipping to it, addressing three systematic challenges in RL training of agentic LLMs.

Findings

01 Normalizing IG within turn groups improves stability.

02 Variance-rescaled accumulation keeps advantage magnitudes consistent.

03 Adaptive clipping adjusts the size of policy updates according to turn informativeness.
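
To make the first finding concrete, here is a minimal Python sketch of normalizing IG within turn groups. It is an illustration under stated assumptions, not the paper's implementation: this page does not define a "turn group", so the sketch assumes a group is the set of turns at the same index across rollouts sampled for one prompt, and it assumes a plain z-score. The function name `normalize_within_turn_groups` and the `eps` parameter are hypothetical.

```python
from statistics import mean, pstdev


def normalize_within_turn_groups(ig_by_rollout, eps=1e-8):
    """Z-score each IG value against the IG of other rollouts at the same turn index.

    ASSUMPTION: a "turn group" is all turns sharing one index across rollouts.
    ig_by_rollout: list of per-rollout IG lists (rollouts may differ in length).
    Returns a list of lists with the same shape as the input.
    """
    max_turns = max((len(r) for r in ig_by_rollout), default=0)
    normalized = [[0.0] * len(r) for r in ig_by_rollout]
    for t in range(max_turns):
        # Collect only the rollouts that actually reached turn t.
        members = [(i, r[t]) for i, r in enumerate(ig_by_rollout) if len(r) > t]
        values = [v for _, v in members]
        mu = mean(values)
        sigma = pstdev(values)  # population std; 0.0 for a single-member group
        for i, v in members:
            normalized[i][t] = (v - mu) / (sigma + eps)
    return normalized


# Three hypothetical rollouts; the third stops after one turn.
rollouts = [[0.2, 0.1], [0.4, -0.1], [0.0]]
norm = normalize_within_turn_groups(rollouts)
```

Grouping by turn index means each IG value is compared only with values from the same position in the interaction, rather than with values from earlier or later turns.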

Abstract

Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool calls within multi-turn interactions. Existing approaches to such process credit assignment either depend on separate external process reward models, which introduce additional cost, or on tree-based structural rollouts, which merely redistribute the outcome signal while constraining trajectory diversity. A promising alternative uses the per-turn change in the policy's predicted probability of the ground truth, termed Information Gain (IG), as an intrinsic process signal that needs no external evaluator. However, prior work on using IG signals within the RL training loop faces three systematic challenges: normalizing across turns that face heterogeneous positional contexts can distort the…
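
The IG definition in the abstract (the per-turn change in the policy's predicted probability of the ground truth) can be sketched in a few lines of Python. This is a hedged illustration: the abstract does not say whether the change is taken over raw probabilities or log-probabilities, so the sketch assumes a plain difference of probabilities. The function name `information_gain` and the example numbers are hypothetical.

```python
def information_gain(gt_probs):
    """Per-turn change in the policy's probability of the ground truth.

    gt_probs[0] is the probability before any turn; gt_probs[t] is the
    probability after turn t. Returns one IG value per turn.
    ASSUMPTION: IG is a raw probability difference, not a log-ratio.
    """
    if len(gt_probs) < 2:
        return []
    return [after - before for before, after in zip(gt_probs, gt_probs[1:])]


# Hypothetical probabilities of the ground-truth answer across three turns.
probs = [0.10, 0.25, 0.20, 0.60]
ig = information_gain(probs)  # one value per turn; the second turn is negative
```

A positive value marks a turn that moved the policy toward the ground truth, and a negative value marks a turn that moved it away, which is what lets IG act as a per-turn signal without an external evaluator.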

Code & Models

Repositories

cuso4-chen/A-TGPO (GitHub)
