AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search

Zefang Zong; Dingwei Chen; Yang Li; Qi Yi; Bo Zhou; Chengming Li; Bo Qian; Peng Chen; Jie Jiang

arXiv:2601.04767 · cs.AI · January 9, 2026

TL;DR

This paper introduces AT$^2$PO, a novel framework for multi-turn agentic reinforcement learning that combines tree search with turn-level policy optimization to improve exploration, credit assignment, and policy alignment.

Contribution

AT$^2$PO presents a unified turn-level tree structure and a matching turn-level learning objective, improving multi-turn RL by addressing three challenges: limited exploration diversity, sparse credit assignment, and misaligned policy optimization.

Findings

01

Achieves up to 1.84% improvement over state-of-the-art baselines.

02

Validates effectiveness through extensive ablation studies.

03

Demonstrates versatility by integrating with various RL pipelines.

Abstract

LLM agents have emerged as powerful systems for tackling multi-turn tasks by interleaving internal reasoning and external tool interactions. Agentic Reinforcement Learning has recently drawn significant research attention as a critical post-training paradigm to further refine these capabilities. In this paper, we present AT$^2$PO (Agentic Turn-based Policy Optimization via Tree Search), a unified framework for multi-turn agentic RL that addresses three core challenges: limited exploration diversity, sparse credit assignment, and misaligned policy optimization. AT$^2$PO introduces a turn-level tree structure that jointly enables Entropy-Guided Tree Expansion for strategic exploration and Turn-wise Credit Assignment for fine-grained reward propagation from sparse outcomes. Complementing this, we propose Agentic Turn-based Policy Optimization, a turn-level learning objective that aligns…
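The abstract names the two tree mechanisms but does not give their rules, so the following is only one plausible reading, not the authors' method. It assumes that (a) expansion branches from the non-terminal turn whose generation had the highest mean token entropy, and (b) a turn's value is the mean sparse outcome reward over the leaves beneath it, with the turn's advantage taken as its value minus its parent's value. `TurnNode`, `mean_token_entropy`, `select_expansion_node`, `backup`, and `assign_advantages` are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TurnNode:
    """One agent turn in a rollout tree (hypothetical structure, not from the paper)."""
    token_entropy: float                 # mean token entropy of this turn's generation (assumed signal)
    reward: Optional[float] = None       # sparse outcome reward; set only on terminal turns
    children: List["TurnNode"] = field(default_factory=list)
    value: float = 0.0
    advantage: float = 0.0


def mean_token_entropy(token_probs: List[List[float]]) -> float:
    """Average Shannon entropy (nats) over the per-token distributions of one turn."""
    ents = [-sum(p * math.log(p) for p in dist if p > 0) for dist in token_probs]
    return sum(ents) / len(ents)


def all_nodes(root: TurnNode) -> List[TurnNode]:
    """Collect every node in the tree with an iterative DFS."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def select_expansion_node(nodes: List[TurnNode]) -> TurnNode:
    """Assumed entropy-guided rule: branch from the most uncertain non-terminal turn."""
    candidates = [n for n in nodes if n.reward is None]
    return max(candidates, key=lambda n: n.token_entropy)  # raises ValueError if none


def backup(node: TurnNode) -> Tuple[float, int]:
    """Set node.value to the mean leaf reward in its subtree; return (reward sum, leaf count)."""
    if not node.children:
        r = node.reward if node.reward is not None else 0.0
        node.value = r
        return r, 1
    total, count = 0.0, 0
    for child in node.children:
        s, k = backup(child)
        total += s
        count += k
    node.value = total / count
    return total, count


def assign_advantages(node: TurnNode) -> None:
    """Assumed turn-wise credit: each turn's advantage is its value minus its parent's."""
    for child in node.children:
        child.advantage = child.value - node.value
        assign_advantages(child)


if __name__ == "__main__":
    # Toy tree: the root branches into turn `a` (two leaves) and a failed leaf `b`.
    a1 = TurnNode(token_entropy=0.1, reward=1.0)
    a2 = TurnNode(token_entropy=0.2, reward=0.0)
    a = TurnNode(token_entropy=1.2, children=[a1, a2])
    b = TurnNode(token_entropy=0.3, reward=0.0)
    root = TurnNode(token_entropy=0.5, children=[a, b])
    backup(root)
    assign_advantages(root)
    print(root.value, a.advantage, b.advantage)
    print(select_expansion_node(all_nodes(root)) is a)
```

In this toy tree the single success under turn `a` gives `a` a positive advantage and the failed sibling `b` a negative one, so a sparse final reward is turned into per-turn credit. Mean-of-leaves backup is a simple, unbiased choice; a real system could instead weight by visit counts or use learned values.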

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Artificial Intelligence in Games