Learning Agentic Policy from Action Guidance

Yuxiang Ji; Zengbin Wang; Yong Wang; Shidong Yang; Ziyu Ma; Guanhua Chen; Zonghua Sun; Liaoni Wu; Xiangxiang Chu

arXiv:2605.12004 · cs.CL · May 13, 2026

TL;DR

This paper introduces ActGuide-RL, a method that uses action data from everyday human interactions as guidance for agentic reinforcement learning (RL) in large language models (LLMs), reducing reliance on costly supervised fine-tuning (SFT).

Contribution

It injects action data as plan-style reference guidance, helping the policy reach reward states in sparse-reward tasks that it cannot reach on its own, without an extensive fine-tuning stage.

Findings

01

ActGuide-RL significantly outperforms zero RL (RL applied directly to the base model without an SFT stage) on search-agent benchmarks.

02

It matches the performance of SFT+RL pipelines without requiring an SFT cold start.

03

Mixed-policy training internalizes the exploration gains from guided rollouts back into the unguided policy.

Abstract

Agentic reinforcement learning (RL) for large language models (LLMs) critically depends on the exploration capability of the base policy, as training signals emerge only within its in-capability region. For tasks where the base policy cannot reach reward states, additional training or external guidance is needed to recover effective learning signals. Rather than relying on costly iterative supervised fine-tuning (SFT), we exploit the abundant action data generated in everyday human interactions. We propose ActGuide-RL, which injects action data as plan-style reference guidance, enabling the agentic policy to overcome reachability barriers to reward states. Guided and unguided rollouts are then jointly optimized via mixed-policy training, internalizing the exploration gains back into the unguided policy. Motivated by a theoretical and empirical analysis of the benefit-risk…
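The abstract explains why guidance helps: when every unguided rollout fails, a sparse reward yields no learning signal. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It renders an action trace as a plan-style reference in the prompt (the prompt template and the helper `build_guided_prompt` are hypothetical) and computes GRPO-style group-normalized advantages over a group that pools guided and unguided rollouts. GRPO-style normalization is our assumption; the abstract does not specify the objective, and the sketch omits the policy-gradient loss and any off-policy correction for guided rollouts.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    guided: bool   # True if the prompt contained the action-derived plan (assumption)
    reward: float  # sparse task reward, e.g. 1.0 for a correct final answer


def build_guided_prompt(question: str, actions: list[str]) -> str:
    """Hypothetical template: render a human action trace as a numbered reference plan."""
    plan = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return f"{question}\n\nReference plan (may be imperfect):\n{plan}"


def mixed_group_advantages(rollouts: list[Rollout], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages (assumed): z-score rewards within one mixed group.

    If every rollout gets the same reward (e.g. all unguided rollouts fail),
    all advantages are zero and the group contributes no gradient signal.
    """
    rewards = [r.reward for r in rollouts]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three unguided failures plus one guided success in the same group.
    group = [Rollout(False, 0.0)] * 3 + [Rollout(True, 1.0)]
    print(mixed_group_advantages(group))
    # An unguided-only group of failures yields no signal.
    print(mixed_group_advantages([Rollout(False, 0.0)] * 4))
    print(build_guided_prompt("Who founded X?", ["search X founder", "open top result"]))
```

Pooling both rollout types into one normalization group is what lets a single guided success produce a positive advantage while the failing unguided rollouts receive negative ones; with unguided rollouts alone, the all-zero rewards give all-zero advantages.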

Code & Models

Repositories

amap-ml/ActGuide-RL (GitHub)