Milestone-Guided Policy Learning for Long-Horizon Language Agents

Zixuan Wang, Yuchen Yan, Hongxing Li, Teng Pan, Dingming Li, Ruiqing Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

arXiv:2605.06078 · cs.CL · May 8, 2026

TL;DR

This paper introduces BEACON, a milestone-guided policy learning framework for training long-horizon language agents. It targets two problems, credit misattribution and sample inefficiency, and delivers substantial gains in both success rate and sample utilization.

Contribution

BEACON leverages task milestones for precise credit assignment, enhancing learning efficiency and success rates in long-horizon language agent tasks.

Findings

01. BEACON achieves 92.9% success on ALFWorld long-horizon tasks.
02. BEACON nearly doubles the success rate of previous methods.
03. Sample utilization improves from 23.7% to 82.0% with BEACON.
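The sample-utilization figures are easier to read with a concrete definition in mind. The sketch below assumes that utilization means the fraction of rollouts receiving a nonzero GRPO-style group-normalized advantage; the paper's exact definition is not stated on this page, so treat this as an assumption. Under binary terminal rewards, any group in which every rollout fails produces zero advantage and therefore no gradient. The partial-progress rewards in the second example (fraction of milestones reached) are a hypothetical stand-in, not BEACON's actual reward, and the toy numbers do not reproduce the paper's results.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantage: (r - group mean) / group std, or all zeros if std is 0."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def sample_utilization(groups: List[List[float]]) -> float:
    """Assumed definition: share of rollouts whose group-normalized advantage is nonzero."""
    total = sum(len(g) for g in groups)
    used = sum(1 for g in groups for a in group_advantages(g) if a != 0.0)
    return used / total


# Binary terminal rewards: only the group containing one success yields any signal.
terminal_only = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
# Hypothetical partial-progress rewards (fraction of milestones reached) for the same rollouts.
milestone_credit = [[0.0, 0.5, 0.25, 0.5], [0.5, 0.5, 0.75, 1.0], [0.0, 0.0, 0.0, 0.0]]

print(sample_utilization(terminal_only))     # 4 of 12 rollouts -> 1/3
print(sample_utilization(milestone_credit))  # 8 of 12 rollouts -> 2/3
```

The third group stays at zero in both cases because none of its rollouts made any progress; crediting partial progress only recovers signal where rollouts actually differ.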

Abstract

Long-horizon agentic tasks require language agents to make dozens of sequential decisions, yet training such agents with reinforcement learning remains challenging. We identify two root causes: credit misattribution, where correct early actions are penalized because of terminal failures, and sample inefficiency, where scarce successful trajectories cause a near-total loss of learning signal. We introduce BEACON, a milestone-guided policy learning framework that leverages the compositional structure of long-horizon tasks to ensure precise credit assignment. BEACON partitions trajectories at milestone boundaries, applies temporal reward shaping within segments to credit partial progress, and estimates advantages at two scales to prevent distant failures from corrupting the evaluation of local actions. On ALFWorld, WebShop, and ScienceWorld, BEACON consistently outperforms GRPO and…
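The three mechanisms named in the abstract (milestone partitioning, within-segment temporal shaping, dual-scale advantages) can be illustrated with a small sketch. This is a hypothetical reading, not BEACON's implementation (see the ZJU-REAL/BEACON repository for that). The `milestone_hit` flag, the length-discounted `segment_score` (a simplified stand-in for temporal reward shaping), the comparison of the k-th segments across a rollout group, and the additive combination weighted by `beta` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass
class Step:
    action: str
    milestone_hit: bool  # hypothetical flag: this step completes a milestone


def split_at_milestones(steps: List[Step]) -> List[List[Step]]:
    """Cut a trajectory right after every milestone-completing step."""
    segments: List[List[Step]] = []
    current: List[Step] = []
    for step in steps:
        current.append(step)
        if step.milestone_hit:
            segments.append(current)
            current = []
    if current:  # trailing segment that never reached a milestone
        segments.append(current)
    return segments


def segment_score(segment: List[Step], gamma: float) -> float:
    """Simplified temporal shaping (assumption): a segment that reaches its milestone
    earns gamma ** (len - 1), so faster progress scores higher; otherwise it earns 0."""
    if not segment[-1].milestone_hit:
        return 0.0
    return gamma ** (len(segment) - 1)


def normalize(values: List[float]) -> List[float]:
    """GRPO-style group normalization; returns zeros when all values are equal."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def dual_scale_advantages(group: List[List[Step]], successes: List[bool],
                          beta: float = 1.0, gamma: float = 0.9) -> List[List[float]]:
    """Per-step advantage = trajectory-level term + beta * segment-level term."""
    # Trajectory scale: normalized terminal outcome, as in plain GRPO.
    traj_adv = normalize([1.0 if s else 0.0 for s in successes])
    segmented = [split_at_milestones(t) for t in group]

    # Segment scale: compare the k-th segments of all trajectories with each other,
    # so a failure late in a trajectory does not change how its early segments are scored.
    seg_adv: Dict[Tuple[int, int], float] = {}
    for k in range(max(len(segs) for segs in segmented)):
        members = [i for i, segs in enumerate(segmented) if len(segs) > k]
        scores = normalize([segment_score(segmented[i][k], gamma) for i in members])
        for i, a in zip(members, scores):
            seg_adv[(i, k)] = a

    # Every step in a segment shares that segment's combined advantage.
    advantages: List[List[float]] = []
    for i, segs in enumerate(segmented):
        per_step: List[float] = []
        for k, seg in enumerate(segs):
            per_step.extend([traj_adv[i] + beta * seg_adv[(i, k)]] * len(seg))
        advantages.append(per_step)
    return advantages


if __name__ == "__main__":
    t1 = [Step("reach first milestone", True), Step("wrong move", False)]
    t2 = [Step("wander", False), Step("wander", False)]
    print(dual_scale_advantages([t1, t2], [False, False]))  # [[1.0, 0.0], [-1.0, -1.0]]
```

In the example both trajectories fail, so the trajectory-level term is zero for every step and plain GRPO would learn nothing. The segment-level term still rewards the step in `t1` that reached a milestone and penalizes `t2`'s aimless steps, which is the kind of local credit the abstract describes. Setting `beta` lower would let the terminal outcome dominate; the right balance is a design choice this sketch does not settle.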

Code & Models

Repositories

ZJU-REAL/BEACON (GitHub)
