TL;DR
StraTA adds an explicit trajectory-level strategy to reinforcement learning for large language model agents, improving exploration, credit assignment, and final performance on long-horizon decision-making tasks.
Contribution
The paper proposes StraTA, a hierarchical framework that samples an explicit trajectory-level strategy and conditions the agent's actions on it, improving sample efficiency and success rates on ALFWorld, WebShop, and SciWorld.
Findings
StraTA achieves 93.1% success on ALFWorld.
StraTA attains 84.2% success on WebShop.
On SciWorld, StraTA scores 63.5%, outperforming other models.
Abstract
Large language models (LLMs) are increasingly used as interactive agents, but optimizing them for long-horizon decision making remains difficult because current methods are largely reactive, which weakens both exploration and credit assignment over extended trajectories. In this work, we present Strategic Trajectory Abstraction (StraTA), a simple framework that introduces an explicit trajectory-level strategy into agentic reinforcement learning (RL). StraTA samples a compact strategy from the initial task state, conditions subsequent actions on that strategy, and trains strategy generation and action execution jointly with a hierarchical GRPO-style rollout design, further enhanced by diverse strategy rollout and critical self-judgment. Experiments on ALFWorld, WebShop, and SciWorld show that StraTA consistently improves both sample efficiency and final performance over strong…
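To make the "hierarchical GRPO-style rollout" concrete, here is a minimal sketch of how group-relative advantages could be computed at two levels. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how trajectory returns are aggregated, so scoring each strategy by the mean return of its trajectories, and the names `group_relative` and `hierarchical_advantages`, are hypothetical.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against zero variance
    return [(r - mu) / (sigma + eps) for r in rewards]


def hierarchical_advantages(returns_by_strategy):
    """Compute two levels of group-relative advantages.

    returns_by_strategy[k][n] is the return of the n-th trajectory rolled
    out under the k-th sampled strategy (all for the same initial task state).

    ASSUMPTION: a strategy's score is the mean return of its trajectories.
    The abstract does not specify this aggregation.
    """
    # Strategy level: compare strategies against each other.
    strategy_scores = [mean(rs) for rs in returns_by_strategy]
    strategy_adv = group_relative(strategy_scores)

    # Action level: compare trajectories that share the same strategy.
    action_adv = [group_relative(rs) for rs in returns_by_strategy]
    return strategy_adv, action_adv


# Two strategies, two trajectories each (1.0 = task success, 0.0 = failure).
strategy_adv, action_adv = hierarchical_advantages([[1.0, 0.0], [0.0, 0.0]])
```

In this toy example the first strategy receives a positive advantage because only its rollouts ever succeed, while within that strategy the successful trajectory is credited over the failed one. Separating the two comparisons is one plausible way a trajectory-level strategy could ease credit assignment.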
