TL;DR
This paper introduces STEP-HRL, a hierarchical reinforcement learning framework for LLM agents that improves efficiency and scalability by focusing on single-step transitions and local progress summaries.
Contribution
It presents an HRL approach that conditions both high-level and low-level policies on augmented step-level transitions and uses a local progress module to summarize each subtask, improving performance while reducing token usage.
Findings
Outperforms baselines on ScienceWorld and ALFWorld benchmarks.
Reduces token usage while maintaining high performance.
Improves generalization in hierarchical RL for LLM agents.
Abstract
Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limited scalability. In this paper, we propose STEP-HRL, a hierarchical reinforcement learning (HRL) framework that enables step-level learning by conditioning only on single-step transitions rather than full interaction histories. STEP-HRL structures tasks hierarchically, using completed subtasks to represent global progress on the overall task. By introducing a local progress module, it also iteratively and selectively summarizes the interaction history within each subtask to produce a compact summary of local progress. Together, these components yield augmented step-level transitions for both high-level and low-level policies. Experimental results…
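To make the idea of an augmented step-level transition concrete, here is a minimal Python sketch of the context a low-level policy might see at one step. It is an illustration under stated assumptions, not the authors' implementation: the names `StepContext`, `update_local_summary`, `low_level_input`, and `finish_subtask` are hypothetical, the summarizer is a simple truncating stand-in for the local progress module (whose actual form the abstract does not describe), and resetting the summary when a subtask ends is an assumption drawn from the phrase "within each subtask".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepContext:
    """Hypothetical container for what the policy conditions on at one step."""
    task: str
    current_subtask: str = ""
    completed_subtasks: List[str] = field(default_factory=list)  # global progress
    local_summary: str = ""  # compact summary of progress inside the current subtask


def update_local_summary(summary: str, action: str, observation: str,
                         max_chars: int = 200) -> str:
    """Stand-in summarizer (assumption): append the latest step and keep only
    the most recent max_chars characters so the summary stays bounded."""
    entry = f"{action} -> {observation}"
    merged = f"{summary}; {entry}" if summary else entry
    return merged[-max_chars:]


def low_level_input(ctx: StepContext, last_observation: str) -> str:
    """Build the policy input from the current step only, without replaying
    the full interaction history."""
    done = ", ".join(ctx.completed_subtasks) or "none"
    return "\n".join([
        f"Task: {ctx.task}",
        f"Completed subtasks: {done}",
        f"Current subtask: {ctx.current_subtask}",
        f"Local progress: {ctx.local_summary or 'none'}",
        f"Observation: {last_observation}",
    ])


def finish_subtask(ctx: StepContext, next_subtask: str) -> None:
    """Record the finished subtask as global progress and start a fresh
    local summary for the next one (assumed reset behavior)."""
    ctx.completed_subtasks.append(ctx.current_subtask)
    ctx.current_subtask = next_subtask
    ctx.local_summary = ""
```

The point of the sketch is that the input size depends on the number of completed subtasks and a bounded summary, not on the number of steps taken so far, which is the property the abstract credits for the lower computational cost.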
