HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents

Jiangweizhi Peng; Yuanxin Liu; Ruida Zhou; Charles Fleming; Zhaoran Wang; Alfredo Garcia; Mingyi Hong

arXiv:2602.16165 · cs.LG · May 12, 2026

TL;DR

HiPER introduces a hierarchical reinforcement learning framework for large language model agents, explicitly separating planning and execution, leading to improved performance on complex, long-horizon decision-making tasks.

Contribution

The paper proposes HiPER, a hierarchical RL approach with explicit credit assignment and a novel advantage estimation technique, enhancing training stability and efficiency for LLM agents.

Findings

01

Achieves state-of-the-art success rates on ALFWorld and WebShop benchmarks.

02

Outperforms prior methods by 6.6% on ALFWorld and 8.3% on WebShop.

03

Significantly improves performance on long-horizon tasks.

Abstract

Training LLMs as interactive agents for multi-turn decision-making remains challenging, particularly in long-horizon tasks with sparse and delayed rewards, where agents must execute extended sequences of actions before receiving meaningful feedback. Most existing reinforcement learning (RL) approaches model LLM agents as flat policies operating at a single time scale, selecting one action at each turn. In sparse-reward settings, such flat policies must propagate credit across the entire trajectory without explicit temporal abstraction, which often leads to unstable optimization and inefficient credit assignment. We propose HiPER, a novel Hierarchical Plan-Execute RL framework that explicitly separates high-level planning from low-level execution. HiPER factorizes the policy into a high-level planner that proposes subgoals and a low-level executor that carries them out over multiple…
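The plan-execute factorization described in the abstract can be illustrated with a toy rollout loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `Segment`, `rollout`, `toy_planner`, `toy_executor`, `toy_env_step`, and `subgoal_reached` are hypothetical, the planner and executor are hand-written stubs standing in for LLM policies, and the paper's advantage estimation is not reproduced here. The environment is a counter that pays a single sparse reward when it reaches 4.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One subgoal plus the low-level (observation, action) steps taken under it."""
    subgoal: int
    steps: List[Tuple[int, int]] = field(default_factory=list)


def rollout(
    obs: int,
    planner: Callable[[int], int],
    executor: Callable[[int, int], int],
    subgoal_reached: Callable[[int, int], bool],
    env_step: Callable[[int, int], Tuple[int, float, bool]],
    max_turns: int = 20,
) -> Tuple[List[Segment], float]:
    """Alternate between high-level planning and multi-step low-level execution."""
    segments: List[Segment] = []
    reward, finished, turns = 0.0, False, 0
    while not finished and turns < max_turns:
        # High level: propose a subgoal from the current observation.
        segment = Segment(subgoal=planner(obs))
        # Low level: act until the subgoal is reached or the episode ends.
        while not finished and turns < max_turns:
            action = executor(obs, segment.subgoal)
            next_obs, reward, finished = env_step(obs, action)
            segment.steps.append((obs, action))
            obs = next_obs
            turns += 1
            if subgoal_reached(obs, segment.subgoal):
                break
        segments.append(segment)
    return segments, reward


# Hypothetical toy task: count from 0 up to 4; reward arrives only at the end.
TARGET = 4


def toy_planner(obs: int) -> int:
    return min(obs + 2, TARGET)  # aim two steps ahead, capped at the target


def toy_executor(obs: int, subgoal: int) -> int:
    return 1  # always increment by one


def toy_env_step(obs: int, action: int) -> Tuple[int, float, bool]:
    next_obs = obs + action
    done = next_obs == TARGET
    return next_obs, (1.0 if done else 0.0), done


def subgoal_reached(obs: int, subgoal: int) -> bool:
    return obs >= subgoal


segments, final_reward = rollout(
    0, toy_planner, toy_executor, subgoal_reached, toy_env_step
)
for seg in segments:
    print(seg.subgoal, seg.steps)
print(final_reward)
```

Grouping steps by subgoal gives the trajectory two time scales: a short sequence of planner decisions and, within each, a run of executor actions. A flat policy would instead see the same four actions as one undivided sequence with a single terminal reward.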
