Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

Jaebak Hwang; Sanghyeon Lee; Jeongmo Kim; Seungyul Han

arXiv:2506.21039 · cs.LG · May 21, 2026

TL;DR

This paper introduces Strict Subgoal Execution (SSE), a hierarchical RL framework that improves long-horizon goal-conditioned task performance by reliably identifying feasible subgoals and refining paths using a novel experience replay method.

Contribution

The paper proposes SSE, a graph-based hierarchical RL method that enhances subgoal reliability and planning efficiency through Frontier Experience Replay (FER) and path refinement techniques.
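As a rough illustration of how path refinement on a subgoal graph might use reliability information, the sketch below runs Dijkstra's algorithm with edge costs inflated by an observed failure ratio. This is an assumption-laden sketch and not the paper's procedure: the function name `shortest_path`, the `penalty` weight, and the multiplicative cost rule are all hypothetical.

```python
import heapq


def shortest_path(edges, start, goal, failure_ratio, penalty=10.0):
    """Dijkstra over a subgoal graph with failure-weighted edge costs.

    edges: dict mapping node -> list of (neighbor, base_cost)
    failure_ratio: dict mapping (node, neighbor) -> failed fraction in [0, 1]
    penalty: hypothetical weight on the failure ratio (not from the paper)
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, base in edges.get(u, []):
            # Assumed rule: unreliable edges become proportionally more expensive.
            cost = base * (1.0 + penalty * failure_ratio.get((u, v), 0.0))
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None  # goal is not connected to start
    # Walk predecessors back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

With no recorded failures the planner takes the cheapest route; once an edge on that route accumulates failures, the planner switches to a longer but more reliable alternative.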

Findings

01. SSE outperforms existing methods in success rate across benchmarks.
02. FER effectively separates reachable and unreachable subgoals.
03. Decoupled exploration improves goal space coverage.

Abstract

Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, their reliance on conventional hindsight relabeling often fails to correct subgoal infeasibility, leading to inefficient high-level planning. To address this, we propose Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that integrates Frontier Experience Replay (FER) to separate unreachable from admissible subgoals and streamline high-level decision making. FER delineates the reachability frontier using failure and partial-success transitions, which identifies unreliable subgoals, increases subgoal reliability, and reduces unnecessary high-level decisions. Additionally, SSE employs a decoupled exploration policy to cover underexplored regions…
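To make the contrast with hindsight relabeling concrete, the sketch below compares two ways of storing a high-level transition after the low-level policy misses its subgoal. It is a minimal illustration under stated assumptions, not the paper's FER: the names `HighLevelTransition`, `hindsight_relabel`, and `failure_aware_store`, the tolerance `tol`, and the failure reward are hypothetical, and partial-success transitions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HighLevelTransition:
    state: tuple
    subgoal: tuple
    next_state: tuple
    reward: float
    done: bool


def reached(achieved, subgoal, tol=0.5):
    """True if every coordinate of `achieved` is within `tol` of `subgoal`."""
    return max(abs(a - g) for a, g in zip(achieved, subgoal)) <= tol


def hindsight_relabel(state, subgoal, achieved):
    """Hindsight-style storage: the achieved position replaces the subgoal.

    The original subgoal is discarded, so the buffer carries no evidence
    that the requested subgoal was infeasible.
    """
    return HighLevelTransition(state, achieved, achieved, reward=0.0, done=False)


def failure_aware_store(state, subgoal, achieved, tol=0.5, fail_reward=-1.0):
    """Assumed failure-aware storage: keep the requested subgoal as-is.

    A miss is stored as a terminal transition with a hypothetical negative
    reward, so the high-level learner sees which subgoals were unreachable.
    """
    if reached(achieved, subgoal, tol):
        return HighLevelTransition(state, subgoal, achieved, reward=0.0, done=False)
    return HighLevelTransition(state, subgoal, achieved, reward=fail_reward, done=True)
```

For a requested subgoal `(5, 5)` and an achieved position `(1, 0)`, the hindsight version records a transition toward `(1, 0)`, while the failure-aware version keeps `(5, 5)` and flags the attempt as a failure.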

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

• Strong empirical results on a diverse suite, including tasks that require implicit sequencing.
• Ablation coverage is thoughtful: removing FER or replacing it with HER largely breaks performance on harder tasks.

Weaknesses

The density estimator and failure statistics hinge on a grid. This is fine for 2D/3D but will be problematic in higher dimensions and for goals that include orientation or other factors. No guarantees or formal properties regarding convergence or bias introduced by early termination/FER.
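The scaling concern raised here can be illustrated with a small sketch of grid-based failure statistics. The class `GridFailureStats`, its `cell_size` parameter, and the helper `num_cells` are hypothetical names chosen for illustration; the paper's actual estimator may differ.

```python
from collections import defaultdict


class GridFailureStats:
    """Per-cell attempt and failure counts over a discretized goal space."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.attempts = defaultdict(int)
        self.failures = defaultdict(int)

    def cell(self, goal):
        # Map a continuous goal to the integer index of its grid cell.
        return tuple(int(x // self.cell_size) for x in goal)

    def record(self, goal, success):
        c = self.cell(goal)
        self.attempts[c] += 1
        if not success:
            self.failures[c] += 1

    def failure_ratio(self, goal):
        # Unvisited cells report 0.0 rather than dividing by zero.
        c = self.cell(goal)
        n = self.attempts.get(c, 0)
        return self.failures.get(c, 0) / n if n else 0.0


def num_cells(bins_per_dim, dims):
    """Number of grid cells grows exponentially with goal dimension."""
    return bins_per_dim ** dims
```

With 20 bins per axis, a 2D goal space has `num_cells(20, 2) == 400` cells, while a 6D goal space (for example, position plus orientation) has `num_cells(20, 6) == 64000000`, so most cells would never collect enough attempts for a meaningful ratio.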

Reviewer 02 · Rating 4 · Confidence 2

Strengths

• The paper addresses an important problem in hierarchical reinforcement learning, unreliable subgoal execution, which is crucial for long-horizon tasks.
• The introduction of Frontier Experience Replay (FER) is conceptually clear and provides a principled way to delineate reachable and unreachable subgoals, improving training stability.

Weaknesses

• The proposed framework assumes that the goal space is known and low-dimensional, which may not hold for complex real-world manipulation or visual tasks where the goal representation itself is high-dimensional and uncertain.
• The technical novelty is moderate: SSE combines known components (graph-based HRL, experience replay, path cost reweighting) rather than introducing fundamentally new learning principles.
• All experiments are conducted in simulators; there is no validation in real-

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The manuscript is clearly written, with figures that effectively elucidate the proposed methodology and equations that comprehensively convey its technical details.
2. The experimental design is well structured, incorporating appropriate selections of baselines. The ablation study provides a thorough examination of the contribution of each component and offers a detailed analysis of the method’s sensitivity to hyperparameters.

Weaknesses

1. To make a stronger case for the method's generality, the paper should include results from a broader set of environments. Specifically, in line with the existing HRL works, the authors may want to evaluate the method on Pusher, AntFall, AntGather and Ant4Rooms, in addition to the relatively simpler maze-based tasks. This would provide a better understanding of how well SSE adapts to various state spaces and task complexities. Including additional tasks would also help demonstrate whether the

Code & Models

Repositories

https://jaebak1996.github.io/SSE
github

Videos

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics