Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning

Roberto Cipollone; Luca Iocchi; Matteo Leonetti

arXiv:2512.04958·cs.LG·December 5, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces Realizable Abstractions for hierarchical reinforcement learning, providing a formal framework with near-optimal guarantees, and proposes RARL, an algorithm that efficiently learns near-optimal policies using these abstractions.

Contribution

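The paper defines Realizable Abstractions, a new formal relation between low-level MDPs and their high-level decision processes that avoids the limited expressive power and missing efficiency guarantees of earlier abstraction notions. It also develops RARL, an algorithm that uses these abstractions to learn efficiently.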

Findings

01

RARL converges in polynomial time.

02

It is robust to abstraction inaccuracies.

03

It yields near-optimal low-level policies through a composition of options.

Abstract

The main focus of Hierarchical Reinforcement Learning (HRL) is studying how large Markov Decision Processes (MDPs) can be more efficiently solved when addressed in a modular way, by combining partial solutions computed for smaller subtasks. Despite their very intuitive role for learning, most notions of MDP abstractions proposed in the HRL literature have limited expressive power or do not possess formal efficiency guarantees. This work addresses these fundamental issues by defining Realizable Abstractions, a new relation between generic low-level MDPs and their associated high-level decision processes. The notion we propose avoids non-Markovianity issues and has desirable near-optimality guarantees. Indeed, we show that any abstract policy for Realizable Abstractions can be translated into near-optimal policies for the low-level MDP, through a suitable composition of options. As…
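The idea of translating an abstract policy into low-level behavior through a composition of options can be illustrated with a toy sketch. This is not the paper's RARL algorithm or its formal definition of Realizable Abstractions: the corridor MDP, the block partition, and every name below (`abstract_state`, `ABSTRACT_POLICY`, `make_option`, `run_composed_policy`) are hypothetical and chosen only for illustration. The options here are hand-coded, whereas a learning algorithm would have to obtain them from experience.

```python
from typing import Callable, Dict, List

# Toy low-level MDP (an assumption for illustration): a deterministic
# corridor of 9 cells, grouped into 3 blocks that act as abstract states.
N_CELLS = 9
BLOCK_SIZE = 3


def abstract_state(s: int) -> int:
    """Map a low-level cell to the index of the block that contains it."""
    return s // BLOCK_SIZE


def step(s: int, a: int) -> int:
    """Deterministic transition: action a is -1 (left) or +1 (right)."""
    return min(max(s + a, 0), N_CELLS - 1)


# Hypothetical abstract policy: for each block, the block to reach next.
ABSTRACT_POLICY: Dict[int, int] = {0: 1, 1: 2}


def make_option(src: int, dst: int) -> Callable[[int], int]:
    """Hand-coded option: a low-level policy that walks from src toward dst."""
    direction = 1 if dst > src else -1
    return lambda s: direction


def run_composed_policy(start: int, goal_block: int, max_steps: int = 100) -> List[int]:
    """Follow the abstract policy by chaining one option per abstract transition."""
    s = start
    trajectory = [s]
    while abstract_state(s) != goal_block and len(trajectory) <= max_steps:
        src = abstract_state(s)
        option = make_option(src, ABSTRACT_POLICY[src])
        # The option runs until the low-level state leaves the current block.
        while abstract_state(s) == src and len(trajectory) <= max_steps:
            s = step(s, option(s))
            trajectory.append(s)
    return trajectory


if __name__ == "__main__":
    print(run_composed_policy(0, goal_block=2))
```

Each option terminates as soon as the abstract state changes, so the high-level plan only needs to decide which block to enter next, while the low-level details stay inside the option.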

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 8 · Confidence 4

Strengths

Very clear, very well written.

Weaknesses

I appreciated the example illustrated in figure 1. I would suggest the authors refer to it when introducing new concepts and when discussing RARL.

Reviewer 02 · Rating 6 · Confidence 2

Strengths

The paper addresses an important problem in hierarchical RL dealing with abstractions which relate high-level and low-level representations. Connecting hierarchical abstractions to constrained MDPs such that options can be extracted by solving the CMDPs with off-the-shelf algorithms is interesting and, to my knowledge, novel. The paper is well-written and the intuitive explanations for the theory are fairly easy to follow, though it is extremely notation-heavy.

Weaknesses

While an algorithm is proposed (RARL), there are no empirical results to support it and validate the assumptions that are made for the guarantees in Section 4. I would like to see RARL compared with existing methods, e.g., some form of option-critic (with specified options) or deep skill chaining (for a skill discovery comparison). Additionally, as it is not clear to me how reasonable Assumptions 1-3 are, the paper would be strengthened by experiments showing how RARL is affected by violations o

Reviewer 03 · Rating 3 · Confidence 3

Strengths

Realizable Abstractions offer a fresh theoretical foundation for HRL, opening up a promising way for potentially reducing sample complexity in reinforcement learning. I find it particularly intriguing that sparse rewards play a crucial role in ensuring admissibility (Proposition 6).

Weaknesses

While the new concept of Realizable Abstractions is interesting, the paper lacks some key definitions, making it challenging to follow. As a result, it’s difficult to fully grasp the significance of the paper’s main contributions. The following are the main weaknesses of this paper: - The proposed algorithm requires a **known** abstraction and assumes admissibility, which feels like a rather strong assumption. However, there is not enough rigorous discussion regarding these assumptions and their

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference