Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning

Vittorio Giammarino; Ahmed H. Qureshi

arXiv:2512.12046 · cs.LG · March 3, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Eikonal-Constrained Quasimetric RL (Eik-QRL), a PDE-based reformulation of quasimetric RL that improves goal reaching and out-of-distribution generalization, together with a hierarchical extension (Eik-HiQRL) for complex dynamics.

Contribution

It presents Eik-QRL, a PDE-based, trajectory-free method for goal-conditioned RL, and Eik-HiQRL, a hierarchical extension addressing complex dynamics with theoretical guarantees.

Findings

1. Achieves state-of-the-art results in offline goal-reaching navigation.
2. Yields consistent gains in manipulation tasks.
3. Matches the performance of temporal-difference methods.

Abstract

Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce…
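The core idea of the abstract, replacing trajectory-based local consistency with the Eikonal PDE, can be illustrated with a minimal sketch. Assuming the unit-speed isotropic case, where a goal-conditioned distance should satisfy ‖∇_s d(s,g)‖ = 1 away from the goal, the code scores a candidate distance by its mean squared Eikonal residual over independently sampled states and goals. All names, the finite-difference gradient, and the uniform sampling box are illustrative assumptions rather than the paper's implementation; a learned model would normally use automatic differentiation instead.

```python
import math
import random


def euclidean_distance(s, g):
    """Candidate distance d(s, g); Euclidean distance is the unit-speed isotropic case."""
    return math.sqrt(sum((si - gi) ** 2 for si, gi in zip(s, g)))


def grad_wrt_state(d, s, g, h=1e-5):
    """Central finite-difference estimate of the gradient of d(., g) at s."""
    grad = []
    for i in range(len(s)):
        s_plus = list(s)
        s_plus[i] += h
        s_minus = list(s)
        s_minus[i] -= h
        grad.append((d(s_plus, g) - d(s_minus, g)) / (2 * h))
    return grad


def eikonal_residual(d, states, goals, min_sep=1e-2):
    """Mean of (||grad_s d(s, g)|| - 1)^2 over sampled (s, g) pairs.

    Pairs with s very close to g are skipped because d is not
    differentiable at s = g (the boundary condition d(g, g) = 0 lives there).
    """
    total, count = 0.0, 0
    for s, g in zip(states, goals):
        if euclidean_distance(s, g) < min_sep:
            continue
        norm = math.sqrt(sum(c * c for c in grad_wrt_state(d, s, g)))
        total += (norm - 1.0) ** 2
        count += 1
    return total / max(count, 1)


random.seed(0)
dim, n = 2, 256
# Trajectory-free: states and goals are drawn independently; no transitions are used.
states = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
goals = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

print(eikonal_residual(euclidean_distance, states, goals))  # close to 0
print(eikonal_residual(lambda s, g: 2 * euclidean_distance(s, g), states, goals))  # close to 1
```

A distance that grows twice as fast as unit speed violates the constraint everywhere, which the residual exposes without any trajectory data.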

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

1. The presented formulation is trajectory-free and only requires sampling (state, goal) pairs rather than complete trajectory rollouts, which I really appreciate.
2. The core of the presented Eikonal approach boils down to a constrained optimization problem, which can be readily solved with a wide suite of existing physics-informed ML methods.
3. The experiments in the paper specifically highlight that the formulation performs well in settings where no theoretical statements can be made c
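The reviewer's second point, that the Eikonal formulation reduces to a constrained optimization problem amenable to physics-informed methods, can be made concrete with a toy example. As an illustrative assumption, the sketch borrows the shape of QRL's objective (push distances between random state/goal pairs up, subject to a local constraint) and replaces the trajectory-based constraint with a quadratic penalty on the gradient norm exceeding 1. The one-parameter model d_θ(s,g) = θ‖s−g‖, for which ‖∇_s d_θ‖ = θ, and the function names are hypothetical; this is not the paper's loss.

```python
import math
import random


def mean_pair_distance(states, goals):
    """Average Euclidean distance between independently sampled states and goals."""
    return sum(math.dist(s, g) for s, g in zip(states, goals)) / len(states)


def fit_scale(states, goals, penalty=100.0, lr=1e-3, steps=20000):
    """Fit the hypothetical model d_theta(s, g) = theta * ||s - g|| by gradient descent.

    Minimized objective: -theta * E||s - g|| + penalty * max(0, theta - 1)^2.
    The first term pushes distances up (as in QRL); the second softly enforces
    ||grad_s d_theta|| = theta <= 1, the unit-speed Eikonal bound.
    """
    m = mean_pair_distance(states, goals)
    theta = 0.5
    for _ in range(steps):
        grad = -m + 2.0 * penalty * max(0.0, theta - 1.0)
        theta -= lr * grad
    return theta, m


random.seed(0)
states = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(512)]
goals = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(512)]
for penalty in (1.0, 10.0, 100.0):
    theta, m = fit_scale(states, goals, penalty=penalty)
    # The minimizer is theta = 1 + m / (2 * penalty), approaching 1 as the penalty grows.
    print(penalty, round(theta, 4))
```

The penalty weight trades constraint violation against the distance-maximizing term; a hard-constraint or Lagrangian treatment would remove the residual overshoot above 1.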

Weaknesses

1. The main weakness is the relatively simplistic dynamics assumption of the formulation (i.e., Lipschitzness, and that the continuous-time counterpart is unit-speed and isotropic). However, I feel that this weakness is adequately acknowledged and addressed by the authors.
2. One of the main assumptions for Lemma 4.7 and Theorem 4.8, stated in line 301, is $c(s,g)=1$ on $\mathcal{K}\setminus\{g\}$. Does this mean $g$ can be the only suitable goal in $\mathcal{K}$? If so, I feel that is a limitation as
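For orientation on the condition $c(s,g)=1$, the standard Eikonal boundary-value problem for a goal-conditioned distance $d(s,g)$ with a local cost $c(s,g)$ can be written as follows. We assume here that $c$ plays the role of a cost (inverse speed) and that Lemma 4.7 uses a comparable form; the paper's exact statement may differ in notation or normalization.

```latex
\|\nabla_s d(s,g)\|_2 = c(s,g) \quad \text{for } s \in \mathcal{K}\setminus\{g\},
\qquad d(g,g) = 0 .
```

With $c \equiv 1$ (unit-speed, isotropic motion) on a convex, obstacle-free domain $\mathcal{K}$, the solution is the Euclidean distance $d(s,g) = \|s-g\|_2$, which illustrates why this assumption is restrictive under more complex dynamics.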

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- The theory and formulation of Eik-QRL are novel and offer an interesting perspective on QRL.
- The clarity and writing of the paper are generally very good. The theory is well written, the writing is well balanced, and limitations are well acknowledged.
- While Eik-QRL does make strong assumptions, it seems plausible that these assumptions could hold to some extent in certain scenarios.
- Eik-QRL often shows strong performance on benchmarks versus baselines, including state-of-the-art humanoid maze performa

Weaknesses

**Assumptions of Eik-QRL may limit real-world use cases.** The authors have acknowledged the limitations of some assumptions made by Eik-QRL well. My general concern is that these assumptions could limit the significance and wider impact of this method. While the paper evaluates the method on a wide range of environments, including some where its assumptions break down, many of these environments seem to be navigation-based; it seems plausible the method is overfitting to these specifi

Reviewer 03 · Rating 4 · Confidence 2

Strengths

1. The core idea of connecting quasimetric learning's local consistency to the Eikonal PDE is a creative, insightful, and theoretically sound contribution.
2. A major practical strength of Eik-QRL is that it is trajectory-free, only requiring i.i.d. state and goal samples. This makes it more data-efficient and better suited for offline learning from unstructured datasets than the original QRL, which requires transition tuples.
3. The paper clearly identifies the main weakness of its own Eik-Q

Weaknesses

1. The paper's strongest results are in navigation tasks (pointmaze, antmaze, humanoidmaze). In the antsoccer and manipulation tasks (Table 2), the performance gains vanish, and Eik-HiQRL is only "comparable" to baselines. The paper acknowledges this, but it suggests that the method is currently best suited to tasks where a simple Cartesian abstract space is available.
2. The success of Eik-HiQRL appears to be highly dependent on the choice of the high-level abstract space $\overlin

Taxonomy

Topics: Motor Control and Adaptation · Reinforcement Learning in Robotics · Robot Manipulation and Learning