Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
Vittorio Giammarino, Ahmed H. Qureshi

TL;DR
This paper introduces Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning, a novel PDE-based approach that improves goal-reaching capabilities and generalization in RL tasks, with hierarchical extensions for complex dynamics.
Contribution
It presents Eik-QRL, a PDE-based, trajectory-free method for goal-conditioned RL, and Eik-HiQRL, a hierarchical extension addressing complex dynamics with theoretical guarantees.
Findings
Achieves state-of-the-art performance in offline goal-conditioned navigation.
Yields consistent gains in manipulation tasks.
Matches temporal-difference methods in performance.
Abstract
Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce…
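To make the "trajectory-free" idea concrete, here is a minimal sketch of an Eikonal-style residual, assuming the constraint takes the form $\|\nabla_s d(s,g)\| = c(s,g)$ with $c = 1$ (the unit-speed case the reviews mention). It is not the authors' implementation: the function names (`distance`, `grad_norm`, `eikonal_residual`), the finite-difference gradient, and the Euclidean candidate are illustrative assumptions, and the paper's actual quasimetric parameterization and loss are not shown in this summary. The point is only that the residual is evaluated on sampled (state, goal) pairs, with no transitions or rollouts.

```python
import math
import random

def distance(s, g):
    """Hypothetical candidate value: Euclidean distance, which solves the
    unit-cost Eikonal equation in obstacle-free space (an assumption here)."""
    return math.sqrt(sum((si - gi) ** 2 for si, gi in zip(s, g)))

def grad_norm(d, s, g, eps=1e-5):
    """Central finite-difference estimate of ||grad_s d(s, g)||."""
    total = 0.0
    for i in range(len(s)):
        hi, lo = list(s), list(s)
        hi[i] += eps
        lo[i] -= eps
        partial = (d(hi, g) - d(lo, g)) / (2 * eps)
        total += partial ** 2
    return math.sqrt(total)

def eikonal_residual(d, pairs, cost=1.0):
    """Mean squared violation of ||grad_s d(s, g)|| = cost over sampled pairs."""
    return sum((grad_norm(d, s, g) - cost) ** 2 for s, g in pairs) / len(pairs)

# Sample states and goals independently; no trajectories are involved.
# States and goals come from disjoint boxes so that s != g, where the
# gradient of the distance is well defined.
rng = random.Random(0)
pairs = [
    ([rng.uniform(-1, 1), rng.uniform(-1, 1)],
     [rng.uniform(2, 3), rng.uniform(2, 3)])
    for _ in range(64)
]

loss_exact = eikonal_residual(distance, pairs)
loss_scaled = eikonal_residual(lambda s, g: 2.0 * distance(s, g), pairs)
```

In this toy setting the Euclidean distance gives a residual near zero, while a candidate scaled by 2 has gradient norm 2 everywhere and so a residual near 1. A learned model would replace `distance` with a quasimetric network and the finite differences with automatic differentiation.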
Peer Reviews
Decision: ICLR 2026 Poster
1. The presented formulation is trajectory-free and only requires one to sample (state, goal) pairs rather than complete trajectory rollouts, which I really appreciate.
2. The core part of the presented Eikonal approach boils down to a constrained optimization problem which can be readily solved through a wide suite of existing physics-informed ML methods.
3. The experiments in the paper specifically highlight that the formulation performs well in settings where no theoretical statements can be made c
1. The main weakness is the relatively simplistic dynamics assumption of the formulation (i.e., Lipschitzness, and that the continuous time counterpart is unit speed isotropic). However, I feel that this weakness is adequately acknowledged and addressed by the authors.
2. One of the main assumptions for Lemma 4.7 and Theorem 4.8, stated in line 301 says $c(s,g)=1$ on $\mathcal{K}\setminus g$. Does this mean $g$ can be the only suitable goal in $\mathcal{K}$? If so, I feel that is a limitation as
- The theory and formulation of Eik-QRL is novel and is an interesting perspective on QRL.
- The clarity and writing in the paper are generally very good. The theory is well written, the writing is well-balanced, limitations are well-acknowledged.
- While Eik-QRL does make strong assumptions, it seems plausible these assumptions could hold to some extent in certain scenarios.
- Eik-QRL often shows strong performance on benchmarks versus baselines, including state-of-the-art humanoid maze performa
**Assumptions of Eik-QRL may limit real-world use-cases.** The authors have well-acknowledged the limitations of some assumptions made by Eik-QRL. My general concern is that these assumptions could limit the significance and wider impact of this method. While the paper has evaluated their method on a wide range of environments, including some where their assumptions break down, many of these environments seem to be navigation-based - it seems plausible the method is overfitting to these specifi
1. The core idea of connecting quasimetric learning's local consistency to the Eikonal PDE is a creative, insightful, and theoretically sound contribution.
2. A major practical strength of Eik-QRL is that it is trajectory-free, only requiring i.i.d. state and goal samples. This makes it more data-efficient and better suited for offline learning from unstructured datasets than the original QRL, which requires transition tuples.
3. The paper clearly identifies the main weakness of its own Eik-Q
1. The paper's strongest results are in navigation tasks (pointmaze, antmaze, humanoidmaze). In the antsoccer and manipulation tasks (Table 2), the performance gains vanish, and Eik-HiQRL is only "comparable" to baselines. The paper acknowledges this but it suggests the method's applicability is currently best suited for tasks where a simple Cartesian abstract space is available.
2. The success of Eik-HiQRL appears to be highly dependent on the choice of the high-level abstract space $\overlin
Taxonomy
Topics: Motor Control and Adaptation · Reinforcement Learning in Robotics · Robot Manipulation and Learning
