Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning
Anthony Kobanda, Waris Radji, Mathieu Petitbois, Odalric-Ambrym Maillard, Rémy Portelas

TL;DR
This paper introduces ProQ, a geometric framework for offline goal-conditioned reinforcement learning that learns an asymmetric distance to improve long-horizon goal reaching by guiding agents through meaningful sub-goals.
Contribution
ProQ combines metric learning, keypoint coverage, and goal-conditioned control into a unified approach, addressing long-horizon challenges in offline RL.
Findings
Effective in diverse navigation benchmarks
Produces meaningful sub-goals for long-horizon tasks
Robustly drives goal-reaching in complex environments
Abstract
Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected trajectories. Scaling that promise to long-horizon tasks remains challenging, notably due to compounding value-estimation errors. Principled geometry offers a potential solution to address these issues. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns an asymmetric distance and then repurposes it, firstly as a repulsive energy forcing a sparse set of keypoints to uniformly spread over the learned latent space, and secondly as a structured directional cost guiding towards proximal sub-goals. In particular, ProQ couples this geometry with a Lagrangian out-of-distribution detector to ensure the learned keypoints stay within reachable areas. By unifying metric learning, keypoint coverage, and…
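To make the idea of an asymmetric distance guiding an agent towards proximal sub-goals more concrete, here is a minimal sketch. It is not the ProQ implementation: the hand-written `quasimetric` stands in for the learned distance, the greedy rule in `select_subgoal` stands in for the paper's planner, and the names `forward_cost`, `backward_cost`, and `radius` are hypothetical parameters chosen for illustration.

```python
import numpy as np


def quasimetric(x, y, forward_cost=1.0, backward_cost=3.0):
    """Toy asymmetric distance (an assumption, not the learned ProQ metric).

    Moving in the positive direction of an axis costs `forward_cost` per unit,
    moving in the negative direction costs `backward_cost` per unit. The result
    is zero when x == y and obeys the triangle inequality, but in general
    quasimetric(x, y) != quasimetric(y, x).
    """
    diff = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    forward = np.maximum(diff, 0.0).sum()
    backward = np.maximum(-diff, 0.0).sum()
    return forward_cost * forward + backward_cost * backward


def select_subgoal(state, goal, keypoints, radius):
    """Greedy sub-goal choice (hypothetical rule, for illustration only).

    If the goal is already within `radius` of the state, aim for it directly.
    Otherwise pick, among keypoints within `radius`, the one closest to the goal.
    """
    goal = np.asarray(goal, dtype=float)
    if quasimetric(state, goal) <= radius:
        return goal
    nearby = [k for k in keypoints if quasimetric(state, k) <= radius]
    if not nearby:
        # No keypoint is close enough: fall back to the goal itself.
        return goal
    return min(nearby, key=lambda k: quasimetric(k, goal))


# Four keypoints spread along a line, standing in for a learned keypoint set.
keypoints = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
state = np.array([0.0, 0.0])
goal = np.array([3.0, 0.0])

subgoal = select_subgoal(state, goal, keypoints, radius=1.5)
print(subgoal)
```

In this toy setting the goal is too far to target directly, so the agent is pointed at the nearby keypoint that most reduces the remaining distance. A learned quasimetric would replace the hand-written cost, and the radius would play the role of a "proximal" threshold.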
Taxonomy
Methods: Sparse Evolutionary Training
