Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors
Karl Pertsch, Oleh Rybkin, Frederik Ebert, Chelsea Finn, Dinesh Jayaraman, Sergey Levine

TL;DR
This paper introduces goal-conditioned hierarchical predictors (GCPs) for long-horizon visual planning, enabling agents to predict and plan future trajectories effectively by combining goal-awareness with a divide-and-conquer hierarchical approach.
Contribution
The work presents a framework that integrates goal-conditioning with hierarchical prediction. This lets visual planning reach goals over substantially longer horizons than step-by-step predictors that ignore the goal.
Findings
GCPs improve planning efficiency by constraining trajectory search space.
Hierarchical models enable effective long-term prediction through recursive subdivision.
The approach allows solving visual planning tasks with much longer horizons than previous methods.
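The recursive subdivision in the findings above can be illustrated with a toy divide-and-conquer predictor. This is a minimal sketch, not the authors' model. The hypothetical `interpolating_midpoint` stands in for a learned goal-conditioned network: it returns the average of the two endpoints plus optional Gaussian noise. States are plain lists of floats rather than learned latents. Each subgoal is conditioned only on the two states that bracket its interval, so the trajectory is refined from coarse to fine.

```python
import random
from typing import Callable, List, Sequence

State = List[float]


def interpolating_midpoint(start: Sequence[float], goal: Sequence[float],
                           rng: random.Random, noise: float = 0.0) -> State:
    """Hypothetical stand-in for a learned predictor p(s_mid | s_start, s_goal).

    Returns the endpoint average plus optional Gaussian noise.
    """
    return [(a + b) / 2.0 + rng.gauss(0.0, noise) for a, b in zip(start, goal)]


def predict_tree(start: Sequence[float], goal: Sequence[float], depth: int,
                 predict_mid: Callable[[Sequence[float], Sequence[float], random.Random], State],
                 rng: random.Random) -> List[State]:
    """Fill in a trajectory from start to goal by recursive midpoint prediction.

    A depth of d yields 2**d + 1 states, counting both endpoints.
    """
    if depth == 0:
        return [list(start), list(goal)]
    mid = predict_mid(start, goal, rng)          # coarsest subgoal of this interval
    left = predict_tree(start, mid, depth - 1, predict_mid, rng)
    right = predict_tree(mid, goal, depth - 1, predict_mid, rng)
    return left + right[1:]                      # `mid` ends `left` and starts `right`; keep one copy


if __name__ == "__main__":
    rng = random.Random(0)
    traj = predict_tree([0.0, 0.0], [8.0, 4.0], depth=3,
                        predict_mid=interpolating_midpoint, rng=rng)
    print(len(traj))  # 9 states: both endpoints plus 7 predicted subgoals
    print(traj[4])    # [4.0, 2.0], the first (coarsest) midpoint
```

With this recursion, a horizon of 2**d steps is generated in d levels. The last state is always the goal, because the goal is an input rather than something the predictor must reach by chance.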
Abstract
The ability to predict and plan into the future is fundamental for agents acting in the world. To reach a faraway goal, we predict trajectories at multiple timescales, first devising a coarse plan towards the goal and then gradually filling in details. In contrast, current learning approaches for visual prediction and planning fail on long-horizon tasks as they generate predictions (1) without considering goal information, and (2) at the finest temporal resolution, one step at a time. In this work we propose a framework for visual prediction and planning that is able to overcome both of these limitations. First, we formulate the problem of predicting towards a goal and propose the corresponding class of latent space goal-conditioned predictors (GCPs). GCPs significantly improve planning efficiency by constraining the search space to only those trajectories that reach the goal. Further,…
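The abstract's claim that goal-conditioning constrains the search space to goal-reaching trajectories can be illustrated with a small comparison. This is a hedged sketch under strong assumptions, not the paper's method.

- A 2-D Gaussian random walk plays the role of a goal-agnostic forward predictor.
- A discrete Brownian bridge (the same walk pinned to the goal) plays the role of a goal-conditioned predictor.
- The planning cost `path_length` and the best-of-N `plan` routine are hypothetical stand-ins for whatever cost and optimizer a real planner would use.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def forward_rollout(start: Point, horizon: int, rng: random.Random,
                    step: float = 0.5) -> List[Point]:
    """Goal-agnostic stand-in: a 2-D Gaussian random walk of `horizon` steps."""
    traj = [start]
    for _ in range(horizon):
        x, y = traj[-1]
        traj.append((x + rng.gauss(0.0, step), y + rng.gauss(0.0, step)))
    return traj


def goal_conditioned_rollout(start: Point, goal: Point, horizon: int,
                             rng: random.Random, step: float = 0.5) -> List[Point]:
    """Goal-conditioned stand-in: pin a random walk to `goal` (discrete Brownian bridge).

    Shifting state t by (t / horizon) * (goal - x_T) leaves the start unchanged
    and moves the final state onto the goal (up to floating-point rounding).
    """
    walk = forward_rollout(start, horizon, rng, step)
    dx, dy = goal[0] - walk[-1][0], goal[1] - walk[-1][1]
    return [(x + t / horizon * dx, y + t / horizon * dy) for t, (x, y) in enumerate(walk)]


def path_length(traj: List[Point]) -> float:
    """Hypothetical planning cost: total Euclidean length of the trajectory."""
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def plan(start: Point, goal: Point, horizon: int, n_samples: int,
         rng: random.Random) -> List[Point]:
    """Best-of-N planner: every candidate already reaches the goal, so only cost is compared."""
    candidates = [goal_conditioned_rollout(start, goal, horizon, rng) for _ in range(n_samples)]
    return min(candidates, key=path_length)


def fraction_reaching(trajs: List[List[Point]], goal: Point, tol: float = 0.5) -> float:
    """Share of trajectories whose final state lies within `tol` of the goal."""
    return sum(math.dist(t[-1], goal) <= tol for t in trajs) / len(trajs)


if __name__ == "__main__":
    rng = random.Random(0)
    start, goal = (0.0, 0.0), (5.0, 5.0)
    fwd = [forward_rollout(start, 20, rng) for _ in range(200)]
    gcp = [goal_conditioned_rollout(start, goal, 20, rng) for _ in range(200)]
    # Forward samples almost never land near the goal; pinned samples always do.
    print(fraction_reaching(fwd, goal), fraction_reaching(gcp, goal))
    print(round(path_length(plan(start, goal, 20, 100, rng)), 2))
```

The design choice being illustrated is where the goal enters. A forward predictor must search over all trajectories and then filter for the rare ones that end near the goal. A goal-conditioned sampler spends its entire sampling budget on trajectories that already satisfy that constraint.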
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
