GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

Kyle B. Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, and Benjamin Burchfiel

arXiv:2410.20018 · cs.RO · October 29, 2024

TL;DR

GHIL-Glue improves hierarchical robot control by filtering and refining generated subgoals. This improves robustness and generalization in both simulated and real environments and yields state-of-the-art results on language-conditioned manipulation tasks.

Contribution

Introduces GHIL-Glue, a novel interface that filters and improves generative subgoals, enhancing the integration of image/video prediction models with low-level policies.

Findings

01. 25% performance improvement on the CALVIN benchmark
02. Outperforms existing policies in zero-shot manipulation tasks
03. Achieves state-of-the-art results with RGB camera observations

Abstract

Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for low-level goal-conditioned policies to reach. However, the performance of these systems can be greatly bottlenecked by the interface between generative models and low-level controllers. For example, generative models may predict photorealistic yet physically infeasible frames that confuse low-level policies. Low-level policies may also be sensitive to subtle visual artifacts in generated goal images. This paper addresses these two facets of generalization, providing an interface to effectively "glue together" language-conditioned image or video prediction models with low-level goal-conditioned policies. Our method, Generative Hierarchical Imitation…
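The hierarchy described in the abstract (a generative model proposes subgoal images, and a low-level goal-conditioned policy tries to reach them) can be illustrated with a minimal sketch. The abstract is truncated here and does not say how subgoals are filtered, so everything below is an assumption made for illustration: the names `select_subgoal`, `hierarchical_step`, `propose_fn`, `score_fn`, and `policy_fn` are hypothetical, images are stand-in lists of floats, and picking the highest-scoring candidate is just one plausible way to filter. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Stand-in for an image: a flat list of pixel values (illustrative assumption).
Image = List[float]
Action = List[float]


def select_subgoal(
    candidates: Sequence[Image],
    score_fn: Callable[[Image], float],
) -> Image:
    """Filtering step: keep the candidate subgoal with the highest score."""
    if not candidates:
        raise ValueError("need at least one candidate subgoal")
    return max(candidates, key=score_fn)


def hierarchical_step(
    obs: Image,
    instruction: str,
    propose_fn: Callable[[Image, str], Image],       # hypothetical generative model
    score_fn: Callable[[Image, Image, str], float],  # hypothetical subgoal scorer
    policy_fn: Callable[[Image, Image], Action],     # hypothetical low-level policy
    num_candidates: int = 4,
) -> Tuple[Action, Image]:
    """One high-level step: propose candidates, filter, then act toward the winner."""
    # Sample several candidate subgoals from the (hypothetical) generative model.
    candidates = [propose_fn(obs, instruction) for _ in range(num_candidates)]
    # Filter: score each candidate against the current observation and instruction.
    subgoal = select_subgoal(
        candidates, lambda goal: score_fn(obs, goal, instruction)
    )
    # The low-level goal-conditioned policy is conditioned on the chosen subgoal.
    return policy_fn(obs, subgoal), subgoal
```

In this sketch the scorer is the only point where infeasible or artifact-laden candidates could be rejected before the low-level policy sees them; how such a scorer would be trained is outside what the text above specifies.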

Taxonomy

Topics: Advanced Vision and Imaging