Compositional Foundation Models for Hierarchical Planning

Anurag Ajay; Seungwook Han; Yilun Du; Shuang Li; Abhi Gupta; Tommi Jaakkola; Josh Tenenbaum; Leslie Kaelbling; Akash Srivastava; Pulkit Agrawal

arXiv:2309.08587·cs.LG·September 22, 2023·2 cites

Open Access

TL;DR

This paper introduces HiP, a hierarchical planning framework that combines language, vision, and action models to improve decision-making in complex, long-horizon tasks through symbolic planning, visual reasoning, and visual-motor control.

Contribution

The paper presents HiP, a compositional foundation model that integrates separately trained expert models (a large language model, a video diffusion model, and an inverse dynamics model) into a single hierarchical planner for long-horizon tasks.

Findings

01. Successful application to three long-horizon table-top manipulation tasks

02. Effective grounding of symbolic plans in visual and motor control

03. Enhanced hierarchical reasoning through iterative model refinement

Abstract

To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarchical Planning (HiP), a foundation model that leverages multiple expert foundation models, each trained individually on language, vision, or action data, and uses them jointly to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control through an inverse dynamics model that infers actions from generated videos. To enable effective reasoning…
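The three-level flow the abstract describes (language model to subgoals, video model to imagined frames, inverse dynamics to actions) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `HierarchicalPlanner`, the `Frame` and `Action` aliases, and the three toy stand-in functions are placeholders for the real expert models. The sketch also assumes that the last imagined frame of one subgoal serves as the starting frame for the next, and it leaves out the refinement step that the truncated abstract goes on to mention.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]   # hypothetical stand-in for an image observation
Action = List[float]  # hypothetical stand-in for a motor command


@dataclass
class HierarchicalPlanner:
    """Composes three independently supplied experts (all hypothetical here)."""
    plan_subgoals: Callable[[str], List[str]]            # language level
    imagine_video: Callable[[str, Frame], List[Frame]]   # vision level
    infer_action: Callable[[Frame, Frame], Action]       # action level

    def act(self, goal: str, observation: Frame) -> List[Action]:
        actions: List[Action] = []
        current = observation
        for subgoal in self.plan_subgoals(goal):
            # Ground the symbolic subgoal as a sequence of imagined frames.
            frames = self.imagine_video(subgoal, current)
            # Infer one action per consecutive pair of frames.
            for prev, nxt in zip([current] + frames[:-1], frames):
                actions.append(self.infer_action(prev, nxt))
            # Assumption: the last imagined frame seeds the next subgoal.
            if frames:
                current = frames[-1]
        return actions


# Toy stand-ins so the sketch runs end to end; real experts would be
# a language model, a video diffusion model, and an inverse dynamics model.
def toy_subgoals(goal: str) -> List[str]:
    return [step.strip() for step in goal.split(" then ")]


def toy_video(subgoal: str, start: Frame) -> List[Frame]:
    return [[x + 1.0 for x in start], [x + 2.0 for x in start]]


def toy_inverse_dynamics(prev: Frame, nxt: Frame) -> Action:
    return [b - a for a, b in zip(prev, nxt)]


planner = HierarchicalPlanner(toy_subgoals, toy_video, toy_inverse_dynamics)
actions = planner.act("pick block then place block", [0.0, 0.0])
```

Passing the three experts in as plain callables keeps each level swappable, which mirrors the compositional idea in the abstract without committing to any particular model interface.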

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Human Pose and Action Recognition

Methods: Diffusion