Guiding Long-Horizon Task and Motion Planning with Vision Language Models
Zhutian Yang, Caelan Garrett, Dieter Fox, Tomás Lozano-Pérez, Leslie Pack Kaelbling

TL;DR
This paper introduces VLM-TAMP, a hierarchical planning approach that combines vision-language models with task and motion planning to improve robot performance in complex, multi-step kitchen tasks involving many objects.
Contribution
The paper presents a novel hierarchical planning algorithm that uses vision-language models to generate intermediate subgoals, enhancing robot planning in complex environments.
Findings
VLM-TAMP achieves success rates up to 100% in kitchen tasks.
It significantly outperforms baseline methods in task completion.
The approach effectively integrates semantic understanding with geometric feasibility.
Abstract
Vision-Language Models (VLMs) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically and kinematically feasible for a particular robot embodiment. As a result, many prerequisite steps such as opening drawers to access objects are often omitted in their plans. Robot task and motion planners can generate motion trajectories that respect the geometric feasibility of actions and insert physically necessary actions, but do not scale to everyday problems that require common-sense knowledge and involve large state spaces comprised of many variables. We propose VLM-TAMP, a hierarchical planning algorithm that leverages a VLM to generate both semantically-meaningful and horizon-reducing intermediate subgoals that guide a task and motion planner.…
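The division of labor described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `propose_subgoals` is a hypothetical stub standing in for a VLM query, the three-action symbolic domain is invented for illustration, and a breadth-first search over symbolic facts stands in for a real task and motion planner (no geometry or kinematics is modeled). The point is only to show how a short list of subgoals shortens each search, while the planner still inserts the prerequisite step (opening the drawer) that the subgoal list never mentions.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts made true by the action
    delete: frozenset   # facts made false by the action


# Hypothetical toy domain (an assumption, not taken from the paper).
ACTIONS = [
    Action("open_drawer", frozenset(), frozenset({"drawer_open"}), frozenset()),
    Action("pick_pot", frozenset({"drawer_open", "hand_empty"}),
           frozenset({"holding_pot"}), frozenset({"hand_empty"})),
    Action("place_pot_on_stove", frozenset({"holding_pot"}),
           frozenset({"pot_on_stove", "hand_empty"}), frozenset({"holding_pot"})),
]


def propose_subgoals(goal_text):
    """Stub for the VLM: returns an ordered list of intermediate subgoals.

    A real system would prompt a VLM with the goal, scene image, and
    constraints; here the answer is hard-coded for illustration.
    """
    return [frozenset({"holding_pot"}), frozenset({"pot_on_stove"})]


def plan_to_subgoal(state, subgoal, max_depth=6):
    """Breadth-first search for an action sequence that reaches `subgoal`."""
    queue = deque([(state, [])])
    seen = {state}
    while queue:
        current, plan = queue.popleft()
        if subgoal <= current:
            return plan, current
        if len(plan) >= max_depth:
            continue
        for action in ACTIONS:
            if action.pre <= current:
                nxt = (current - action.delete) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [action.name]))
    return None  # subgoal unreachable within the depth limit


def solve(initial_state, goal_text):
    """Chain short planning problems, one per proposed subgoal."""
    state, full_plan = initial_state, []
    for subgoal in propose_subgoals(goal_text):
        result = plan_to_subgoal(state, subgoal)
        if result is None:
            return None
        steps, state = result
        full_plan.extend(steps)
    return full_plan


plan = solve(frozenset({"hand_empty"}), "put the pot on the stove")
print(plan)  # ['open_drawer', 'pick_pot', 'place_pot_on_stove']
```

Note that neither subgoal mentions the drawer; the search adds `open_drawer` because `pick_pot` requires it, which mirrors the kind of omitted prerequisite step the abstract attributes to VLM-only plans.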
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
