Open-World Task and Motion Planning via Vision-Language Model Generated Constraints

Nishanth Kumar; William Shen; Fabio Ramos; Dieter Fox; Tomás Lozano-Pérez; Leslie Pack Kaelbling; Caelan Reed Garrett

arXiv:2411.08253 · cs.RO · March 12, 2026

TL;DR

This paper introduces OWL-TAMP, a system that integrates vision-language models (VLMs) into task and motion planning (TAMP) so that robots can interpret natural-language goals and perform complex manipulation tasks in open-world environments.

Contribution

The paper presents a method in which VLMs generate discrete action-ordering constraints and continuous constraints expressed as code, which augment TAMP systems with natural-language goal understanding and open-world reasoning for robotic manipulation.

Findings

1. OWL-TAMP outperforms baselines on long-horizon tasks.
2. Effective integration of VLM-generated constraints improves planning accuracy.
3. Success is demonstrated on real-world robotic hardware.

Abstract

Foundation models like Vision-Language Models (VLMs) excel at common sense vision and language tasks such as visual question answering. However, they cannot yet directly solve complex, long-horizon robot manipulation problems requiring precise continuous reasoning. Task and Motion Planning (TAMP) systems can handle long-horizon reasoning through discrete-continuous hybrid search over parameterized skills, but rely on detailed environment models and cannot interpret novel human objectives, such as arbitrary natural language goals. We propose integrating VLMs into TAMP systems by having them generate discrete and continuous language-parameterized constraints that enable open-world reasoning. Specifically, we use VLMs to generate discrete action ordering constraints that constrain TAMP search over action sequences, and continuous constraints in the form of code that augments traditional…
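As a rough illustration of the two constraint types the abstract describes, the sketch below shows (a) a discrete action-ordering constraint used to accept or reject a candidate action sequence, and (b) a continuous constraint written as a small Python predicate over object poses, used to reject sampled poses. This is a minimal sketch under stated assumptions, not the OWL-TAMP implementation: the tuple encodings for actions and poses, the names `satisfies_ordering`, `left_of`, and `filter_samples`, and the convention that a smaller x coordinate means "left" are all hypothetical. The abstract is truncated before it explains exactly how the generated code augments the planner, so plain rejection of samples is used here only as a stand-in.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical encodings (assumptions, not the OWL-TAMP data structures).
Action = Tuple[str, ...]           # e.g. ("place", "apple", "bowl")
Pose = Tuple[float, float, float]  # (x, y, z) in an assumed table frame
Constraint = Callable[[Dict[str, Pose]], bool]


def satisfies_ordering(plan: Sequence[Action], ordering: Sequence[Action]) -> bool:
    """Discrete constraint: every action in `ordering` must occur in `plan`
    in the same relative order; other actions may be interleaved."""
    remaining = iter(plan)
    # `x in iterator` consumes items up to and including the first match,
    # so each later required action can only match a later plan step.
    return all(required in remaining for required in ordering)


def left_of(poses: Dict[str, Pose]) -> bool:
    """Hypothetical continuous constraint a VLM might emit as code for
    'put the apple to the left of the bowl' (assumes smaller x means left)."""
    return poses["apple"][0] < poses["bowl"][0]


def filter_samples(candidates: List[Pose], fixed: Dict[str, Pose],
                   obj: str, constraint: Constraint) -> List[Pose]:
    """Keep only sampled poses for `obj` that satisfy `constraint`
    (simple rejection of samples that violate it)."""
    return [c for c in candidates if constraint({**fixed, obj: c})]


if __name__ == "__main__":
    plan = [("pick", "apple"), ("place", "apple", "table"),
            ("pick", "bowl"), ("place", "bowl", "shelf")]
    print(satisfies_ordering(plan, [("pick", "apple"), ("place", "bowl", "shelf")]))  # True
    print(satisfies_ordering(plan, [("place", "bowl", "shelf"), ("pick", "apple")]))  # False

    samples = [(0.1, 0.0, 0.0), (0.5, 0.0, 0.0), (0.9, 0.0, 0.0)]
    # Prints [(0.1, 0.0, 0.0), (0.5, 0.0, 0.0)]
    print(filter_samples(samples, {"bowl": (0.6, 0.0, 0.0)}, "apple", left_of))
```

In this sketch the ordering check can discard a whole action sequence before any continuous values are chosen, while the code constraint acts only on sampled values; how OWL-TAMP actually combines the two is not covered by the truncated abstract above.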

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms