Open-World Task and Motion Planning via Vision-Language Model Generated Constraints
Nishanth Kumar, William Shen, Fabio Ramos, Dieter Fox, Tomás Lozano-Pérez, Leslie Pack Kaelbling, Caelan Reed Garrett

TL;DR
This paper introduces OWL-TAMP, a novel system integrating vision-language models into task and motion planning to enable robots to interpret natural language goals and perform complex manipulation tasks in open-world environments.
Contribution
The paper presents a method for using VLMs to generate constraints that enhance TAMP systems, allowing for natural language understanding and open-world reasoning in robotic manipulation.
Findings
OWL-TAMP outperforms baselines in long-horizon tasks
Effective integration of VLM-generated constraints improves planning accuracy
Demonstrated success on real-world robotic hardware
Abstract
Foundation models like Vision-Language Models (VLMs) excel at common sense vision and language tasks such as visual question answering. However, they cannot yet directly solve complex, long-horizon robot manipulation problems requiring precise continuous reasoning. Task and Motion Planning (TAMP) systems can handle long-horizon reasoning through discrete-continuous hybrid search over parameterized skills, but rely on detailed environment models and cannot interpret novel human objectives, such as arbitrary natural language goals. We propose integrating VLMs into TAMP systems by having them generate discrete and continuous language-parameterized constraints that enable open-world reasoning. Specifically, we use VLMs to generate discrete action ordering constraints that constrain TAMP search over action sequences, and continuous constraints in the form of code that augments traditional…
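As an illustration only, the two kinds of constraints the abstract mentions might look roughly like the sketch below. It is not the OWL-TAMP implementation. The action strings, the `satisfies_ordering` and `left_of` helpers, the pose representation, and the margin value are all hypothetical names and choices made for this example. The first function checks a candidate action sequence against pairwise ordering constraints, and the second is a small geometric predicate of the sort a VLM could plausibly emit as code.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical representation (not from the paper): an ordering constraint
# (a, b) means action a must occur before action b whenever both appear.
OrderingConstraint = Tuple[str, str]
Pose = Tuple[float, float, float]  # assumed (x, y, z) position in metres


def satisfies_ordering(plan: Sequence[str],
                       constraints: Sequence[OrderingConstraint]) -> bool:
    """Return True if the plan respects every pairwise ordering constraint."""
    # Record the first index at which each action appears in the plan.
    first_index: Dict[str, int] = {}
    for i, action in enumerate(plan):
        first_index.setdefault(action, i)
    for before, after in constraints:
        # A constraint only applies when both actions are present.
        if before in first_index and after in first_index:
            if first_index[before] >= first_index[after]:
                return False
    return True


def left_of(poses: Dict[str, Pose], a: str, b: str,
            margin: float = 0.05) -> bool:
    """Illustrative continuous constraint: a is left of b by at least margin.

    Assumes 'left' means a smaller x coordinate in a shared frame.
    """
    return poses[a][0] + margin <= poses[b][0]


if __name__ == "__main__":
    constraints = [("pick(cup)", "place(cup, tray)")]
    plan = ["pick(cup)", "place(cup, tray)", "pick(spoon)"]
    print(satisfies_ordering(plan, constraints))
    poses = {"cup": (0.1, 0.0, 0.0), "plate": (0.3, 0.0, 0.0)}
    print(left_of(poses, "cup", "plate"))
```

In a sketch like this, the ordering check would prune candidate action sequences during discrete search, while predicates such as `left_of` would act as extra tests on sampled continuous values (here, object placements).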
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
