Using VLM Reasoning to Constrain Task and Motion Planning

Muyang Yan; Miras Mengdibayev; Ardon Floros; Weihang Guo; Lydia E. Kavraki; Zachary Kingston

arXiv:2510.25548 · cs.RO · March 17, 2026

TL;DR

This paper introduces VIZ-COAST, an approach that uses large pretrained Vision-Language Models to identify downward-refinement issues in task and motion planning a priori, rather than only after a refinement failure, improving planning efficiency and reliability.

Contribution

The paper presents VIZ-COAST, a method that leverages vision-language models to predict refinement feasibility, reducing the need for costly replanning in task and motion planning.

Findings

01. Drastically reduces planning times in TAMP domains

02. Eliminates downward refinement failures in some cases

03. Generalizes across diverse domain instances

Abstract

In task and motion planning, high-level task planning is done over an abstraction of the world to enable efficient search in long-horizon robotics problems. However, the feasibility of these task-level plans relies on the downward refinability of the abstraction into continuous motion. When a domain's refinability is poor, task-level plans that appear valid may ultimately fail during motion planning, requiring replanning and resulting in slower overall performance. Prior works mitigate this by encoding refinement issues as constraints to prune infeasible task plans. However, these approaches only add constraints upon refinement failure, expending significant search effort on infeasible branches. We propose VIZ-COAST, a method of leveraging the common-sense spatial reasoning of large pretrained Vision-Language Models to identify issues with downward refinement a priori, bypassing the…
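
The difference between adding constraints only after a refinement failure and supplying them a priori can be illustrated with a toy sketch. This is not the VIZ-COAST implementation: the block-picking domain, the `OBSTRUCTED_BY` table, and the functions `refine`, `satisfies`, and `plan_with_constraints` are all hypothetical, and the a priori constraints are hard-coded where the paper would obtain them from a Vision-Language Model.

```python
from itertools import permutations

# Hypothetical toy domain: three blocks must be picked in some order.
BLOCKS = ("a", "b", "c")

# Hypothetical geometry, hidden from the task planner: a block can only be
# grasped once every block obstructing it has already been picked.
OBSTRUCTED_BY = {"a": {"b"}, "b": {"c"}, "c": set()}


def refine(plan):
    """Stand-in for motion planning. Returns None on success, otherwise an
    ordering constraint (first, then): 'first' must be picked before 'then'."""
    picked = set()
    for block in plan:
        missing = OBSTRUCTED_BY[block] - picked
        if missing:
            return (sorted(missing)[0], block)
        picked.add(block)
    return None


def satisfies(plan, constraints):
    """Task-level check: every known ordering constraint holds in the plan."""
    return all(plan.index(first) < plan.index(then) for first, then in constraints)


def plan_with_constraints(initial_constraints=()):
    """Enumerate task plans, skip those that violate known constraints, and
    count how often the costly refinement step is invoked."""
    constraints = set(initial_constraints)
    refinement_calls = 0
    for plan in permutations(BLOCKS):
        if not satisfies(plan, constraints):
            continue  # pruned at the task level, no refinement attempted
        refinement_calls += 1
        failure = refine(plan)
        if failure is None:
            return plan, refinement_calls
        constraints.add(failure)  # reactive: learned only after a failure
    return None, refinement_calls


# Reactive baseline: constraints are discovered through failed refinements.
reactive_plan, reactive_calls = plan_with_constraints()

# A priori: constraints are given up front (hard-coded here as an assumption).
apriori_plan, apriori_calls = plan_with_constraints([("b", "a"), ("c", "b")])
```

In this toy run both variants return the plan `("c", "b", "a")`, but the reactive loop calls `refine` three times while the a priori variant calls it once, since every infeasible ordering is pruned before refinement is attempted.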
