Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling

Minseo Kwon; Young J. Kim

arXiv:2510.26139 · cs.RO · March 6, 2026

TL;DR

This paper introduces a kinodynamic TAMP framework that represents symbolic and numeric states in a single search tree, guided by a vision-language model (VLM) and checked by a motion planner and physics simulator. It reports higher average success rates than traditional and LLM-based baselines and shorter planning times on complex problems.

Contribution

The paper presents a kinodynamic TAMP approach built on a hybrid state tree, in which a VLM guides exploration and backtracking based on visual renderings of the states.

Findings

1. Achieved 32.14% to 1166.67% higher average success rates than traditional and LLM-based baselines

2. Reduced planning time on complex problems

3. VLM-guided backtracking improves search efficiency
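
The success-rate figures above are relative increases over a baseline, which are easy to misread as multipliers. The sketch below shows the arithmetic only. The function name and the input rates are hypothetical values chosen to reproduce the two percentages; they are not taken from the paper's results.

```python
def relative_increase_pct(baseline: float, improved: float) -> float:
    """Return the percentage by which `improved` exceeds `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical success rates (in %), used only to illustrate the arithmetic.
low = relative_increase_pct(28.0, 37.0)    # about 32.14
high = relative_increase_pct(6.0, 76.0)    # about 1166.67
ratio = 76.0 / 6.0                         # improved rate as a multiple of the baseline
```

Note that a 1166.67% increase corresponds to roughly 12.67 times the baseline rate, since the multiplier is one plus the relative increase.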

Abstract

Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly on long-horizon problems because of excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP planner based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, so that task and motion decisions are made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the exploration of a TAMP solution and backtracks the search based on visual renderings of the states. Experiments in simulated domains and in the real world show average success rates 32.14% to 1166.67% higher than traditional and LLM-based…
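
As a rough illustration of the hybrid state tree idea, the sketch below stores symbolic facts and numeric values in the same node and runs a depth-first search that backtracks when no feasible child exists. This is a minimal sketch under stated assumptions, not the authors' algorithm: `HybridNode`, `search`, and the callbacks `propose`, `feasible`, and `rank` are hypothetical names. Here `feasible` stands in for the motion planner and physics simulator, and `rank` stands in for the VLM's preference over candidates; the toy domain is a block pushed along a line.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class HybridNode:
    """One tree node holding both symbolic facts and numeric state."""
    symbolic: FrozenSet[str]
    numeric: Dict[str, Tuple[float, ...]]
    parent: Optional["HybridNode"] = None
    action: Optional[str] = None
    children: List["HybridNode"] = field(default_factory=list)

    def expand(self, action, symbolic, numeric) -> "HybridNode":
        child = HybridNode(symbolic, numeric, parent=self, action=action)
        self.children.append(child)
        return child

    def plan(self) -> List[str]:
        """Actions from the root to this node."""
        actions, node = [], self
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return actions[::-1]


def search(root, is_goal, propose, feasible, rank, max_nodes=1000):
    """Depth-first search; popping a sibling after a dead end is the backtrack."""
    stack, visited = [root], 0
    while stack and visited < max_nodes:
        node = stack.pop()
        visited += 1
        if is_goal(node):
            return node.plan()
        candidates = [c for c in propose(node) if feasible(node, c)]
        # `rank` returns candidates best-first; push reversed so the best pops first.
        for action, sym, num in reversed(rank(node, candidates)):
            stack.append(node.expand(action, sym, num))
    return None


# Toy domain (hypothetical): push a block along a line from x=0 to x=3.
GOAL_X, BLOCKED = 3, {2}  # x=2 plays the role of an infeasible placement

def propose(node):
    x = node.numeric["block"][0]
    for step in (1, 2):
        nx = x + step
        sym = frozenset({"at_goal(block)"}) if nx == GOAL_X else frozenset()
        yield (f"push(block,{step})", sym, {"block": (nx,)})

def feasible(node, candidate):
    nx = candidate[2]["block"][0]
    return nx not in BLOCKED and nx <= GOAL_X

def rank(node, candidates):
    return sorted(candidates, key=lambda c: abs(GOAL_X - c[2]["block"][0]))

root = HybridNode(frozenset(), {"block": (0,)})
plan = search(root, lambda n: "at_goal(block)" in n.symbolic, propose, feasible, rank)
```

Keeping the symbolic and numeric parts in one node means a single feasibility check can reject a task-level choice and a motion-level choice together, which is one plausible reading of how the planner decides task and motion jointly.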
