Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling
Minseo Kwon, Young J. Kim

TL;DR
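This paper introduces a kinodynamic task and motion planning (TAMP) framework that represents symbolic and numeric states in a single hybrid state tree. A vision-language model (VLM) guides the search, while a motion planner and a physics simulator verify kinodynamic constraints. The reported results show higher success rates and shorter planning times on complex problems.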
Contribution
It presents a kinodynamic TAMP planner built on a hybrid state tree, with VLM-guided exploration and backtracking that ties visual reasoning to motion planning.
Findings
Average success rates 32.14% to 1166.67% higher than the compared methods, on simulated domains and in the real world
Reduced planning time on complex problems
VLM-guided backtracking improves search efficiency
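To put the success-rate range in perspective, the short calculation below converts the two endpoints into multiplicative factors. It assumes the percentages are relative increases over a baseline success rate; the symbols s_base, s_new, and p are introduced here for illustration and do not come from the paper.

```latex
s_{\text{new}} = \left(1 + \frac{p}{100}\right) s_{\text{base}}
\qquad
\begin{aligned}
p = 32.14   &\;\Rightarrow\; s_{\text{new}} \approx 1.32\, s_{\text{base}} \\
p = 1166.67 &\;\Rightarrow\; s_{\text{new}} \approx 12.67\, s_{\text{base}}
\end{aligned}
```

Under that reading, the upper endpoint is only possible when the baseline success rate is low, since a success rate cannot exceed 100%.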
Abstract
Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP planner based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, so that task and motion decisions are made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the exploration toward a TAMP solution and backtracks the search based on visual renderings of the states. Experiments in simulated domains and in the real world show average success rates 32.14% to 1166.67% higher than traditional and LLM-based…
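The idea of a tree whose nodes carry both symbolic and numeric state, searched under a guide and a feasibility check, can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D toy domain, the `shove` and `nudge` actions, and the names `HybridNode`, `expand`, `feasible`, `guide_score`, and `search` are all hypothetical. The feasibility check is a stand-in for the motion planner and physics simulator, the scoring function is a stand-in for the VLM, and the backtracking is plain depth-first backtracking rather than the rendering-based backtracking described in the abstract.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class HybridNode:
    """A tree node that stores a symbolic and a numeric state side by side."""
    symbolic: FrozenSet[str]              # discrete facts, e.g. {"nudged"}
    numeric: float                        # continuous state, here a 1-D position
    action: Optional[str] = None          # action that produced this node
    parent: Optional["HybridNode"] = None

    def plan(self) -> List[str]:
        """Read the action sequence back along the parent pointers."""
        node, actions = self, []
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return actions[::-1]


def expand(node: HybridNode) -> List[HybridNode]:
    """Hypothetical toy actions: 'shove' moves +2, 'nudge' moves +1 (once)."""
    children = [HybridNode(node.symbolic, node.numeric + 2.0, "shove", node)]
    if "nudged" not in node.symbolic:     # symbolic precondition
        children.append(HybridNode(node.symbolic | {"nudged"},
                                   node.numeric + 1.0, "nudge", node))
    return children


def feasible(node: HybridNode) -> bool:
    """Stand-in for a motion planner / physics check: stay within [0, 4]."""
    return 0.0 <= node.numeric <= 4.0


def guide_score(node: HybridNode) -> float:
    """Stand-in for a VLM ranking; deliberately imperfect (prefers nudges)."""
    return 1.0 if node.action == "nudge" else 0.0


def search(node: HybridNode, is_goal, depth: int = 6) -> Optional[HybridNode]:
    """Depth-first search: try children in guide order, backtrack on dead ends."""
    if is_goal(node):
        return node
    if depth == 0:
        return None
    candidates = [c for c in expand(node) if feasible(c)]
    for child in sorted(candidates, key=guide_score, reverse=True):
        found = search(child, is_goal, depth - 1)
        if found is not None:
            return found
    return None  # dead end: the caller moves on to its next candidate


root = HybridNode(frozenset(), 0.0)
goal = search(root, lambda n: n.numeric == 4.0)
print(goal.plan())  # prints ['shove', 'shove']
```

Because the toy guide always prefers `nudge`, the search first follows branches that end at position 3 with no feasible action left, then backtracks and finds the two-shove plan. Keeping the symbolic facts and the numeric position on the same node lets one expansion step apply a symbolic precondition and a numeric feasibility check together.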
