VLM-TDP: VLM-guided Trajectory-conditioned Diffusion Policy for Robust Long-Horizon Manipulation
Kefeng Huang, Tingguang Li, Yuzhen Liu, Zhe Zhang, Jiankun Wang, Lei Han

TL;DR
This paper introduces VLM-TDP, a novel diffusion policy guided by vision-language models that decomposes complex long-horizon robotic tasks into manageable sub-tasks, significantly improving robustness and success rates especially under noisy conditions.
Contribution
The paper presents a new VLM-guided trajectory-conditioned diffusion policy that enhances long-horizon manipulation performance and robustness against image noise, outperforming classical diffusion methods.
Findings
44% increase in success rate
Over 100% improvement in long-horizon tasks
20% reduction in performance degradation under noise
Abstract
Diffusion policy has demonstrated promising performance in the field of robotic manipulation. However, its effectiveness has been primarily limited in short-horizon tasks, and its performance significantly degrades in the presence of image noise. To address these limitations, we propose a VLM-guided trajectory-conditioned diffusion policy (VLM-TDP) for robust and long-horizon manipulation. Specifically, the proposed method leverages state-of-the-art vision-language models (VLMs) to decompose long-horizon tasks into concise, manageable sub-tasks, while also innovatively generating voxel-based trajectories for each sub-task. The generated trajectories serve as a crucial conditioning factor, effectively steering the diffusion policy and substantially enhancing its performance. The proposed Trajectory-conditioned Diffusion Policy (TDP) is trained on trajectories derived from demonstration…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
