Direct Contact-Tolerant Motion Planning With Vision Language Models

He Li; Jian Sun; Chengyang Li; Guoliang Li; Qiyu Ruan; Shuai Wang; Chengzhong Xu

arXiv:2603.05017 · cs.RO · March 6, 2026

Open Access

TL;DR

This paper introduces a novel direct contact-tolerant motion planning approach that leverages vision-language models for perception, enabling robots to navigate cluttered environments with movable obstacles more robustly and efficiently.

Contribution

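The paper presents a new direct contact-tolerant (DCT) planner that combines VLM-based contact reasoning with a perception-to-control framework. It works directly on perceived points instead of the indirect spatial representations (prebuilt maps, obstacle sets) used by earlier contact-tolerant planners.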

Findings

01. DCT outperforms baseline methods in cluttered environments.

02. Robust navigation is demonstrated both on a real robot and in simulation.

03. VLMs provide effective contact-tolerance reasoning.

Abstract

Navigation in cluttered environments often requires robots to tolerate contact with movable or deformable objects to maintain efficiency. Existing contact-tolerant motion planning (CTMP) methods rely on indirect spatial representations (e.g., prebuilt maps, obstacle sets), resulting in inaccuracies and a lack of adaptiveness to environmental uncertainties. To address this issue, we propose a direct contact-tolerant (DCT) planner, which integrates vision-language models (VLMs) into direct point perception and navigation through two key components. The first is the VLM point cloud partitioner (VPP), which performs contact-tolerance reasoning in image space using a VLM, caches the inference masks, propagates them across frames using odometry, and projects them onto the current scan to generate a contact-aware point cloud. The second is VPP-guided navigation (VGN), which formulates…
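The abstract describes VPP's cache-propagate-project step only at a high level. The sketch below shows one plausible way such a step could work, and it is not the authors' implementation. It assumes a planar odometry pose (x, y, yaw), a pinhole camera mounted at the robot origin with no extrinsic offset, and a per-pixel label mask that a VLM has already produced for an earlier frame. The names `Pose2D`, `Pinhole`, `label_scan` and the three-way label set are hypothetical. Instead of warping the cached mask forward, the sketch moves each current scan point back into the frame where the mask was computed and reads the label there.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x forward, y left, z up) in metres

# Hypothetical three-way label set; the paper's actual classes are not given here.
TOLERABLE, RIGID, UNKNOWN = 1, 0, -1


@dataclass(frozen=True)
class Pose2D:
    """Planar odometry pose of the robot (and camera) in a fixed odom frame."""
    x: float
    y: float
    yaw: float

    def to_odom(self, p: Point) -> Point:
        """Robot frame -> odom frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (c * p[0] - s * p[1] + self.x, s * p[0] + c * p[1] + self.y, p[2])

    def from_odom(self, p: Point) -> Point:
        """Odom frame -> robot frame (inverse of to_odom)."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        dx, dy = p[0] - self.x, p[1] - self.y
        return (c * dx + s * dy, -s * dx + c * dy, p[2])


@dataclass(frozen=True)
class Pinhole:
    """Pinhole camera assumed to sit at the robot origin, looking along +x."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int

    def project(self, p: Point) -> Optional[Tuple[int, int]]:
        """Return the (u, v) pixel hit by a robot-frame point, or None."""
        depth = p[0]  # optical z axis = robot x axis
        if depth <= 1e-6:
            return None  # behind or on the image plane
        u = math.floor(self.fx * (-p[1]) / depth + self.cx)  # image right = robot -y
        v = math.floor(self.fy * (-p[2]) / depth + self.cy)  # image down = robot -z
        if 0 <= u < self.width and 0 <= v < self.height:
            return u, v
        return None


def label_scan(scan: Sequence[Point], current: Pose2D, cached: Pose2D,
               cached_mask: List[List[int]], cam: Pinhole) -> List[int]:
    """Label each current-scan point using the cached VLM mask.

    Each point is moved into the frame where the mask was computed
    (current robot -> odom -> cached robot) and looked up there.
    """
    labels = []
    for p in scan:
        pixel = cam.project(cached.from_odom(current.to_odom(p)))
        labels.append(UNKNOWN if pixel is None else cached_mask[pixel[1]][pixel[0]])
    return labels


if __name__ == "__main__":
    cam = Pinhole(fx=2.0, fy=2.0, cx=2.0, cy=2.0, width=4, height=4)
    # Toy 4x4 mask: left half tolerable, right half rigid.
    mask = [[TOLERABLE if u < 2 else RIGID for u in range(4)] for _ in range(4)]
    scan = [(1.0, 0.5, 0.0), (1.0, -0.5, 0.0), (-3.0, 0.0, 0.0)]
    print(label_scan(scan, Pose2D(1.0, 0.0, 0.0), Pose2D(0.0, 0.0, 0.0), mask, cam))
    # [1, 0, -1]
```

In the toy run the robot has moved 1 m forward since the mask was cached. A point 0.5 m to the left still falls on the tolerable half of the old image, and a point to the right falls on the rigid half. A point behind the robot gets `UNKNOWN` because the cached camera never saw it. Reasoning backward like this means the expensive VLM mask is reused as-is and only cheap per-point transforms run each frame.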

Taxonomy

Topics: Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications