ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models
Zhichen Lou, Kechun Xu, Zhongxiang Zhou, Rong Xiong

TL;DR
ExploreVLM introduces a closed-loop, vision-language model-based framework for robot exploration that enables real-time plan adaptation and improved perception in dynamic environments.
Contribution
It presents a novel closed-loop task planning system with self-reflection and structured scene understanding, advancing interactive exploration capabilities of robots.
Findings
Outperforms state-of-the-art baselines in exploration tasks
Enables real-time plan adjustment through feedback mechanisms
Improves perception accuracy with structured scene representations
Abstract
The advancement of embodied intelligence is accelerating the integration of robots into daily life as human assistants. This evolution requires robots not only to interpret high-level instructions and plan tasks but also to perceive and adapt within dynamic environments. Vision-Language Models (VLMs) present a promising solution by combining visual understanding and language reasoning. However, existing VLM-based methods struggle with interactive exploration, accurate perception, and real-time plan adaptation. To address these challenges, we propose ExploreVLM, a novel closed-loop task planning framework powered by VLMs. The framework is built around a step-wise feedback mechanism that enables real-time plan adjustment and supports interactive exploration. At its core is a dual-stage task planner with self-reflection, enhanced by an object-centric spatial relation…
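The step-wise feedback idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `run_closed_loop`, the `StepResult` type, and the callables `plan_fn`, `execute_fn`, and `verify_fn` are hypothetical names introduced here, and the VLM planner and verifier are replaced by plain Python stubs. The sketch only shows the general pattern of executing one action, checking the resulting observation, and replanning when the check fails.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    action: str
    succeeded: bool


def run_closed_loop(
    goal: str,
    plan_fn: Callable[[str, List[StepResult]], List[str]],  # hypothetical planner stub
    execute_fn: Callable[[str], str],                       # hypothetical executor stub
    verify_fn: Callable[[str, str], bool],                  # hypothetical verifier stub
    max_steps: int = 10,
) -> List[StepResult]:
    """Execute a plan one step at a time, replanning after any failed step."""
    history: List[StepResult] = []
    plan = plan_fn(goal, history)
    while plan and len(history) < max_steps:
        action = plan.pop(0)
        observation = execute_fn(action)
        ok = verify_fn(action, observation)
        history.append(StepResult(action, ok))
        if not ok:
            # Feedback step: discard the remaining plan and ask for a new one.
            plan = plan_fn(goal, history)
    return history


def demo() -> List[str]:
    """Toy scenario (assumed for illustration): an apple hidden in a closed drawer."""
    world = {"drawer_open": False}

    def plan_fn(goal: str, history: List[StepResult]) -> List[str]:
        # After a failure, explore by opening the drawer before retrying.
        if history and not history[-1].succeeded:
            return ["open drawer", "pick apple"]
        return ["pick apple"]

    def execute_fn(action: str) -> str:
        if action == "open drawer":
            world["drawer_open"] = True
            return "drawer open"
        if action == "pick apple":
            return "apple in gripper" if world["drawer_open"] else "apple not visible"
        return "unknown action"

    def verify_fn(action: str, observation: str) -> bool:
        return "not" not in observation and observation != "unknown action"

    history = run_closed_loop("pick up the apple", plan_fn, execute_fn, verify_fn)
    return [f"{r.action}:{'ok' if r.succeeded else 'fail'}" for r in history]
```

In this toy run, the first attempt to pick the apple fails because it is not visible, the planner stub responds with an exploratory action, and the retry succeeds. In a real system the three callables would presumably wrap VLM queries and robot controllers; that mapping is an assumption, since the abstract does not specify the interfaces.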
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotic Path Planning Algorithms
