ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models

Zhichen Lou, Kechun Xu, Zhongxiang Zhou, Rong Xiong

arXiv:2508.11918 · cs.RO · August 19, 2025

TL;DR

ExploreVLM introduces a closed-loop robot exploration framework built on vision-language models (VLMs) that enables real-time plan adaptation and improves perception in dynamic environments.

Contribution

It presents a novel closed-loop task planning system that combines self-reflection with structured scene understanding, advancing robots' ability to explore interactively.

Findings

1. Outperforms state-of-the-art baselines in exploration tasks.
2. Enables real-time plan adjustment through feedback mechanisms.
3. Improves perception accuracy with structured scene representations.

Abstract

The advancement of embodied intelligence is accelerating the integration of robots into daily life as human assistants. This evolution requires robots not only to interpret high-level instructions and plan tasks but also to perceive and adapt within dynamic environments. Vision-Language Models (VLMs) present a promising solution by combining visual understanding and language reasoning. However, existing VLM-based methods struggle with interactive exploration, accurate perception, and real-time plan adaptation. To address these challenges, we propose ExploreVLM, a novel closed-loop task planning framework powered by VLMs. The framework is built around a step-wise feedback mechanism that enables real-time plan adjustment and supports interactive exploration. At its core is a dual-stage task planner with self-reflection, enhanced by an object-centric spatial relation…
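The abstract names three mechanisms (step-wise feedback, a dual-stage planner with self-reflection, and an object-centric view of spatial relations) but gives no detail here. The Python sketch below is a hypothetical toy showing how such a closed loop could fit together; it is not the authors' implementation, and every name in it is an assumption for illustration. The scene is modeled as a set of (subject, relation, object) triples. `draft_plan` stands in for a first-stage VLM call, and `reflect` stands in for a second-stage self-reflection call that checks the next step's preconditions against the latest observation and prepends an exploration action when something is missing. `execute` simulates the robot, and `run` re-observes the scene after every action.

```python
from dataclasses import dataclass

# Hypothetical object-centric scene: a set of (subject, relation, object) triples.
# This representation is an illustrative assumption, not taken from the paper.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Step:
    action: str
    requires: frozenset[Triple] = frozenset()  # must hold before acting
    adds: frozenset[Triple] = frozenset()      # made true by the action
    removes: frozenset[Triple] = frozenset()   # made false by the action


def draft_plan(goal: Triple) -> list[Step]:
    """Stage 1 stand-in for a VLM call: a naive plan that assumes the object is reachable."""
    obj, _, target = goal
    held = ("robot", "holds", obj)
    return [
        Step(f"pick({obj})", frozenset({(obj, "is", "visible")}), frozenset({held})),
        Step(f"place({obj},{target})", frozenset({held}), frozenset({goal}), frozenset({held})),
    ]


def reflect(plan: list[Step], scene: frozenset[Triple], explore: dict[Triple, Step]) -> list[Step]:
    """Stage 2 stand-in for self-reflection: check the next step against the latest
    observation and prepend an exploration action for any missing precondition."""
    missing = plan[0].requires - scene
    fixes = [explore[t] for t in missing if t in explore]
    return fixes + plan


def execute(step: Step, world: frozenset[Triple]) -> tuple[frozenset[Triple], bool]:
    """Simulated robot: the action succeeds only if its preconditions hold in the world."""
    if not step.requires <= world:
        return world, False
    return (world - step.removes) | step.adds, True


def run(goal: Triple, world: frozenset[Triple], explore: dict[Triple, Step], max_steps: int = 10):
    plan: list[Step] = []
    log: list[tuple[str, bool]] = []
    for _ in range(max_steps):
        scene = world  # step-wise feedback: observe after every action (toy world is fully observable)
        if goal in scene:
            return log, True
        if not plan:
            plan = draft_plan(goal)  # (re)draft a plan when none is pending
        plan = reflect(plan, scene, explore)
        step = plan.pop(0)
        world, ok = execute(step, world)
        log.append((step.action, ok))
        if not ok:
            plan = []  # drop the stale plan; a new one is drafted next iteration
    return log, goal in world


# Toy setup (assumed): the apple starts hidden in a closed drawer.
world = frozenset({("drawer", "is", "closed")})
explore = {
    ("apple", "is", "visible"): Step(
        "open(drawer)",
        adds=frozenset({("drawer", "is", "open"), ("apple", "is", "visible")}),
        removes=frozenset({("drawer", "is", "closed")}),
    )
}
log, ok = run(("apple", "in", "basket"), world, explore)
print(ok, [a for a, _ in log])  # True ['open(drawer)', 'pick(apple)', 'place(apple,basket)']
```

In this toy, the first-stage plan (`pick`, then `place`) fails reflection because the apple is not yet visible, so the loop inserts `open(drawer)` before picking. That is one way a step-wise feedback loop can turn a naive plan into interactive exploration. A real system would presumably replace the symbolic precondition check with VLM reasoning over camera images.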


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotic Path Planning Algorithms