Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning
Mobin Habibpour, Fatemeh Afghah

TL;DR
This paper introduces a framework for zero-shot object-goal navigation in which a VLM handles high-level planning and guides a frontier-based exploration agent. It improves navigation efficiency by combining chain-of-thought reasoning, the agent's recent action history, and top-down obstacle maps.
Contribution
It presents a novel approach that transforms VLMs into active strategists for robotic navigation, integrating reasoning, action history, and obstacle interpretation.
Findings
Achieves more direct and logical navigation trajectories.
Outperforms existing methods on HM3D, Gibson, and MP3D benchmarks.
Enhances spatial awareness through obstacle map interpretation.
Abstract
While Vision-Language Models (VLMs) are set to transform robotic navigation, existing methods often underutilize their reasoning capabilities. To unlock the full potential of VLMs in robotics, we shift their role from passive observers to active strategists in the navigation process. Our framework outsources high-level planning to a VLM, which leverages its contextual understanding to guide a frontier-based exploration agent. This intelligent guidance is achieved through a trio of techniques: structured chain-of-thought prompting that elicits logical, step-by-step reasoning; dynamic inclusion of the agent's recent action history to prevent getting stuck in loops; and a novel capability that enables the VLM to interpret top-down obstacle maps alongside first-person views, thereby enhancing spatial awareness. When tested on challenging benchmarks like HM3D, Gibson, and MP3D, this method…
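The three techniques named in the abstract can be pictured as parts of a single text prompt sent to the VLM along with the images. The sketch below is illustrative only and is not the authors' implementation: the class name `NavigationPromptBuilder`, the action strings, the frontier descriptions, the history window of 5, and the wording of the reasoning steps are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class NavigationPromptBuilder:
    """Hypothetical helper that assembles the text part of a VLM query."""

    goal: str
    history_len: int = 5  # assumed window size, not taken from the paper
    history: deque = field(init=False)

    def __post_init__(self) -> None:
        # A bounded deque keeps only the most recent actions.
        self.history = deque(maxlen=self.history_len)

    def record(self, action: str) -> None:
        """Store an executed action so later prompts can mention it."""
        self.history.append(action)

    def build(self, frontiers: list[str]) -> str:
        """Return a structured chain-of-thought prompt over candidate frontiers."""
        past = ", ".join(self.history) if self.history else "none"
        options = "\n".join(f"  {i}. {f}" for i, f in enumerate(frontiers, 1))
        return (
            f"Goal object: {self.goal}\n"
            "Inputs: a first-person image and a top-down obstacle map.\n"
            f"Recent actions (oldest first): {past}\n"
            f"Candidate frontiers:\n{options}\n"
            "Reason step by step:\n"
            "1. Describe what the first-person view shows.\n"
            "2. Use the obstacle map to rule out blocked or explored areas.\n"
            "3. Check the recent actions and avoid repeating a loop.\n"
            "4. Answer with the number of the frontier to visit next."
        )


if __name__ == "__main__":
    builder = NavigationPromptBuilder(goal="bed", history_len=3)
    for action in ["forward", "turn_left", "forward", "turn_right"]:
        builder.record(action)
    print(builder.build(["hallway on the left", "doorway ahead"]))
```

In this sketch the bounded history is what lets the prompt expose repeated moves to the model, while the frontier list keeps the VLM's answer constrained to choices the exploration agent can actually execute.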
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Social Robot Interaction and HRI
