History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation
Mobin Habibpour, Fatemeh Afghah

TL;DR
This paper presents a history-aware, zero-shot object navigation framework that enhances vision-language reasoning with dynamic prompting and waypoint generation, significantly improving navigation success in unseen environments.
Contribution
It introduces a novel history-augmented prompting method for vision-language models, enabling deeper reasoning and better navigation performance in zero-shot settings.
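The summary does not show the actual prompt the authors use. As a rough illustration of what "history-augmented prompting" could look like, here is a minimal sketch under stated assumptions. It builds a text prompt that lists the goal object, the agent's most recent actions, and the candidate actions, and asks a VLM to score each candidate. The `HistoryEntry` schema, the `build_history_prompt` function, the `max_history` window, and the prompt wording are all hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    """One past decision (hypothetical schema): the step index and the action taken."""
    step: int
    action: str


def build_history_prompt(goal: str, candidate_actions: List[str],
                         history: List[HistoryEntry], max_history: int = 5) -> str:
    """Assemble a text prompt asking a VLM to score candidate actions.

    Only the last `max_history` actions are included, so the model can spot
    repeated moves without the prompt growing unboundedly.
    """
    recent = history[-max_history:] if max_history > 0 else []
    lines = [f"You are guiding a robot that is searching for a {goal}."]
    if recent:
        lines.append("Recent actions (oldest first):")
        lines.extend(f"- step {h.step}: {h.action}" for h in recent)
    else:
        lines.append("No actions have been taken yet.")
    lines.append("Candidate actions:")
    lines.extend(f"{i}. {a}" for i, a in enumerate(candidate_actions))
    lines.append("Score each candidate from 0 to 10 for how likely it is to lead to "
                 "the goal. Give low scores to actions that repeat recent moves "
                 "without progress. Answer with one 'index: score' line per candidate.")
    return "\n".join(lines)


history = [HistoryEntry(1, "move to frontier A"),
           HistoryEntry(2, "move to frontier B"),
           HistoryEntry(3, "move to frontier A")]
prompt = build_history_prompt("bed", ["move to frontier A", "move to frontier C"], history)
print(prompt)
```

Keeping the history as a bounded window is one design choice among many; a real system might instead summarize older history or attach images of previously visited regions.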
Findings
Achieves a 46% success rate on the HM3D dataset
Improves navigation robustness with history-aware prompts
Performs comparably to state-of-the-art zero-shot methods
Abstract
Object Goal Navigation (ObjectNav) challenges robots to find objects in unseen environments, demanding sophisticated reasoning. While Vision-Language Models (VLMs) show potential, current ObjectNav methods often employ them superficially, primarily using vision-language embeddings for object-scene similarity checks rather than leveraging deeper reasoning. This limits contextual understanding and leads to practical issues like repetitive navigation behaviors. This paper introduces a novel zero-shot ObjectNav framework that pioneers the use of dynamic, history-aware prompting to more deeply integrate VLM reasoning into frontier-based exploration. Our core innovation lies in providing the VLM with action history context, enabling it to generate semantic guidance scores for navigation actions while actively avoiding decision loops. We also introduce a VLM-assisted waypoint generation…
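The abstract describes VLM-generated semantic scores guiding frontier-based exploration but does not give the scoring or selection rule. The sketch below shows one plausible way to turn a VLM reply into per-frontier scores and pick a frontier. The `parse_scores` and `select_frontier` helpers, the 'index: score' reply format, the linear distance cost, and the revisit penalty are all hypothetical. The explicit revisit penalty is a simple heuristic loop guard swapped in for illustration. In the paper, loop avoidance comes from giving the VLM action-history context, not from this rule.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def parse_scores(reply: str, n: int) -> List[float]:
    """Parse 'index: score' lines from a VLM reply; unmentioned indices default to 0."""
    scores = [0.0] * n
    for line in reply.splitlines():
        if ":" not in line:
            continue
        idx_text, score_text = line.split(":", 1)
        try:
            idx, score = int(idx_text.strip()), float(score_text.strip())
        except ValueError:
            continue  # skip malformed lines instead of failing
        if 0 <= idx < n:
            scores[idx] = score
    return scores


def select_frontier(frontiers: Sequence[Point], vlm_scores: Sequence[float],
                    robot: Point, visited_targets: Sequence[Point],
                    dist_weight: float = 0.1, revisit_penalty: float = 5.0,
                    revisit_radius: float = 1.0) -> int:
    """Return the index of the frontier with the highest utility.

    utility = semantic score - dist_weight * travel distance
              - revisit_penalty (if within revisit_radius of an earlier target)
    """
    best_idx, best_utility = -1, -math.inf
    for i, (f, s) in enumerate(zip(frontiers, vlm_scores)):
        utility = s - dist_weight * math.dist(robot, f)
        if any(math.dist(f, v) <= revisit_radius for v in visited_targets):
            utility -= revisit_penalty
        if utility > best_utility:
            best_idx, best_utility = i, utility
    return best_idx


frontiers = [(2.0, 0.0), (0.0, 6.0)]
scores = parse_scores("0: 8\n1: 6", len(frontiers))
choice = select_frontier(frontiers, scores, robot=(0.0, 0.0),
                         visited_targets=[(2.0, 0.5)])
print(scores, choice)
```

Here frontier 0 has the higher semantic score, but it lies 0.5 m from a previously chosen target, so the penalty drops its utility from 7.8 to 2.8 and frontier 1 (utility 5.4) is selected instead.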
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robotic Path Planning Algorithms
