History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation

Mobin Habibpour; Fatemeh Afghah

arXiv:2506.16623 · cs.RO · June 23, 2025

Open Access

TL;DR

This paper presents a history-aware, zero-shot object navigation framework that augments vision-language reasoning with dynamic prompting and VLM-assisted waypoint generation, reaching a 46% success rate on HM3D, comparable to state-of-the-art zero-shot methods.

Contribution

The paper introduces a history-augmented prompting method that supplies a vision-language model with the agent's action history, so the model can score candidate navigation actions and avoid decision loops in zero-shot settings.

Findings

01 Achieves a 46% success rate on the HM3D dataset

02 Improves navigation robustness with history-aware prompts

03 Performs comparably to state-of-the-art zero-shot methods

Abstract

Object Goal Navigation (ObjectNav) challenges robots to find objects in unseen environments, demanding sophisticated reasoning. While Vision-Language Models (VLMs) show potential, current ObjectNav methods often employ them superficially, primarily using vision-language embeddings for object-scene similarity checks rather than leveraging deeper reasoning. This limits contextual understanding and leads to practical issues like repetitive navigation behaviors. This paper introduces a novel zero-shot ObjectNav framework that pioneers the use of dynamic, history-aware prompting to more deeply integrate VLM reasoning into frontier-based exploration. Our core innovation lies in providing the VLM with action history context, enabling it to generate semantic guidance scores for navigation actions while actively avoiding decision loops. We also introduce a VLM-assisted waypoint generation…
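
The history-aware prompting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_history_prompt`, `pick_action`), the prompt wording, the action labels, and the repeat penalty are all hypothetical, and the scores are placeholders standing in for a VLM reply rather than output from a real model. The sketch only shows how recent actions might be folded into a prompt and how repeated choices could be discouraged.

```python
from collections import deque
from typing import Deque, Dict, List


def build_history_prompt(goal: str, candidates: List[str], history: Deque[str]) -> str:
    """Compose a text prompt listing the goal, recent actions, and candidate actions."""
    past = ", ".join(history) if history else "none"
    options = "\n".join(f"{i}: {name}" for i, name in enumerate(candidates))
    return (
        f"Goal object: {goal}\n"
        f"Recent actions (oldest first): {past}\n"
        f"Candidate actions:\n{options}\n"
        "Score each candidate from 0 to 1 for how likely it leads to the goal, "
        "and avoid repeating actions that did not help."
    )


def pick_action(scores: Dict[str, float], history: Deque[str],
                repeat_penalty: float = 0.3) -> str:
    """Pick the best action after subtracting a penalty per recent repetition.

    The penalty is an assumption of this sketch, not a detail from the paper.
    """
    adjusted = {
        name: score - repeat_penalty * sum(1 for past in history if past == name)
        for name, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)


# Hypothetical episode state: the agent has turned left twice in a row.
history: Deque[str] = deque(["turn_left", "turn_left"], maxlen=5)
prompt = build_history_prompt("chair", ["turn_left", "move_forward"], history)

# Placeholder scores standing in for a VLM reply; no real model is called here.
mock_scores = {"turn_left": 0.6, "move_forward": 0.5}
choice = pick_action(mock_scores, history)
```

With these placeholder values, the two recent `turn_left` entries lower that action's adjusted score below `move_forward`, which is the kind of loop avoidance the abstract attributes to history context. In the paper's framing the VLM itself is expected to account for the history; the explicit penalty here is only a stand-in to make the effect visible without a model.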

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robotic Path Planning Algorithms