ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation

Feng Wu; Wei Zuo; Wenliang Yang; Jun Xiao; Yang Liu; Xinhua Zeng

arXiv:2603.26788 · cs.RO · April 8, 2026

TL;DR

ReMemNav is a hierarchical framework that enhances zero-shot object navigation by integrating panoramic semantic priors, episodic memory, and vision-language models to improve success rates and exploration efficiency.
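The TL;DR says that panoramic semantic priors help ground the VLM, and the abstract says the Recognize Anything Model (RAM) anchors its spatial reasoning. The sketch below shows one way that grounding could work. It assumes the panorama has already been tagged per heading, and it asks the VLM to pick only among directions that were actually observed. All function names, the prompt wording, and the fallback rule are hypothetical illustrations rather than the authors' implementation, and the tag lists are hand-written stand-ins for RAM output.

```python
import re
from typing import Dict, List, Optional

# Hypothetical panoramic observation: heading in degrees -> object tags.
# In ReMemNav such tags would come from the Recognize Anything Model (RAM);
# the values used below are hand-written stand-ins, not real RAM output.
PanoramaTags = Dict[int, List[str]]


def direct_hit(target: str, tags: PanoramaTags) -> Optional[int]:
    """Return a heading whose tags already contain the target, if any."""
    for heading in sorted(tags):
        if target in tags[heading]:
            return heading
    return None


def build_direction_prompt(target: str, tags: PanoramaTags) -> str:
    """List what was recognized in each direction so the VLM reasons over
    observed objects instead of guessing (prompt wording is illustrative)."""
    lines = [f"Goal: find a {target}.", "Objects recognized in each direction:"]
    for heading in sorted(tags):
        objects = ", ".join(tags[heading]) or "nothing recognized"
        lines.append(f"- {heading} deg: {objects}")
    lines.append("Reply with the one heading most likely to lead to the goal.")
    return "\n".join(lines)


def parse_heading(answer: str, tags: PanoramaTags) -> int:
    """Map a free-text VLM answer onto a valid heading; if none is named,
    fall back to the direction with the most recognized objects."""
    for token in re.findall(r"\d+", answer):
        if int(token) in tags:
            return int(token)
    return max(tags, key=lambda h: len(tags[h]))


if __name__ == "__main__":
    pano: PanoramaTags = {0: ["sofa", "tv"], 90: ["sink", "oven"], 180: [], 270: ["bed"]}
    print(direct_hit("bed", pano))  # 270
    print(build_direction_prompt("refrigerator", pano))
    print(parse_heading("I would go toward 90 deg, the kitchen.", pano))  # 90
```

The design choice worth noting is that `parse_heading` can only return a heading present in the panorama. A reply that names a nonexistent direction therefore cannot steer the agent, which is the kind of spatial hallucination the abstract describes.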

Contribution

The paper introduces a memory-augmented hierarchical framework for zero-shot object navigation, featuring a dual-modal rethinking mechanism and a new spatial reasoning model.

Findings

1. ReMemNav outperforms existing zero-shot baselines in both success rate and exploration efficiency.
2. Success rate improves by 1.7% to 18.2% across datasets.
3. Gains are significant in both success rate and SPL (Success weighted by Path Length).
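The findings report results in success rate (SR) and SPL. The sketch below computes both using the standard definitions: SR is the mean of the per-episode success indicator S_i, and SPL is the mean of S_i · l_i / max(p_i, l_i). Here l_i is the shortest-path distance from start to goal and p_i is the length of the path the agent actually travelled. The `Episode` type and function names are illustrative and are not taken from the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    success: bool          # S_i: did the agent stop at the target?
    shortest_path: float   # l_i: geodesic start-to-goal distance (m)
    agent_path: float      # p_i: length of the path actually travelled (m)


def spl_term(e: Episode) -> float:
    """Per-episode SPL contribution: S_i * l_i / max(p_i, l_i)."""
    if not e.success:
        return 0.0
    denom = max(e.agent_path, e.shortest_path)
    # Assumption: an agent that starts at the goal (l_i = p_i = 0) scores 1.
    return 1.0 if denom == 0 else e.shortest_path / denom


def success_rate_and_spl(episodes: List[Episode]) -> Tuple[float, float]:
    """Return (SR, SPL) averaged over all episodes, failures included."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    sr = sum(e.success for e in episodes) / n
    spl = sum(spl_term(e) for e in episodes) / n
    return sr, spl


if __name__ == "__main__":
    runs = [
        Episode(True, 5.0, 10.0),   # succeeded, but took twice the optimal path
        Episode(True, 4.0, 4.0),    # succeeded along the optimal path
        Episode(False, 6.0, 3.0),   # failed, so it contributes 0 to SPL
    ]
    sr, spl = success_rate_and_spl(runs)
    print(round(sr, 3), round(spl, 3))  # 0.667 0.5
```

SPL never exceeds SR, because each successful episode is discounted by how much longer its path was than the optimal one. This is why SPL is the usual proxy for the exploration efficiency claimed in the first finding.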

Abstract

Zero-shot object navigation, which requires agents to locate unseen target objects in unfamiliar environments without prior maps or task-specific training, remains a significant challenge. Although recent advances in vision-language models (VLMs) offer promising commonsense reasoning capabilities for this task, these models still suffer from spatial hallucinations, local exploration deadlocks, and a disconnect between high-level semantic intent and low-level control. To address these issues, we propose ReMemNav, a novel hierarchical navigation framework that integrates panoramic semantic priors and episodic memory with VLMs. We introduce the Recognize Anything Model to anchor the spatial reasoning process of the VLM. We also design an adaptive dual-modal rethinking mechanism based on an episodic semantic buffer queue. The proposed mechanism actively verifies target…
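The abstract is truncated before it describes how the episodic semantic buffer queue drives rethinking, so the following is only a rough sketch of the general idea. A bounded FIFO of recent steps can expose two signals: the agent circling in one small region (a local exploration deadlock) and a claimed target detection that recent memory does not support. Every name, the capacity, the radius, the support threshold, and the trigger rule are hypothetical assumptions. The "dual-modal" aspect of the authors' mechanism is not modeled here.

```python
from collections import deque
from dataclasses import dataclass
from math import dist
from typing import Deque, FrozenSet, Tuple


@dataclass(frozen=True)
class StepRecord:
    position: Tuple[float, float]  # agent (x, y) in metres
    tags: FrozenSet[str]           # semantic labels recognized at this step


class EpisodicSemanticBuffer:
    """Fixed-length FIFO of recent steps (hypothetical stand-in for the
    paper's episodic semantic buffer queue)."""

    def __init__(self, capacity: int = 8) -> None:
        self.steps: Deque[StepRecord] = deque(maxlen=capacity)

    def push(self, record: StepRecord) -> None:
        self.steps.append(record)  # the oldest entry drops out automatically

    def is_full(self) -> bool:
        return len(self.steps) == self.steps.maxlen

    def is_stuck(self, radius: float = 1.0) -> bool:
        """True if every buffered position lies within `radius` of the oldest
        one, i.e. the agent has been circling in one local region."""
        if not self.is_full():
            return False
        anchor = self.steps[0].position
        return all(dist(anchor, s.position) <= radius for s in self.steps)

    def target_support(self, target: str) -> float:
        """Fraction of buffered steps in which `target` was recognized."""
        if not self.steps:
            return 0.0
        return sum(target in s.tags for s in self.steps) / len(self.steps)


def should_rethink(buf: EpisodicSemanticBuffer, target: str,
                   claimed_found: bool, min_support: float = 0.5) -> bool:
    """Hypothetical trigger: re-query the VLM if the agent is stuck, or if it
    claims the target but recent memory rarely recognized it."""
    if buf.is_stuck():
        return True
    return claimed_found and buf.target_support(target) < min_support
```

A bounded `deque` keeps memory cost constant over long episodes, and both checks only inspect the recent window. This matches the "episodic" framing, where stale observations should not keep influencing decisions.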
