ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation
Feng Wu, Wei Zuo, Wenliang Yang, Jun Xiao, Yang Liu, Xinhua Zeng

TL;DR
ReMemNav is a hierarchical framework that enhances zero-shot object navigation by integrating panoramic semantic priors, episodic memory, and vision-language models to improve success rates and exploration efficiency.
Contribution
It introduces a novel memory-augmented hierarchical framework with a dual-modal rethinking mechanism and a new spatial reasoning model for zero-shot object navigation.
Findings
ReMemNav outperforms existing zero-shot baselines in success rate and exploration efficiency.
It improves success rate by 1.7% to 18.2% across datasets.
It also improves SPL (success weighted by path length), the metric that captures exploration efficiency.
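For readers unfamiliar with the SPL metric named in the findings, the following is a minimal sketch of its standard definition in embodied navigation: each episode contributes its success flag times the ratio of the shortest-path length to the larger of the shortest-path and actual path lengths, averaged over all episodes. The function name `spl` and its argument names are illustrative assumptions, not taken from the paper or from any evaluation toolkit.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Hypothetical helper for illustration; assumes every shortest-path
    length is strictly positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if success:
            # A path can never score above 1, even if the logged length
            # is shorter than the reference shortest path.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / len(successes)


# Three episodes: a success with a 2x detour, a failure, an optimal success.
score = spl([True, False, True], [5.0, 4.0, 10.0], [10.0, 8.0, 10.0])
print(score)
```

Because failed episodes score zero and detours are penalized proportionally, a method can raise its success rate without raising SPL if the extra successes come from long, inefficient paths.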
Abstract
Zero-shot object navigation requires agents to locate unseen target objects in unfamiliar environments without prior maps or task-specific training, which remains a significant challenge. Although recent advancements in vision-language models (VLMs) provide promising commonsense reasoning capabilities for this task, these models still suffer from spatial hallucinations, local exploration deadlocks, and a disconnect between high-level semantic intent and low-level control. To address these issues, we propose a novel hierarchical navigation framework named ReMemNav, which seamlessly integrates panoramic semantic priors and episodic memory with VLMs. We introduce the Recognize Anything Model to anchor the spatial reasoning process of the VLM. We also design an adaptive dual-modal rethinking mechanism based on an episodic semantic buffer queue. The proposed mechanism actively verifies target…
