MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding
Weifan Zhang, Tingguang Li, Yuzhen Liu

TL;DR
This paper introduces MAG-Nav, a novel language-driven object navigation system that uses active grounding and memory mechanisms to improve robot navigation in unknown environments without requiring training.
Contribution
We propose a zero-shot navigation framework leveraging memory and active perception, enhancing visual grounding and generalization in complex environments.
Findings
Outperforms state-of-the-art methods on Habitat-Matterport 3D benchmarks.
Demonstrates effective real-world deployment on a quadruped robot.
Achieves robust navigation without model fine-tuning.
Abstract
Visual navigation in unknown environments based solely on natural language descriptions is a key capability for intelligent robots. In this work, we propose a navigation framework built upon off-the-shelf Visual Language Models (VLMs), enhanced with two human-inspired mechanisms: perspective-based active grounding, which dynamically adjusts the robot's viewpoint for improved visual inspection, and historical memory backtracking, which enables the system to retain and re-evaluate uncertain observations over time. Unlike existing approaches that passively rely on incidental visual inputs, our method actively optimizes perception and leverages memory to resolve ambiguity, significantly improving vision-language grounding in complex, unseen environments. Our framework operates in a zero-shot manner, achieving strong generalization to diverse and open-ended language descriptions without…
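To make the idea of historical memory backtracking more concrete, here is a minimal Python sketch of one way a system could retain uncertain observations and revisit them later. It is an illustration only, not the authors' implementation: the `Observation` and `UncertainMemory` names, the scalar grounding score, and the two thresholds (0.8 and 0.3) are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Observation:
    step: int                      # time step at which the view was captured
    position: Tuple[float, float]  # robot position when the view was captured
    score: float                   # hypothetical grounding confidence in [0, 1]


@dataclass
class UncertainMemory:
    # Both thresholds are illustrative assumptions, not values from the paper.
    accept: float = 0.8
    reject: float = 0.3
    items: List[Observation] = field(default_factory=list)

    def observe(self, obs: Observation) -> str:
        """Classify an observation; keep ambiguous ones for later review."""
        if obs.score >= self.accept:
            return "accept"          # confident match: navigate to it
        if obs.score >= self.reject:
            self.items.append(obs)   # ambiguous: remember it instead of discarding
            return "defer"
        return "reject"              # confident non-match: ignore

    def backtrack(self) -> Optional[Observation]:
        """Return the most promising deferred observation, removing it from memory."""
        if not self.items:
            return None
        best = max(self.items, key=lambda o: o.score)
        self.items.remove(best)
        return best


if __name__ == "__main__":
    memory = UncertainMemory()
    for step, score in enumerate([0.5, 0.6, 0.1]):
        print(step, memory.observe(Observation(step, (float(step), 0.0), score)))
    candidate = memory.backtrack()
    print("revisit position:", candidate.position if candidate else None)
```

In this sketch, a deferred candidate returned by `backtrack()` would be the place to send the robot for a closer look from a better viewpoint, which is roughly where an active grounding step would take over.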
