MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding

Weifan Zhang; Tingguang Li; Yuzhen Liu

arXiv:2508.05021 · cs.RO · August 8, 2025

TL;DR

This paper introduces MAG-Nav, a language-driven object navigation system that combines active grounding with a memory mechanism to improve robot navigation in unknown environments, operating zero-shot without model fine-tuning.

Contribution

We propose a zero-shot navigation framework leveraging memory and active perception, enhancing visual grounding and generalization in complex environments.

Findings

01 Outperforms state-of-the-art methods on Habitat-Matterport 3D benchmarks.

02 Demonstrates effective real-world deployment on a quadruped robot.

03 Achieves robust navigation without model fine-tuning.

Abstract

Visual navigation in unknown environments based solely on natural language descriptions is a key capability for intelligent robots. In this work, we propose a navigation framework built upon off-the-shelf Visual Language Models (VLMs), enhanced with two human-inspired mechanisms: perspective-based active grounding, which dynamically adjusts the robot's viewpoint for improved visual inspection, and historical memory backtracking, which enables the system to retain and re-evaluate uncertain observations over time. Unlike existing approaches that passively rely on incidental visual inputs, our method actively optimizes perception and leverages memory to resolve ambiguity, significantly improving vision-language grounding in complex, unseen environments. Our framework operates in a zero-shot manner, achieving strong generalization to diverse and open-ended language descriptions without…
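To make the idea of historical memory backtracking more concrete, the following is a minimal Python sketch of one way such a mechanism could be organized. It is not the authors' implementation: the class names `Observation` and `UncertainMemory`, the thresholds `accept` and `reject`, and the assumption that a VLM yields a scalar grounding score in [0, 1] are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Observation:
    # Hypothetical record of a candidate sighting.
    position: Tuple[float, float]  # where the candidate was seen (assumed 2D map frame)
    score: float                   # assumed grounding confidence in [0, 1], e.g. from a VLM


@dataclass
class UncertainMemory:
    # Both thresholds are illustrative assumptions, not values from the paper.
    accept: float = 0.8   # at or above this, treat the candidate as the target
    reject: float = 0.3   # below this, drop the candidate
    pending: List[Observation] = field(default_factory=list)

    def observe(self, obs: Observation) -> str:
        """Classify an observation and retain ambiguous ones for later."""
        if obs.score >= self.accept:
            return "accept"
        if obs.score < self.reject:
            return "discard"
        self.pending.append(obs)
        return "remember"

    def backtrack(self) -> Optional[Observation]:
        """Return the most promising remembered candidate to revisit, if any."""
        if not self.pending:
            return None
        best = max(self.pending, key=lambda o: o.score)
        self.pending.remove(best)
        return best
```

In this sketch, a candidate returned by `backtrack()` would be the place where the robot goes back and inspects the object again from a better viewpoint, which is the role the abstract assigns to perspective-based active grounding. How the actual system scores, stores, and re-evaluates observations is not specified in the text above.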
