MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation

Lingfeng Zhang; Xiaoshuai Hao; Qinwen Xu; Qiang Zhang; Xinyao Zhang; Pengwei Wang; Jing Zhang; Zhongyuan Wang; Shanghang Zhang; Renjing Xu

arXiv:2502.13451·cs.RO·May 12, 2026

TL;DR

MapNav introduces an end-to-end VLN model using Annotated Semantic Maps to replace historical observations, improving navigation accuracy and efficiency in diverse environments.

Contribution

The paper presents a novel ASM-based memory representation for VLN that improves object mapping, provides explicit navigation cues, and achieves state-of-the-art results.

Findings

01. MapNav outperforms previous models in simulated environments.

02. The ASM approach reduces storage and computational overhead.

03. Code and dataset will be publicly released for reproducibility.

Abstract

Vision-and-language navigation (VLN) is a key task in Embodied AI, requiring agents to navigate diverse and unseen environments while following natural language instructions. Traditional approaches rely heavily on historical observations as spatio-temporal context for decision making, leading to significant storage and computational overhead. In this paper, we introduce MapNav, a novel end-to-end VLN model that leverages an Annotated Semantic Map (ASM) to replace historical frames. Specifically, our approach constructs a top-down semantic map at the start of each episode and updates it at each timestep, allowing for precise object mapping and structured navigation information. Then, we enhance this map with explicit textual labels for key regions, transforming abstract semantics into clear navigation cues and generating our ASM. The MapNav agent takes the constructed ASM as input and uses the…
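The two steps the abstract describes (keep a top-down semantic map up to date, then attach text labels to it) can be illustrated with a small sketch. This is not the authors' implementation: the function names `update_semantic_map` and `annotate`, the grid resolution, the convention that class id 0 means "unlabeled", and the choice to place one label per class at the centroid of its cells are all assumptions made for illustration. The paper labels key regions, which this sketch simplifies to one label per object class.

```python
import numpy as np


def update_semantic_map(sem_map, points_xy, class_ids, cell_size=0.05, origin=(0.0, 0.0)):
    """Write per-point class ids into a top-down grid, in place.

    points_xy: (N, 2) array of ground-plane coordinates in metres (hypothetical input,
               e.g. depth pixels already projected to the floor plane).
    class_ids: (N,) integer class id per point; 0 is reserved for "unlabeled".
    Points that fall outside the grid are ignored.
    """
    cols = np.floor((points_xy[:, 0] - origin[0]) / cell_size).astype(int)
    rows = np.floor((points_xy[:, 1] - origin[1]) / cell_size).astype(int)
    height, width = sem_map.shape
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    sem_map[rows[inside], cols[inside]] = class_ids[inside]
    return sem_map


def annotate(sem_map, class_names):
    """Return one (label, row, col) tuple per class present in the map.

    The label position is the centroid of all cells of that class, a
    simplification of labeling each key region separately.
    """
    labels = []
    for class_id in np.unique(sem_map):
        if class_id == 0:  # skip unlabeled cells
            continue
        rows, cols = np.nonzero(sem_map == class_id)
        labels.append((class_names[int(class_id)], float(rows.mean()), float(cols.mean())))
    return labels


# Usage: a 4x4 map with 1 m cells, updated once with four labeled points.
sem_map = np.zeros((4, 4), dtype=np.int64)
points = np.array([[0.5, 0.5], [1.5, 0.5], [3.5, 3.5], [10.0, 10.0]])
ids = np.array([1, 1, 2, 2])
update_semantic_map(sem_map, points, ids, cell_size=1.0)
labels = annotate(sem_map, {1: "chair", 2: "bed"})
```

In an episode loop, `update_semantic_map` would be called once per timestep on the same array, so the memory cost stays fixed at the grid size rather than growing with the number of stored frames. The resulting labels would then be drawn onto the map image before it is passed to the model.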
