NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps

Dijia Zhan; Jinyi Li; Chenxi Zheng; Shaoyu Huang; Yong Li; Jie Tang; Xuemiao Xu

arXiv:2605.06317·cs.CV·May 19, 2026

TL;DR

NavOne plans a vision-language navigation path on a pre-built top-down map in a single step instead of step by step. It reports state-of-the-art results on the R2R-TopDown dataset, an 8x speedup over map-based baselines, and 80x faster planning than egocentric methods.

Contribution

The paper presents NavOne, an end-to-end framework that directly predicts dense paths on top-down maps, advancing global spatial reasoning in Vision-Language Navigation (VLN).

Findings

01

NavOne achieves state-of-the-art performance on the R2R-TopDown dataset.

02

It provides an 8x speedup over existing map-based baselines.

03

Its planning is 80x faster than that of egocentric methods.

Abstract

Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path planning problem on pre-built top-down maps, supported by our newly constructed R2R-TopDown dataset. To solve this, we introduce NavOne, a unified framework that directly predicts dense path probabilities over multi-modal maps in a single end-to-end forward pass. NavOne features a Top-Down Map Fuser for joint multi-modal map representation, and extends Attention Residuals for spatial-aware depth…
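To make "dense path probabilities over a map" concrete, here is a minimal sketch of one way such an output could be turned into a cell-by-cell route. It is not taken from the paper: the function name `decode_path`, the explicit `start` and `goal` cells, the 4-connected grid, and the use of Dijkstra's algorithm over negative log-probabilities are all assumptions made for illustration. The probability grid stands in for the output of a single forward pass.

```python
import heapq
import math


def decode_path(prob, start, goal, eps=1e-6):
    """Hypothetical decoder: find the 4-connected path from start to goal
    that minimizes the summed -log(probability) of the cells it enters."""
    rows, cols = len(prob), len(prob[0])

    def cost(r, c):
        # Low-probability cells are expensive; eps avoids log(0).
        return -math.log(max(prob[r][c], eps))

    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, math.inf):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost(nr, nc)
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))

    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy 3x3 "path probability" map: high values along the top row and right column.
toy_prob = [
    [0.9, 0.9, 0.9],
    [0.1, 0.1, 0.9],
    [0.1, 0.1, 0.9],
]
toy_path = decode_path(toy_prob, start=(0, 0), goal=(2, 2))
print(toy_path)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
```

The point of the sketch is only that a dense map is predicted once and a route is read off it afterwards, in contrast to choosing one action per step. Whether NavOne uses any such post-processing is not stated in the abstract.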
