SPAN-Nav: Generalized Spatial Awareness for Versatile Vision-Language Navigation
Jiahang Liu, Tianyu Xu, Jiawei Chen, Lu Yue, Jiazhao Zhang, Zhiyong Wang, Minghan Li, Qisheng Zhao, Anqi Li, Qi Su, Zhizheng Zhang, He Wang

TL;DR
SPAN-Nav introduces a universal 3D spatial awareness model for vision-language navigation, leveraging a compact spatial prior representation and multi-task training to improve generalization in complex environments.
Contribution
The paper presents SPAN-Nav, an end-to-end foundation model that infuses embodied navigation with universal 3D spatial awareness, using a single spatial token and extensive occupancy data.
Findings
Achieves state-of-the-art results across multiple navigation benchmarks.
Demonstrates robust generalization in real-world complex scenarios.
Uses a dataset of 4.2 million occupancy annotations.
Abstract
Recent embodied navigation approaches leveraging Vision-Language Models (VLMs) demonstrate strong generalization in versatile Vision-Language Navigation (VLN). However, reliable path planning in complex environments remains challenging due to insufficient spatial awareness. In this work, we introduce SPAN-Nav, an end-to-end foundation model designed to infuse embodied navigation with universal 3D spatial awareness using RGB video streams. SPAN-Nav extracts spatial priors across diverse scenes through an occupancy prediction task on extensive indoor and outdoor environments. To mitigate the computational burden, we introduce a compact representation for spatial priors, finding that a single token is sufficient to encapsulate the coarse-grained cues essential for navigation tasks. Furthermore, inspired by the Chain-of-Thought (CoT) mechanism, SPAN-Nav utilizes this single spatial token to…
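The abstract does not say how the single spatial token is produced or decoded. As a rough illustration of the idea only, the sketch below assumes attention pooling over visual patch features and a linear occupancy head. Both choices, and every name and shape in it (`pool_to_single_token`, `decode_occupancy`, the 8x8x4 grid), are hypothetical and are not taken from the paper.

```python
import numpy as np

def pool_to_single_token(patch_feats, query):
    """Attention-pool N patch features of shape (N, D) into one token of shape (D,).

    `query` is a (D,) vector that would be learned in a real model (assumption).
    """
    d = patch_feats.shape[-1]
    scores = patch_feats @ query / np.sqrt(d)   # (N,) similarity of each patch to the query
    scores = scores - scores.max()              # stabilize the softmax numerically
    weights = np.exp(scores)
    weights = weights / weights.sum()           # softmax weights, sum to 1
    return weights @ patch_feats                # (D,) weighted average of patches

def decode_occupancy(token, w, b, grid_shape):
    """Map the single token to a coarse occupancy-probability grid (hypothetical linear head)."""
    logits = token @ w + b                      # one logit per grid cell
    probs = 1.0 / (1.0 + np.exp(-logits))       # sigmoid: per-cell occupancy probability
    return probs.reshape(grid_shape)

# Toy data standing in for encoder outputs and learned parameters (all assumptions).
rng = np.random.default_rng(0)
num_patches, dim = 196, 64
grid_shape = (8, 8, 4)
num_cells = int(np.prod(grid_shape))

patch_feats = rng.standard_normal((num_patches, dim))
query = rng.standard_normal(dim)
w = rng.standard_normal((dim, num_cells)) * 0.01
b = np.zeros(num_cells)

token = pool_to_single_token(patch_feats, query)
occupancy = decode_occupancy(token, w, b, grid_shape)
print(token.shape, occupancy.shape)
```

The point of the sketch is the bottleneck: everything the occupancy head sees passes through one `dim`-sized vector, which is why such a token can only carry coarse-grained spatial cues.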
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
