Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration
Lu Yue, Yue Fan, Shiwei Lian, Yu Zhao, Jiaxin Yu, Liang Xie, Feitian Zhang

TL;DR
Spatial-VLN introduces a perception-guided exploration framework that enhances zero-shot vision-and-language navigation by explicitly addressing spatial perception challenges, leading to improved generalization and robustness in complex environments.
Contribution
The paper proposes Spatial-VLN, a novel framework with modules for spatial perception enhancement and multi-expert reasoning, significantly improving zero-shot VLN performance.
Findings
Achieves state-of-the-art results on VLN-CE with low-cost LLMs.
Demonstrates superior generalization and robustness in real-world environments.
Effectively bridges the Sim2Real gap with a new waypoint sampling strategy.
Abstract
Zero-shot Vision-and-Language Navigation (VLN) agents leveraging Large Language Models (LLMs) excel in generalization but suffer from insufficient spatial perception. Focusing on complex continuous environments, we categorize key perceptual bottlenecks into three spatial challenges: door interaction, multi-room navigation, and ambiguous instruction execution, where existing methods consistently suffer high failure rates. We present Spatial-VLN, a perception-guided exploration framework designed to overcome these challenges. The framework consists of two main modules. The Spatial Perception Enhancement (SPE) module integrates panoramic filtering with specialized door and region experts to produce spatially coherent, cross-view consistent perceptual representations. Building on this foundation, our Explored Multi-expert Reasoning (EMR) module uses parallel LLM experts to address…
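As a rough illustration of the two-stage structure the abstract describes, the sketch below wires a perception step (panoramic filtering plus door and region experts) into a set of experts that are queried in parallel. Every name here (`View`, `filter_panorama`, `door_expert`, `region_expert`, `perceive`, `run_experts_in_parallel`), the data fields, and the stub logic are assumptions made for illustration only. The abstract does not specify interfaces, and the actual modules presumably rely on vision models and LLM calls rather than these placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class View:
    """One heading of a panorama (hypothetical structure, not from the paper)."""
    heading_deg: int
    navigable: bool
    has_door: bool
    region: str


def filter_panorama(views: List[View]) -> List[View]:
    # Placeholder for panoramic filtering: keep only navigable headings.
    return [v for v in views if v.navigable]


def door_expert(views: List[View]) -> List[int]:
    # Placeholder door expert: headings where a door was flagged.
    return [v.heading_deg for v in views if v.has_door]


def region_expert(views: List[View]) -> Dict[int, str]:
    # Placeholder region expert: map each heading to its region label.
    return {v.heading_deg: v.region for v in views}


def perceive(views: List[View]) -> dict:
    # Perception stage: filter first, then run both experts on the same
    # filtered views so their outputs refer to the same headings.
    kept = filter_panorama(views)
    return {"doors": door_expert(kept), "regions": region_expert(kept)}


def run_experts_in_parallel(
    experts: Dict[str, Callable[[dict, str], str]],
    percept: dict,
    instruction: str,
) -> Dict[str, str]:
    # Reasoning stage: submit every expert concurrently and collect the
    # answers by name. In practice each callable would wrap an LLM request.
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(fn, percept, instruction)
            for name, fn in experts.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

A thread pool is used here because LLM requests are typically I/O-bound, so independent experts can wait on their responses at the same time. How the experts' answers are combined into a navigation decision is left out, since the truncated abstract does not describe it.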
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling
