SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
Jiwen Zhang, Zejun Li, Siyuan Wang, Xiangyu Shi, Zhongyu Wei, Qi Wu

TL;DR
SpatialNav introduces a novel approach for zero-shot vision-and-language navigation by explicitly modeling global spatial structure with Spatial Scene Graphs, significantly improving navigation efficiency and performance without prior training.
Contribution
The paper presents SpatialNav, a zero-shot VLN agent that leverages Spatial Scene Graphs and a new spatial mapping strategy to enhance navigation in unseen environments.
Findings
SpatialNav outperforms existing zero-shot agents in various environments.
Global spatial representations improve navigation efficiency.
Results narrow the gap with learning-based methods.
Abstract
Although learning-based vision-and-language navigation (VLN) agents can learn spatial knowledge implicitly from large-scale training data, zero-shot VLN agents lack this process, relying primarily on local observations for navigation, which leads to inefficient exploration and a significant performance gap. To deal with the problem, we consider a zero-shot VLN setting that agents are allowed to fully explore the environment before task execution. Then, we construct the Spatial Scene Graph (SSG) to explicitly capture global spatial structure and semantics in the explored environment. Based on the SSG, we introduce SpatialNav, a zero-shot VLN agent that integrates an agent-centric spatial map, a compass-aligned visual representation, and a remote object localization strategy for efficient navigation. Comprehensive experiments in both discrete and continuous environments demonstrate that…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
