Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments
Kehan Chen, Dong An, Yan Huang, Rongtao Xu, Yifei Su, Yonggen Ling, Ian Reid, Liang Wang

TL;DR
This paper introduces CA-Nav, a novel constraint-aware approach for zero-shot vision-language navigation in continuous environments, enabling robots to navigate effectively without prior demonstrations by decomposing instructions and dynamically planning paths.
Contribution
The paper proposes a new constraint-aware framework with modules for sub-instruction management and value mapping, advancing zero-shot VLN-CE performance and stability.
Findings
Achieves 12-13% higher success rates on benchmarks.
Outperforms previous methods in real-world robot tests.
Demonstrates effective navigation without prior training data.
Abstract
We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging due to the absence of expert demonstrations for training and minimal environment structural prior to guide navigation. To confront these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria for decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and…
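The sub-instruction switching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the `visible_objects` observation key, and the `sees` helper are all hypothetical, and the only constraint modelled is "a named object is visible". The sketch assumes a sub-instruction counts as complete once every one of its constraints holds for the current observation, at which point the manager moves on to the next sub-instruction.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical types: an observation is a plain dict, and a constraint
# is any predicate over that dict.
Observation = Dict[str, Any]
Constraint = Callable[[Observation], bool]


@dataclass
class SubInstruction:
    """One decomposed step of the instruction plus its completion criteria."""
    text: str
    constraints: List[Constraint] = field(default_factory=list)

    def is_complete(self, obs: Observation) -> bool:
        # Assumption: a step is complete when all of its constraints hold.
        return all(check(obs) for check in self.constraints)


class SubInstructionManager:
    """Tracks progress by switching to the next step once the current one completes."""

    def __init__(self, sub_instructions: List[SubInstruction]) -> None:
        self.sub_instructions = list(sub_instructions)
        self.index = 0

    @property
    def done(self) -> bool:
        return self.index >= len(self.sub_instructions)

    @property
    def current(self) -> Optional[SubInstruction]:
        return None if self.done else self.sub_instructions[self.index]

    def update(self, obs: Observation) -> Optional[SubInstruction]:
        # Advance by at most one step per observation.
        if not self.done and self.current.is_complete(obs):
            self.index += 1
        return self.current


def sees(name: str) -> Constraint:
    """Hypothetical constraint: the named object appears in the observation."""
    return lambda obs: name in obs.get("visible_objects", ())


plan = [
    SubInstruction("walk to the sofa", [sees("sofa")]),
    SubInstruction("stop at the door", [sees("door")]),
]
manager = SubInstructionManager(plan)
for obs in ({"visible_objects": ["chair"]},
            {"visible_objects": ["sofa"]},
            {"visible_objects": ["door"]}):
    manager.update(obs)
```

In this sketch the first observation leaves the manager on the first step, the second advances it, and the third finishes the plan. A real system would derive constraints from the instruction text and check them against perception outputs rather than a hand-written dict.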
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques
