Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments

Kehan Chen; Dong An; Yan Huang; Rongtao Xu; Yifei Su; Yonggen Ling; Ian Reid; Liang Wang

arXiv:2412.10137 · cs.RO · April 16, 2025

TL;DR

This paper introduces CA-Nav, a novel constraint-aware approach for zero-shot vision-language navigation in continuous environments, enabling robots to navigate effectively without prior demonstrations by decomposing instructions and dynamically planning paths.

Contribution

The paper proposes a constraint-aware framework built from two modules, one for sub-instruction management (CSM) and one for value mapping (CVM), which improves both the performance and the stability of zero-shot VLN-CE.

Findings

1. Achieves 12-13% higher success rates on VLN-CE benchmarks.
2. Outperforms previous methods in real-world robot tests.
3. Demonstrates effective navigation without prior training data.

Abstract

We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging due to the absence of expert demonstrations for training and the minimal structural priors about the environment available to guide navigation. To confront these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria for decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and…
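The abstract describes CSM as switching to the next sub-instruction once that sub-instruction's completion constraints are met, and CVM as building a value map on the fly. Below is a minimal Python sketch of that control loop. It assumes that each constraint can be written as a boolean predicate over an observation, and that per-cell relevance scores come from some external vision-language model, which is not modeled here. The names `SubInstructionManager`, `update_value_map`, and `best_waypoint`, as well as the blending weight, are hypothetical illustrations and are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: an observation is a dict of detected facts, and a
# constraint is a predicate that checks one completion criterion.
Observation = Dict[str, bool]
Constraint = Callable[[Observation], bool]
Cell = Tuple[int, int]


@dataclass
class SubInstruction:
    text: str
    constraints: List[Constraint]


class SubInstructionManager:
    """Toy constraint-aware sub-instruction tracker (not the authors' CSM).

    The active sub-instruction is switched only when all of its
    constraints are satisfied by the current observation.
    """

    def __init__(self, sub_instructions: List[SubInstruction]) -> None:
        self.sub_instructions = sub_instructions
        self.index = 0

    def current(self) -> Optional[SubInstruction]:
        if self.index < len(self.sub_instructions):
            return self.sub_instructions[self.index]
        return None  # every sub-instruction has been completed

    def step(self, obs: Observation) -> Optional[SubInstruction]:
        sub = self.current()
        if sub is not None and all(check(obs) for check in sub.constraints):
            self.index += 1  # completion criteria met: move to the next one
        return self.current()


def update_value_map(value_map: Dict[Cell, float],
                     scores: Dict[Cell, float],
                     weight: float = 0.5) -> None:
    """Blend new per-cell relevance scores into a running value map.

    Assumption: scores are supplied by some relevance model conditioned on
    the active sub-instruction; here they are passed in directly.
    """
    for cell, score in scores.items():
        old = value_map.get(cell)
        value_map[cell] = score if old is None else (1 - weight) * old + weight * score


def best_waypoint(value_map: Dict[Cell, float]) -> Optional[Cell]:
    """Return the highest-valued cell as the next navigation target."""
    if not value_map:
        return None
    return max(value_map, key=value_map.get)


if __name__ == "__main__":
    manager = SubInstructionManager([
        SubInstruction("walk past the sofa", [lambda o: o.get("sofa_behind", False)]),
        SubInstruction("stop at the kitchen door", [lambda o: o.get("at_kitchen_door", False)]),
    ])
    values: Dict[Cell, float] = {}
    update_value_map(values, {(0, 1): 0.3, (2, 4): 0.9})
    print(manager.step({"sofa_behind": True}).text)  # stop at the kitchen door
    print(best_waypoint(values))  # (2, 4)
```

In this sketch the manager advances at most one sub-instruction per step. That choice mirrors the sequential completion process the abstract describes. The running average in `update_value_map` is only one simple way to fuse repeated observations of the same cell.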

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques