DV-VLN: Dual Verification for Reliable LLM-Based Vision-and-Language Navigation
Zijun Li, Shijie Li, Zhenxi Zhang, Bin Li, and Shoujun Zhou

TL;DR
DV-VLN introduces a dual verification framework for vision-and-language navigation that enhances decision reliability by verifying candidate actions through two complementary channels, improving performance and interpretability in complex environments.
Contribution
The paper proposes a generate-then-verify VLN framework using dual verification channels, advancing the reliability and interpretability of LLM-based navigation agents.
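The generate-then-verify idea can be illustrated with a minimal sketch: propose several candidate actions, score each one with two independent checks, and act on the candidate with the best combined score. Everything below is hypothetical and not taken from the paper. The function select_action, the two toy verifiers, and the rule of summing the two scores are assumptions made for illustration, since this summary does not state what the paper's verification channels are or how their outputs are combined.

```python
from typing import Callable, Sequence

# Hypothetical verifier signature: (instruction, observation, action) -> score in [0, 1].
Verifier = Callable[[str, str, str], float]


def keyword_overlap(instruction: str, observation: str, action: str) -> float:
    """Toy check 1: fraction of the action's words that also appear in the instruction."""
    instruction_words = set(instruction.lower().split())
    action_words = set(action.lower().split())
    if not action_words:
        return 0.0
    return len(instruction_words & action_words) / len(action_words)


def visible_in_observation(instruction: str, observation: str, action: str) -> float:
    """Toy check 2: 1.0 if the action target is mentioned in the textual observation."""
    return 1.0 if action.lower() in observation.lower() else 0.0


def select_action(
    instruction: str,
    observation: str,
    candidates: Sequence[str],
    verifier_a: Verifier,
    verifier_b: Verifier,
) -> str:
    """Return the candidate with the highest summed verification score.

    Summing the two scores is an assumption of this sketch, not the paper's rule.
    """
    if not candidates:
        raise ValueError("at least one candidate action is required")

    def combined(action: str) -> float:
        return (verifier_a(instruction, observation, action)
                + verifier_b(instruction, observation, action))

    return max(candidates, key=combined)


# Toy usage with made-up data.
instruction = "walk past the sofa and enter the kitchen"
observation = "left: hallway; front: kitchen door; right: sofa"
candidates = ["hallway", "kitchen door", "bedroom"]

best = select_action(instruction, observation, candidates,
                     keyword_overlap, visible_in_observation)
print(best)
```

In this toy setup a candidate must do well on both checks to win, so an action that matches the instruction but is not visible, or is visible but unrelated to the instruction, is ranked lower. In a real agent the candidates would come from the LLM and the verifiers would be far richer than string matching.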
Findings
DV-VLN consistently outperforms direct-prediction baselines.
It achieves results competitive with language-only VLN agents.
It shows promising results when compared with cross-modal systems.
Abstract
Vision-and-Language Navigation (VLN) requires an embodied agent to navigate in a complex 3D environment according to natural language instructions. Recent progress in large language models (LLMs) has enabled language-driven navigation with improved interpretability. However, most LLM-based agents still rely on single-shot action decisions, where the model must choose one option from noisy, textualized multi-perspective observations. Due to local mismatches and imperfect intermediate reasoning, such decisions can easily deviate from the correct path, leading to error accumulation and reduced reliability in unseen environments. In this paper, we propose DV-VLN, a new VLN framework that follows a generate-then-verify paradigm. DV-VLN first performs parameter-efficient in-domain adaptation of an open-source LLaMA-2 backbone to produce a structured navigational chain-of-thought, and then…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
