DV-VLN: Dual Verification for Reliable LLM-Based Vision-and-Language Navigation

Zijun Li, Shijie Li, Zhenxi Zhang, Bin Li, and Shoujun Zhou

arXiv:2601.18492 · cs.RO · January 27, 2026

Open Access

TL;DR

DV-VLN introduces a dual verification framework for vision-and-language navigation that enhances decision reliability by verifying candidate actions through two complementary channels, improving performance and interpretability in complex environments.

Contribution

The paper proposes a generate-then-verify VLN framework using dual verification channels, advancing the reliability and interpretability of LLM-based navigation agents.

Findings

01. Consistently outperforms direct prediction baselines.
02. Achieves competitive results with language-only VLN agents.
03. Shows promising performance compared to cross-modal systems.

Abstract

Vision-and-Language Navigation (VLN) requires an embodied agent to navigate in a complex 3D environment according to natural language instructions. Recent progress in large language models (LLMs) has enabled language-driven navigation with improved interpretability. However, most LLM-based agents still rely on single-shot action decisions, where the model must choose one option from noisy, textualized multi-perspective observations. Due to local mismatches and imperfect intermediate reasoning, such decisions can easily deviate from the correct path, leading to error accumulation and reduced reliability in unseen environments. In this paper, we propose DV-VLN, a new VLN framework that follows a generate-then-verify paradigm. DV-VLN first performs parameter-efficient in-domain adaptation of an open-source LLaMA-2 backbone to produce a structured navigational chain-of-thought, and then…
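The generate-then-verify idea named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the two verifier functions below (`overlap_verifier`, `direction_verifier`), the `select_action` helper, and the additive scoring rule are all hypothetical stand-ins chosen for illustration. In an LLM-based agent, each channel would presumably be a model query rather than a word-matching heuristic.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical verifier signature: (instruction, candidate) -> score in [0, 1].
Verifier = Callable[[str, str], float]


def overlap_verifier(instruction: str, candidate: str) -> float:
    """Toy channel A: fraction of candidate words that occur in the instruction."""
    instruction_words = set(instruction.lower().split())
    candidate_words = candidate.lower().split()
    if not candidate_words:
        return 0.0
    hits = sum(word in instruction_words for word in candidate_words)
    return hits / len(candidate_words)


def direction_verifier(instruction: str, candidate: str) -> float:
    """Toy channel B: 1.0 if the candidate's leading word occurs in the instruction."""
    candidate_words = candidate.lower().split()
    if not candidate_words:
        return 0.0
    return 1.0 if candidate_words[0] in instruction.lower().split() else 0.0


def select_action(
    instruction: str,
    candidates: List[str],
    verifiers: List[Verifier],
) -> Tuple[str, Dict[str, float]]:
    """Score every generated candidate with all verifiers and keep the best one.

    Summing the channel scores is an assumption made for this sketch only.
    """
    if not candidates:
        raise ValueError("no candidate actions to verify")
    scores = {
        candidate: sum(verify(instruction, candidate) for verify in verifiers)
        for candidate in candidates
    }
    best = max(candidates, key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    instruction = "turn left and walk to the kitchen"
    # Candidates would come from the generation step; these are made up.
    candidates = [
        "left hallway toward kitchen",
        "right door to bedroom",
        "forward stairs",
    ]
    best, scores = select_action(
        instruction, candidates, [overlap_verifier, direction_verifier]
    )
    print(best, scores)
```

Keeping the verifiers behind a shared callable signature means either channel can be swapped out without touching the selection logic. Compared with a single-shot choice, a candidate here must score well on both checks to be selected.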

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques