DecoVLN: Decoupling Observation, Reasoning, and Correction for Vision-and-Language Navigation

Zihao Xin; Wentong Li; Yixuan Jiang; Bin Wang; Runmin Cong; Jie Qin; Shengjun Huang

arXiv:2603.13133 · cs.RO · March 27, 2026

Open Access

TL;DR

DecoVLN introduces a novel framework for vision-and-language navigation that enhances long-term memory construction and error correction, leading to more robust and accurate navigation in complex environments.

Contribution

The paper proposes a new framework that decouples observation, reasoning, and correction, with adaptive memory refinement and state-action correction strategies for improved VLN performance.

Findings

01 Effective long-term memory optimization improves navigation accuracy.

02 State-action correction reduces compounding errors.

03 Real-world deployment demonstrates practical robustness.

Abstract

Vision-and-Language Navigation (VLN) requires agents to follow long-horizon instructions and navigate complex 3D environments. However, existing approaches face two major challenges: constructing an effective long-term memory bank and overcoming compounding errors. To address these issues, we propose DecoVLN, an effective framework designed for robust streaming perception and closed-loop control in long-horizon navigation. First, we formulate long-term memory construction as an optimization problem and introduce an adaptive refinement mechanism that selects frames from a historical candidate pool by iteratively optimizing a unified scoring function. This function jointly balances three key criteria: semantic relevance to the instruction, visual diversity from the selected memory, and temporal coverage of the historical trajectory. Second, to alleviate compounding errors, we…
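
The memory selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy loop, the cosine-similarity terms, the weights `w_rel`, `w_div`, `w_cov`, and the function name `select_memory` are all assumptions made for illustration, since the abstract only names the three criteria and does not give the scoring function itself.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def select_memory(frame_feats, instr_feat, budget, w_rel=1.0, w_div=1.0, w_cov=1.0):
    """Greedily pick up to `budget` frame indices from a candidate pool.

    Hypothetical scoring (an assumption, not the paper's formula):
      relevance = cosine similarity between frame and instruction features
      diversity = 1 - max cosine similarity to frames already selected
      coverage  = normalized time gap to the nearest selected frame
    Frames are assumed to be stored in temporal order.
    """
    n = len(frame_feats)
    span = max(n - 1, 1)  # normalizer for temporal distance
    selected = []
    while len(selected) < min(budget, n):
        best_idx, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            rel = cosine(frame_feats[i], instr_feat)
            if selected:
                div = 1.0 - max(cosine(frame_feats[i], frame_feats[j]) for j in selected)
                cov = min(abs(i - j) for j in selected) / span
            else:
                div, cov = 1.0, 1.0  # nothing selected yet: no redundancy to penalize
            score = w_rel * rel + w_div * div + w_cov * cov
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return sorted(selected)


# Toy pool: frames 0-1 match the instruction, frames 2-3 show something else.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
instr = [1.0, 0.0]
picked = select_memory(feats, instr, budget=2)
print(picked)
```

With these toy inputs the sketch keeps frames 0 and 3: frame 0 is the most relevant, and frame 3 then wins over the near-duplicate frame 1 because it is visually different and farthest away in time. A greedy pass is only one way to optimize such a score; the iterative refinement mentioned in the abstract may revisit earlier choices.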

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization