Dual-Anchoring: Addressing State Drift in Vision-Language Navigation

Kangyi Wu; Pengna Li; Kailin Lyu; Xi Lin; Lin Zhao; Qingrong He; Jinjun Wang; Jianyi Liu

arXiv:2604.17473 · cs.CV · May 22, 2026

TL;DR

This paper introduces Dual-Anchoring, a framework that improves vision-language navigation by explicitly addressing progress and memory drift, leading to significant performance gains in complex environments.

Contribution

The paper proposes a novel Dual-Anchoring Framework with two components, instruction-progress anchoring and memory-landmark anchoring, along with large datasets for training and evaluation.

Findings

01. Achieved a 15.2% improvement in Success Rate.

02. Achieved a 24.7% gain on long-horizon trajectories.

03. Demonstrated effectiveness in both simulation and real-world environments.

Abstract

Vision-Language Navigation (VLN) requires an agent to navigate through 3D environments by following natural language instructions. Although recent Video Large Language Models (Video-LLMs) have substantially advanced VLN, they remain highly susceptible to State Drift in long-horizon scenarios. In these cases, the agent's internal state drifts away from the true task execution state, leading to aimless wandering and a failure to execute essential maneuvers specified in the instruction. We attribute this failure to two distinct cognitive deficits: Progress Drift, where the agent fails to distinguish completed sub-goals from remaining ones, and Memory Drift, where the agent's history representations degrade, causing it to lose track of visited landmarks. In this paper, we propose a Dual-Anchoring Framework that explicitly anchors the instruction progress and history representations. First, to address progress drift, we…
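The abstract is cut off before it describes how the anchoring is done. Purely to make the two failure modes concrete, the hypothetical Python sketch below keeps the task state explicit: a progress pointer that separates completed sub-goals from remaining ones (the distinction lost under Progress Drift) and an ordered record of visited landmarks (what is lost under Memory Drift). The class name `AnchoredState`, its methods, and the assumption that the instruction is already split into sub-goals are illustrative inventions, not the paper's Dual-Anchoring method.

```python
from dataclasses import dataclass, field


@dataclass
class AnchoredState:
    """Hypothetical explicit task state; NOT the paper's implementation.

    sub_goals: the instruction split into ordered sub-goals (assumed given).
    progress:  index of the first sub-goal not yet completed.
    landmarks: landmarks recorded as visited, in order of first visit.
    """
    sub_goals: list[str]
    progress: int = 0
    landmarks: list[str] = field(default_factory=list)

    def completed(self) -> list[str]:
        # Sub-goals before the progress pointer are treated as done.
        return self.sub_goals[: self.progress]

    def remaining(self) -> list[str]:
        # Sub-goals at or after the pointer still have to be executed.
        return self.sub_goals[self.progress :]

    def mark_done(self) -> None:
        # Advance the pointer by one, never past the end of the instruction.
        self.progress = min(self.progress + 1, len(self.sub_goals))

    def visit(self, landmark: str) -> None:
        # Record a landmark once, preserving the order of first visits.
        if landmark not in self.landmarks:
            self.landmarks.append(landmark)

    def has_visited(self, landmark: str) -> bool:
        return landmark in self.landmarks


if __name__ == "__main__":
    state = AnchoredState(
        ["exit the bedroom", "turn left at the sofa", "stop at the kitchen table"]
    )
    state.visit("bedroom door")
    state.mark_done()
    print("completed:", state.completed())
    print("remaining:", state.remaining())
    print("visited bedroom door:", state.has_visited("bedroom door"))
```

In this toy form the state never degrades, because it is stored symbolically; the abstract's point is that a Video-LLM's implicit state does degrade over long trajectories, which is what the framework's anchoring is meant to counter.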

Code & Models

Repositories

github