Nipping the Drift in the Bud: Retrospective Rectification for Robust Vision-Language Navigation
Gang He, Zhenyang Liu, Kepeng Xu, Li Xu, Tong Qiao, Wenxin Yu, Chang Wu, Weiying Xie

TL;DR
This paper introduces BudVLN, an online learning framework for vision-language navigation that mitigates distribution shift and improves navigation success through retrospective rectification and semantic consistency.
Contribution
BudVLN is a novel online approach that builds supervision from on-policy rollouts, matched to the agent's current state distribution, to address instruction-state misalignment in VLN.
Findings
Achieves state-of-the-art success rate (SR) and success weighted by path length (SPL) on the R2R-CE and RxR-CE benchmarks.
Effectively mitigates distribution shift in vision-language navigation.
Outperforms existing imitation learning methods in complex environments.
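For readers unfamiliar with the two headline metrics, the sketch below computes them using the standard SPL definition: SPL = (1/N) · Σ S_i · l_i / max(p_i, l_i). Here S_i is the binary success of episode i, l_i is the shortest-path length to the goal, and p_i is the length of the path the agent actually took. The function names and the toy episode values are illustrative assumptions. They are not taken from the paper or from any benchmark toolkit.

```python
from typing import Sequence

def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes the agent counted as successful (SR)."""
    return sum(successes) / len(successes)

def spl(successes: Sequence[bool], shortest: Sequence[float], taken: Sequence[float]) -> float:
    """Success weighted by Path Length: (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    if not successes or not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("need one (success, shortest, taken) triple per episode")
    total = 0.0
    for s, l, p in zip(successes, shortest, taken):
        if l <= 0:
            raise ValueError("shortest-path length must be positive")
        # A success along the shortest path scores 1, detours scale it down, failures score 0.
        if s:
            total += l / max(p, l)
    return total / len(successes)

if __name__ == "__main__":
    # Hypothetical episodes: an optimal success, a success with a 2x detour, and a failure.
    ok = [True, True, False]
    print(success_rate(ok), spl(ok, [10.0, 8.0, 12.0], [10.0, 16.0, 5.0]))
```

In this toy run, SR is 2/3 while SPL is 0.5. SPL penalizes the detour in the second episode, which SR ignores, and that is why papers usually report both metrics.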
Abstract
Vision-Language Navigation (VLN) requires embodied agents to interpret natural language instructions and navigate through complex continuous 3D environments. However, the dominant imitation learning paradigm suffers from exposure bias, where minor deviations during inference lead to compounding errors. While DAgger-style approaches attempt to mitigate this by correcting error states, we identify a critical limitation: Instruction-State Misalignment. Forcing an agent to learn recovery actions from off-track states often creates supervision signals that semantically conflict with the original instruction. In response to these challenges, we introduce BudVLN, an online framework that learns from on-policy rollouts by constructing supervision to match the current state distribution. BudVLN performs retrospective rectification via counterfactual re-anchoring and decision-conditioned…
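The abstract contrasts plain imitation learning, which suffers from exposure bias, with DAgger-style correction. The toy sketch below illustrates only that generic baseline mechanism. It does not implement BudVLN. The setup is a hypothetical one: an agent should stay on a reference row, and after every action a random drift may push it one row off. Behavior cloning on clean expert demonstrations never sees an off-track state, so its errors compound. DAgger rolls out the learner, has the expert relabel the states the learner actually visited, aggregates the data, and refits. All names (`expert`, `rollout`, `fit`, `dagger`, `mean_offset`) and all parameters are illustrative assumptions. The loop also drops the expert/learner mixing coefficient β from the original DAgger formulation.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Policy = Callable[[int], str]  # maps offset from the reference row to an action

def expert(offset: int) -> str:
    """Hypothetical expert: step back toward the reference row if off it, else advance."""
    return "up" if offset > 0 else "right"

def step(offset: int, action: str, rng: random.Random, drift_p: float) -> int:
    """Apply the action, then a random one-row downward drift (a 'minor deviation')."""
    if action == "up" and offset > 0:
        offset -= 1
    # Exactly one RNG draw per step, so policies evaluated with one seed see the same drifts.
    if rng.random() < drift_p:
        offset += 1
    return offset

def rollout(policy: Policy, rng: random.Random, steps: int, drift_p: float) -> List[int]:
    """Run `policy` from the reference row and return every state it acted in."""
    offset, visited = 0, []
    for _ in range(steps):
        visited.append(offset)
        offset = step(offset, policy(offset), rng, drift_p)
    return visited

def fit(data: List[Tuple[int, str]]) -> Dict[int, str]:
    """Tabular 'learner': majority action label for each state seen in the data."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for state, action in data:
        votes[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in votes.items()}

def make_policy(table: Dict[int, str]) -> Policy:
    # Unseen states fall back to "right": the learner just keeps going straight.
    return lambda offset: table.get(offset, "right")

def dagger(iterations: int, steps: int, drift_p: float, seed: int) -> Dict[int, str]:
    rng = random.Random(seed)
    # Iteration 0 is plain behavior cloning on clean, drift-free expert demonstrations.
    data = [(s, expert(s)) for s in rollout(expert, rng, steps, 0.0)]
    table = fit(data)
    for _ in range(iterations):
        visited = rollout(make_policy(table), rng, steps, drift_p)  # learner's own states
        data += [(s, expert(s)) for s in visited]  # expert relabels them, data is aggregated
        table = fit(data)
    return table

def mean_offset(table: Dict[int, str], steps: int = 50, drift_p: float = 0.2, seed: int = 123) -> float:
    """Average distance from the reference row over one evaluation episode."""
    visited = rollout(make_policy(table), random.Random(seed), steps, drift_p)
    return sum(visited) / len(visited)

if __name__ == "__main__":
    bc = dagger(iterations=0, steps=50, drift_p=0.2, seed=0)  # {0: "right"}: no off-track data
    dg = dagger(iterations=5, steps=50, drift_p=0.2, seed=0)
    print("BC:", mean_offset(bc), "DAgger:", mean_offset(dg))
```

Under identical drift, the cloned policy's offset can only grow, while the DAgger policy climbs back after each drift it has a label for. In this toy, the relabeled recovery action ("up") never appears in the expert's own demonstration. That loosely echoes the abstract's point that recovery labels from off-track states may not match what the original instruction describes.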
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
