All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
Xudong Wang, Gan Li, Zhiyu Liu, Yao Wang, Lianqing Liu, Zhi Han

TL;DR
This paper introduces Tucker Adaptation (TuKA), a tensor-based method for lifelong vision-and-language navigation that effectively learns across multiple scenes without catastrophic forgetting, enabling all-day multi-scenes navigation.
Contribution
The paper proposes TuKA, a high-order tensor adaptation method, and a lifelong learning strategy for VLN agents, advancing multi-scene navigation capabilities.
Findings
AlldayWalker, the lifelong VLN agent built on TuKA, outperforms state-of-the-art baselines.
TuKA effectively captures multi-hierarchical knowledge.
The approach reduces catastrophic forgetting in lifelong VLN.
Abstract
Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong VLN (AML-VLN) problem. Existing parameter-efficient adapters (e.g., LoRA and its variants) are limited by their two-dimensional matrix form, which fails to capture the multi-hierarchical navigation knowledge spanning multiple scenes and environments. To address this, we propose Tucker Adaptation (TuKA), which represents the multi-hierarchical navigation knowledge as a high-order tensor and leverages Tucker decomposition to decouple the knowledge into shared subspaces and scenario-specific experts. We further introduce a decoupled knowledge incremental learning strategy to…
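To make the idea of decoupling knowledge into "shared subspaces and scenario-specific experts" concrete, here is a minimal NumPy sketch of a Tucker-factorized weight update. It is an illustration under stated assumptions, not the paper's implementation: the fourth-order layout (output dim, input dim, scene, environment), all sizes and ranks, and the names `core`, `U_out`, `U_in`, `U_scene`, `U_env`, `tucker_delta`, and `full_tensor` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and Tucker ranks (assumptions, not taken from the paper).
d_out, d_in = 8, 6          # shape of the adapted weight matrix
n_scenes, n_envs = 3, 4     # number of scene and environment experts
r1, r2, r3, r4 = 2, 2, 2, 2

# Core tensor and the two factor matrices reused by every scenario.
core = rng.standard_normal((r1, r2, r3, r4))
U_out = rng.standard_normal((d_out, r1))
U_in = rng.standard_normal((d_in, r2))

# One row per expert: selecting a row picks a scene or an environment.
U_scene = rng.standard_normal((n_scenes, r3))
U_env = rng.standard_normal((n_envs, r4))


def tucker_delta(scene: int, env: int) -> np.ndarray:
    """Weight update for one (scene, environment) pair.

    Contracts the core with the shared factors and with the selected
    scene and environment rows, giving a (d_out, d_in) matrix.
    """
    return np.einsum(
        "abcd,ia,jb,c,d->ij",
        core, U_out, U_in, U_scene[scene], U_env[env],
    )


def full_tensor() -> np.ndarray:
    """Full (d_out, d_in, n_scenes, n_envs) tensor, for checking only."""
    return np.einsum(
        "abcd,ia,jb,sc,ed->ijse",
        core, U_out, U_in, U_scene, U_env,
    )


delta = tucker_delta(scene=1, env=2)
print(delta.shape)
```

In this sketch, adding a new scene only appends a row to `U_scene`, while `core`, `U_out`, and `U_in` stay shared. By contrast, a LoRA-style update is a single two-dimensional product with no scene or environment axis, which is the limitation the abstract points to.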
Peer Reviews
Decision · ICLR 2026 Poster
1. This article explores continual learning in vision-language navigation, which is novel and important for future research.
2. The authors represent multi-hierarchical knowledge as a high-order tensor and propose an effective method to decouple task-shared and task-specific representations. The method can also learn new task-specific knowledge incrementally without catastrophic forgetting.
3. The authors conduct extensive experiments, which reveal the effectiveness of thei…
1. The two critical challenges presented in the introduction (Lines 75–78) appear very similar, which may cause confusion for readers. The authors are encouraged to clarify the distinction between them.
2. The authors collected only 24 scenes for the multi-scene lifelong VLN benchmark setting. Adding more scenes would make the results more convincing; otherwise, the authors should provide justification for why these 24 scenes are sufficient (such as the number of tested episodes).
3. This work i…
* The paper introduces a parameter-efficient adaptation method (TuKA) that leverages Tucker decomposition to decouple and represent multi-hierarchical knowledge in a high-order tensor, enabling more expressive representation learning.
* The integration of TuKA into a lifelong VLN agent (AlldayWalker) demonstrates the potential of high-order tensor adaptation for continual learning in navigation tasks.
* The authors make meaningful modifications to existing VLN simulators, allowing more systema…
* Line 76: The challenges (i) and (ii) appear to be repeated. Please clarify or consolidate to avoid redundancy.
* The proposed method appears to directly apply Tucker decomposition to the VLN task. The paper should more clearly articulate what unique ideas or design choices provide additional contributions beyond standard Tucker-based factorization.
* The paper omits relevant discussions of recent test-time adaptation approaches for VLN, such as FSTTA (ICML 2024) and FeedTTA (ICML 2025). In p…
1. The paper is well-written and clearly organized, with a solid motivation and clear presentation of the continual learning setting in VLN.
2. It addresses the underexplored problem of continual learning in vision-and-language navigation, a task that naturally involves sequential adaptation yet has received limited attention in prior research.
3. It proposes TuKA, a Tucker-based adapter that decouples scene- and environment-specific experts. This structure enables task-wise adaptation while mitigati…
1. The current experiments appear to evaluate tasks within the same building. Since a central goal of continual learning is to enable agents to transfer previously acquired knowledge to unseen environments, it would be valuable to include evaluations in completely new buildings or environments to more clearly demonstrate TuKA’s generalization ability beyond the trained domain.
2. Continual learning results can be sensitive to the order in which tasks are learned. It would be helpful to clarify w…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
