Vision-Language Navigation with Continual Learning
Zhiyuan Li, Yanfeng Lv, Ziqin Tu, Di Shang, Hong Qiao

TL;DR
This paper introduces a continual learning framework for vision-language navigation (VLN) agents. The framework lets agents adapt efficiently to new environments while retaining prior knowledge, using a novel replay mechanism and a memory management scheme.
Contribution
It pioneers continual learning in VLN, proposing a dual-loop replay method and a multi-scenario memory buffer to improve adaptation and knowledge retention.
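The summary names a multi-scenario memory buffer and a replay method but does not say how either works. The sketch below is only a rough illustration under stated assumptions. It shows a generic scenario-partitioned replay buffer that uses per-scenario reservoir sampling, which is a standard continual-learning technique. It is not the authors' dual-loop replay method. The class `MultiScenarioBuffer`, the function `mixed_batch`, and the parameters `per_scenario` and `replay_ratio` are hypothetical names chosen for this example.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence


class MultiScenarioBuffer:
    """Hypothetical replay buffer that keeps a bounded sample per scenario.

    Each scenario (e.g., an environment/scene ID) stores at most
    `per_scenario` episodes. Reservoir sampling ensures every episode seen
    in that scenario has an equal chance of being retained.
    (Illustrative sketch only; not the paper's actual buffer.)
    """

    def __init__(self, per_scenario: int, seed: int = 0) -> None:
        self.per_scenario = per_scenario
        self.rng = random.Random(seed)
        self.store: Dict[str, List[Any]] = defaultdict(list)
        self.seen: Dict[str, int] = defaultdict(int)

    def add(self, scenario: str, episode: Any) -> None:
        """Offer one episode from `scenario` to the buffer."""
        self.seen[scenario] += 1
        slot = self.store[scenario]
        if len(slot) < self.per_scenario:
            slot.append(episode)
        else:
            # Reservoir sampling: keep the new episode with probability
            # per_scenario / seen, overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen[scenario])
            if j < self.per_scenario:
                slot[j] = episode

    def sample(self, n: int, exclude: Optional[str] = None) -> List[Any]:
        """Draw n episodes, choosing a stored scenario uniformly each time."""
        scenarios = [s for s in self.store if s != exclude and self.store[s]]
        if not scenarios:
            return []
        return [
            self.rng.choice(self.store[self.rng.choice(scenarios)])
            for _ in range(n)
        ]


def mixed_batch(
    new_batch: Sequence[Any],
    buffer: MultiScenarioBuffer,
    scenario: str,
    replay_ratio: float = 0.5,
) -> List[Any]:
    """Combine current-scenario data with replayed episodes from other scenarios."""
    n_replay = int(len(new_batch) * replay_ratio)
    return list(new_batch) + buffer.sample(n_replay, exclude=scenario)
```

Sampling a scenario first and then an episode within it gives each past environment equal weight during replay, even when the environments contributed very different numbers of episodes. This is one way to keep older scenarios from being crowded out. The actual balancing rule in VLNCL may differ.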
Findings
Achieves state-of-the-art continual learning performance in VLN.
Enhances agent adaptability to new environments.
Reduces catastrophic forgetting in VLN agents.
Abstract
Vision-language navigation (VLN) is a critical domain within embodied intelligence, requiring agents to navigate 3D environments based on natural language instructions. Traditional VLN research has focused on improving environmental understanding and decision accuracy. However, these approaches often exhibit a significant performance gap when agents are deployed in novel environments, mainly due to the limited diversity of training data. Expanding datasets to cover a broader range of environments is impractical and costly. We propose the Vision-Language Navigation with Continual Learning (VLNCL) paradigm to address this challenge. In this paradigm, agents incrementally learn new environments while retaining previously acquired knowledge. VLNCL enables agents to maintain an environmental memory and extract relevant knowledge, allowing rapid adaptation to new environments while preserving…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Speech and dialogue systems
