Vision-Language Navigation with Continual Learning

Zhiyuan Li; Yanfeng Lv; Ziqin Tu; Di Shang; Hong Qiao

arXiv:2409.02561 · cs.AI · September 24, 2024

TL;DR

This paper introduces a continual learning framework for vision-language navigation agents, enabling them to adapt to new environments efficiently while retaining prior knowledge, through a novel replay mechanism and memory management.

Contribution

The paper pioneers continual learning in VLN, proposing a dual-loop replay method and a multi-scenario memory buffer to improve adaptation to new environments and retention of previously learned ones.
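
To make the idea of replay from a multi-scenario memory concrete, here is a minimal sketch of generic experience replay with one bounded sample list per scenario. It is not the paper's dual-loop algorithm, whose details are not given on this page. The names `MultiScenarioBuffer`, `replay_batch`, and `train_continually`, the reservoir-sampling eviction rule, and the `update` callback are all assumptions introduced for illustration.

```python
import random
from collections import defaultdict


class MultiScenarioBuffer:
    """Hypothetical replay buffer that keeps a bounded sample list per scenario."""

    def __init__(self, capacity_per_scenario, seed=0):
        self.capacity = capacity_per_scenario
        self.samples = defaultdict(list)   # scenario id -> stored samples
        self.seen = defaultdict(int)       # scenario id -> samples offered so far
        self.rng = random.Random(seed)

    def add(self, scenario, sample):
        """Reservoir sampling (an assumed policy): every offered sample
        has the same chance of remaining in the scenario's slot."""
        self.seen[scenario] += 1
        slot = self.samples[scenario]
        if len(slot) < self.capacity:
            slot.append(sample)
        else:
            j = self.rng.randrange(self.seen[scenario])
            if j < self.capacity:
                slot[j] = sample

    def replay_batch(self, per_scenario):
        """Draw up to `per_scenario` stored samples from each scenario seen so far."""
        batch = []
        for scenario, slot in self.samples.items():
            k = min(per_scenario, len(slot))
            batch.extend((scenario, s) for s in self.rng.sample(slot, k))
        return batch


def train_continually(scenarios, buffer, update, replay_size=2):
    """Visit scenarios one after another; each update mixes the new episode
    with samples replayed from all scenarios stored so far."""
    for scenario, episodes in scenarios:
        for episode in episodes:
            replayed = buffer.replay_batch(replay_size)
            update([(scenario, episode)] + replayed)
            buffer.add(scenario, episode)
```

In use, `update` would be whatever function performs one training step on a mixed batch; passing `list.append` on an empty list is enough to inspect which samples each step would see. Balancing the replay draw across scenarios is one simple design choice for keeping older environments represented as new ones arrive.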

Findings

01. Achieves state-of-the-art continual learning performance in VLN.

02. Enhances agent adaptability to new environments.

03. Reduces catastrophic forgetting in VLN agents.

Abstract

Vision-language navigation (VLN) is a critical domain within embodied intelligence, requiring agents to navigate 3D environments based on natural language instructions. Traditional VLN research has focused on improving environmental understanding and decision accuracy. However, these approaches often exhibit a significant performance gap when agents are deployed in novel environments, mainly due to the limited diversity of training data. Expanding datasets to cover a broader range of environments is impractical and costly. We propose the Vision-Language Navigation with Continual Learning (VLNCL) paradigm to address this challenge. In this paradigm, agents incrementally learn new environments while retaining previously acquired knowledge. VLNCL enables agents to maintain an environmental memory and extract relevant knowledge, allowing rapid adaptation to new environments while preserving…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Speech and dialogue systems