Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models

Meng Cao; Yuyang Liu; Yingfei Liu; Tiancai Wang; Jiahua Dong; Henghui Ding; Xiangyu Zhang; Ian Reid; Xiaodan Liang

arXiv:2411.02564 · cs.CV · November 12, 2024

TL;DR

This paper introduces Continual LLaVA, a rehearsal-free method for continual instruction tuning of large vision-language models, effectively handling evolving tasks while minimizing knowledge forgetting.

Contribution

The paper proposes COAST, a new benchmark for continual instruction tuning of LVLMs, and Continual LLaVA, a parameter-efficient, rehearsal-free method that uses dual increment embeddings instead of experience replay.

Findings

01. Outperforms previous methods in reducing forgetting

02. Effective in domain-incremental, capability-incremental, and dataset-incremental settings

03. Maintains high performance across evolving tasks

Abstract

Instruction tuning constitutes a prevalent technique for tailoring Large Vision Language Models (LVLMs) to meet individual task requirements. To date, most of the existing approaches are confined to single-task adaptation, whereas the requirements in real-world scenarios are inherently varied and continually evolving. Thus an ideal LVLM should sustain continual instruction tuning in the face of stream-task distributions (i.e., different domains, emerging capabilities, and new datasets) while minimizing the forgetting of previously acquired knowledge. To achieve this, we propose a new benchmark for COntinuAl inStruction Tuning on LVLMs (COAST), which encompasses the aforementioned domain-incremental, capability-incremental, and dataset-incremental configurations. In terms of methodology, we propose Continual LLaVA, a rehearsal-free method tailored for continual instruction tuning in…

Code & Models

Repositories

mengcaopku/continual-llava (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training