Continual Instruction Tuning for Large Multimodal Models

Jinghan He; Haiyun Guo; Ming Tang; Jinqiao Wang

arXiv:2311.16206 · cs.LG · November 29, 2023 · 1 citation

Open Access

TL;DR

This paper investigates continual instruction tuning for large multimodal models, revealing persistent catastrophic forgetting and proposing methods to mitigate it, thereby enhancing model adaptability to evolving vision-language tasks.

Contribution

It introduces the first benchmark for continual instruction tuning of LMMs, analyzes forgetting dynamics, and adapts classic continual learning methods to improve performance.
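To make "forgetting dynamics" concrete, here is a minimal sketch of one common forgetting measure from the continual learning literature: for each earlier task, the drop from its best past score to its score after the final task. This is an illustration only and not necessarily the metric the paper reports; the function name `average_forgetting` and the score matrix `acc` are hypothetical.

```python
def average_forgetting(acc):
    """Average forgetting over all tasks except the last one.

    acc[i][j] is the evaluation score on task j measured right after
    training on task i (hypothetical layout; only entries with i >= j
    are used). Returns 0.0 when fewer than two tasks were trained.
    """
    n = len(acc)
    if n < 2:
        return 0.0
    drops = []
    for j in range(n - 1):
        # Best score task j ever reached before the final training stage.
        best_earlier = max(acc[i][j] for i in range(j, n - 1))
        # How far the score has fallen after the final stage.
        drops.append(best_earlier - acc[n - 1][j])
    return sum(drops) / len(drops)
```

A positive value means earlier tasks lost performance on average; a value near zero (or negative) means little forgetting or even backward transfer.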

Findings

01 Catastrophic forgetting persists in continual instruction tuning of LMMs.

02 Multi-task joint instruction tuning helps mitigate forgetting.

03 Data replay and model expansion strategies are effective in this context.
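As a rough illustration of the data replay idea named in the findings, the sketch below mixes a small sample of stored examples from earlier tasks into the training set for a new task. It is a generic sketch under stated assumptions, not the paper's procedure: the function `build_replay_mix`, the `replay_ratio` parameter, and its default value are all hypothetical.

```python
import random


def build_replay_mix(new_examples, replay_buffer, replay_ratio=0.2, seed=0):
    """Return new-task examples mixed with a sample of stored old examples.

    replay_ratio is the number of replayed examples per new example
    (an assumed knob for this sketch). The sample size is capped by the
    size of the buffer.
    """
    rng = random.Random(seed)
    n_replay = min(len(replay_buffer), int(len(new_examples) * replay_ratio))
    # Sample old examples without replacement and append them to the new data.
    mixed = list(new_examples) + rng.sample(list(replay_buffer), n_replay)
    # Shuffle so replayed examples are interleaved with new ones.
    rng.shuffle(mixed)
    return mixed
```

For example, with 10 new examples, a buffer of 5 old examples, and `replay_ratio=0.2`, the mixed set holds 12 examples, 2 of them replayed.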

Abstract

Instruction tuning is now a widely adopted approach to aligning large multimodal models (LMMs) to follow human intent. It unifies the data format of vision-language tasks, enabling multi-task joint training. However, vision-language tasks are constantly being created in practice. Instead of always re-training LMMs when new tasks arrive, continual learning offers flexibility for models to continually and efficiently exploit the evolving data. This work aims to explore the following two questions: 1) Do LMMs still suffer from catastrophic forgetting in continual instruction tuning? 2) Are the existing three classes of continual learning methods still applicable to the continual instruction tuning of LMMs? An extensive study is conducted to address the above questions. First, we establish the first benchmark in this setting and reveal that catastrophic forgetting is still observed when…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition