SwitchCIT: Switching for Continual Instruction Tuning

Xinbo Wu; Max Hartman; Vidhata Arjun Jayaraman; Lav R. Varshney

arXiv:2407.11780 · cs.CL · December 19, 2024

Open Access

TL;DR

SwitchCIT introduces a switching mechanism to mitigate catastrophic forgetting in continual instruction tuning of large models, enhancing efficiency, scalability, and task adaptability across language and vision-language tasks.

Contribution

The paper proposes a novel switching approach for continual instruction tuning that reduces forgetting and improves model adaptability with parameter-efficient tuning.

Findings

01 Effective in reducing catastrophic forgetting

02 Improves efficiency and scalability

03 Applicable to language and vision-language tasks

Abstract

Large language models (LLMs) and multimodal models (MMs) have exhibited impressive capabilities in various domains, particularly in general language understanding and visual reasoning. However, these models, trained on massive data, may not be finely optimized for specific tasks triggered by instructions. Continual instruction tuning is crucial for adapting a large model to evolving tasks and domains, ensuring its effectiveness and relevance across a wide range of applications. In continual instruction tuning, where models are trained sequentially on different tasks, catastrophic forgetting can occur, degrading performance on previously learned tasks. This work addresses catastrophic forgetting in continual instruction learning through a switching mechanism that routes computations to parameter-efficient tuned models. We demonstrate the effectiveness of our…
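The routing idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `ToySwitch`, `SwitchedModel`, `learn_task`, and `generate` are hypothetical, the bag-of-words switch stands in for whatever task identifier the paper actually uses, and plain Python functions stand in for parameter-efficient tuned models. It only shows why routing can avoid forgetting: learning a new task adds a new entry and leaves earlier ones untouched.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def featurize(instruction: str) -> Counter:
    """Toy bag-of-words features (assumption: stands in for learned features)."""
    return Counter(instruction.lower().split())


class ToySwitch:
    """Hypothetical task identifier: picks the task whose examples overlap most."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Counter] = {}

    def add_task(self, task: str, examples: List[str]) -> None:
        # Adding a task stores a new profile; existing profiles are not modified.
        profile: Counter = Counter()
        for text in examples:
            profile.update(featurize(text))
        self.profiles[task] = profile

    def predict(self, instruction: str) -> str:
        feats = featurize(instruction)

        def overlap(task: str) -> int:
            profile = self.profiles[task]
            return sum(min(count, profile[word]) for word, count in feats.items())

        return max(self.profiles, key=overlap)


class SwitchedModel:
    """Routes each instruction to one per-task adapter chosen by the switch."""

    def __init__(self, switch: ToySwitch) -> None:
        self.switch = switch
        # Each callable is a placeholder for a parameter-efficient tuned model.
        self.adapters: Dict[str, Callable[[str], str]] = {}

    def learn_task(self, task: str, examples: List[str],
                   adapter: Callable[[str], str]) -> None:
        self.switch.add_task(task, examples)
        self.adapters[task] = adapter

    def generate(self, instruction: str) -> Tuple[str, str]:
        task = self.switch.predict(instruction)
        return task, self.adapters[task](instruction)


model = SwitchedModel(ToySwitch())
model.learn_task(
    "summarize",
    ["summarize the following article", "write a short summary of this text"],
    lambda s: "[summary adapter] " + s,
)
model.learn_task(
    "translate",
    ["translate this sentence into french", "translate the text to german"],
    lambda s: "[translation adapter] " + s,
)

new_task, new_output = model.generate("Please translate this paragraph into Spanish")
old_task, old_output = model.generate("summarize this article")
```

In this sketch the earlier "summarize" adapter is still selected and unchanged after "translate" is learned, because nothing about the first task is overwritten when the second is added.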

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques