SwitchCIT: Switching for Continual Instruction Tuning
Xinbo Wu, Max Hartman, Vidhata Arjun Jayaraman, Lav R. Varshney

TL;DR
SwitchCIT introduces a switching mechanism to mitigate catastrophic forgetting in continual instruction tuning of large models, enhancing efficiency, scalability, and task adaptability across language and vision-language tasks.
Contribution
The paper proposes a novel switching approach for continual instruction tuning that reduces forgetting and improves model adaptability with parameter-efficient tuning.
Findings
Effective in reducing catastrophic forgetting
Improves efficiency and scalability
Applicable to language and vision-language tasks
Abstract
Large language models (LLMs) and multimodal models (MMs) have exhibited impressive capabilities in various domains, particularly in general language understanding and visual reasoning. However, these models, trained on massive data, may not be finely optimized for specific tasks triggered by instructions. Continual instruction tuning is crucial to adapt a large model to evolving tasks and domains, ensuring its effectiveness and relevance across a wide range of applications. In the context of continual instruction tuning, where models are sequentially trained on different tasks, catastrophic forgetting can occur, leading to performance degradation on previously learned tasks. This work addresses catastrophic forgetting in continual instruction learning through a switching mechanism for routing computations to parameter-efficient tuned models. We demonstrate the effectiveness of our…
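The abstract names a switching mechanism that routes computations to parameter-efficient tuned models but gives no details here. The following is a minimal sketch of that routing idea, assuming a switch that classifies each incoming instruction into a previously seen task and dispatches it to that task's own tuned module. Everything in it is a hypothetical stand-in, not the authors' implementation: the bag-of-words nearest-centroid classifier, the names `Switch`, `featurize`, `cosine`, `add_task` and `route`, and the string-returning "adapters". In a real system each adapter would presumably be a set of parameter-efficient weights (for example, LoRA-style low-rank updates) on a shared base model.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def featurize(text: str) -> Counter:
    # Hypothetical feature extractor: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Switch:
    # Hypothetical switch: one centroid per task plus one adapter per task.
    centroids: Dict[str, Counter] = field(default_factory=dict)
    adapters: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def add_task(self, task: str, examples: List[str],
                 adapter: Callable[[str], str]) -> None:
        # Each new task adds its own centroid and adapter; adapters for
        # earlier tasks are never modified.
        centroid: Counter = Counter()
        for ex in examples:
            centroid.update(featurize(ex))
        self.centroids[task] = centroid
        self.adapters[task] = adapter

    def route(self, instruction: str) -> str:
        # Pick the task whose centroid is most similar to the instruction.
        if not self.centroids:
            raise ValueError("no tasks registered")
        feats = featurize(instruction)
        return max(self.centroids,
                   key=lambda t: cosine(feats, self.centroids[t]))

    def __call__(self, instruction: str) -> str:
        # Dispatch the instruction to the selected task's adapter.
        return self.adapters[self.route(instruction)](instruction)


# Tasks arrive sequentially, as in continual instruction tuning.
switch = Switch()
switch.add_task("summarize",
                ["summarize this article", "give a short summary of the text"],
                lambda x: "[summary adapter] " + x)
switch.add_task("translate",
                ["translate this sentence to french",
                 "translate the text into german"],
                lambda x: "[translation adapter] " + x)

print(switch.route("please translate this paragraph to french"))
print(switch("summarize this report briefly"))
```

In this sketch, learning a new task only adds entries and never overwrites an earlier adapter, so the risk of forgetting shifts from the tuned weights to the accuracy of the switch's task classification.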
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
