SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning
Ziqi Wang, Chang Che, Qi Wang, Yangyang Li, Zenglin Shi, Meng Wang

TL;DR
This paper introduces SMoLoRA, a framework with two separate modules, one for visual understanding and one for instruction following, designed to mitigate dual catastrophic forgetting in continual visual instruction tuning (CVIT) of multimodal large language models.
Contribution
The paper proposes SMoLoRA, a dual-routing adaptation method that addresses dual forgetting in CVIT. It also introduces a new benchmark for evaluating generalization to unseen tasks and robustness to diverse instructions.
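This summary gives no implementation details, so the following is only a minimal sketch of what dual routing over low-rank adapters could look like, not the authors' implementation. It assumes a frozen linear layer plus two groups of LoRA experts, one gated by a visual feature and one by an instruction feature. All class names, dimensions, the softmax gating, and the additive fusion are hypothetical choices made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAExpert:
    """One low-rank update: delta(x) = B (A x), with rank much smaller than d_in."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = rand_matrix(rank, d_in, rng)
        # Standard LoRA initialises B to zero; random values are used here
        # only so that the toy output is non-trivial.
        self.B = rand_matrix(d_out, rank, rng)

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))


class RoutedLoRAGroup:
    """A group of LoRA experts mixed by a router over a conditioning vector."""

    def __init__(self, d_in, d_out, d_cond, n_experts, rank, rng):
        self.experts = [LoRAExpert(d_in, d_out, rank, rng) for _ in range(n_experts)]
        self.router = rand_matrix(n_experts, d_cond, rng)

    def gates(self, cond):
        # Assumption: dense softmax gating; a top-k router is another option.
        return softmax(matvec(self.router, cond))

    def __call__(self, x, cond):
        g = self.gates(cond)
        outs = [expert(x) for expert in self.experts]
        return [sum(g[k] * outs[k][i] for k in range(len(outs)))
                for i in range(len(outs[0]))]


class DualRoutedLinear:
    """Frozen linear layer plus two independently routed adapter groups."""

    def __init__(self, d_in, d_out, d_vis, d_ins, n_experts=3, rank=2, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)  # stands in for frozen pretrained weights
        self.visual = RoutedLoRAGroup(d_in, d_out, d_vis, n_experts, rank, rng)
        self.instruction = RoutedLoRAGroup(d_in, d_out, d_ins, n_experts, rank, rng)

    def __call__(self, x, vis_feat, ins_feat):
        base = matvec(self.W, x)
        dv = self.visual(x, vis_feat)        # routed by the visual feature only
        di = self.instruction(x, ins_feat)   # routed by the instruction feature only
        # Assumption: the two updates are fused by simple addition.
        return [b + v + i for b, v, i in zip(base, dv, di)]


layer = DualRoutedLinear(d_in=4, d_out=3, d_vis=5, d_ins=6)
x = [0.5, -1.0, 0.25, 2.0]
y = layer(x, vis_feat=[1.0] * 5, ins_feat=[0.1] * 6)
print(len(y))  # one value per output dimension
```

Because each group has its own router, changing the instruction feature leaves the visual gates untouched, and vice versa. That independence is the property the sketch is meant to illustrate.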
Findings
SMoLoRA outperforms existing methods in mitigating dual forgetting.
It improves generalization to unseen tasks.
It enhances robustness in following diverse instructions.
Abstract
Visual instruction tuning (VIT) enables multimodal large language models (MLLMs) to effectively handle a wide range of vision tasks by framing them as language-based instructions. Building on this, continual visual instruction tuning (CVIT) extends the capability of MLLMs to incrementally learn new tasks, accommodating evolving functionalities. While prior work has advanced CVIT through the development of new benchmarks and approaches to mitigate catastrophic forgetting, these efforts largely follow traditional continual learning paradigms, neglecting the unique challenges specific to CVIT. We identify a dual form of catastrophic forgetting in CVIT, where MLLMs not only forget previously learned visual understanding but also experience a decline in instruction following abilities as they acquire new tasks. To address this, we introduce the Separable Mixture of Low-Rank Adaptation…
Taxonomy
Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Image Processing Techniques and Applications
