SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen, Runmin Cong, Yuzhi Zhao, Hongzheng Yang, Guangneng Hu, Horace Ho Shing Ip, Sam Kwong

TL;DR
This paper introduces SEFE, a method combining Answer Style Diversification and RegLoRA to mitigate superficial and essential forgetting in multimodal continual instruction tuning, achieving state-of-the-art results.
Contribution
It proposes a novel framework that distinguishes and addresses superficial and essential forgetting in multimodal models, with new techniques for style diversification and parameter regularization.
Findings
SEFE outperforms existing methods on benchmark tasks.
Answer Style Diversification effectively prevents superficial forgetting.
RegLoRA stabilizes key parameters, reducing essential forgetting.
Abstract
Multimodal Continual Instruction Tuning (MCIT) aims to enable Multimodal Large Language Models (MLLMs) to incrementally learn new tasks without catastrophic forgetting. In this paper, we explore forgetting in this context, categorizing it into superficial forgetting and essential forgetting. Superficial forgetting refers to cases where the model's knowledge may not be genuinely lost, but its responses to previous tasks deviate from expected formats due to the influence of subsequent tasks' answer styles, making the results unusable. By contrast, essential forgetting refers to situations where the model provides correctly formatted but factually inaccurate answers, indicating a true loss of knowledge. Assessing essential forgetting necessitates addressing superficial forgetting first, as severe superficial forgetting can obscure the model's knowledge state. Hence, we first introduce the…
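The distinction the abstract draws between the two kinds of forgetting can be illustrated with a small sketch. It is not the paper's evaluation protocol: the function name `classify_response` and the `allowed_answers` parameter are hypothetical, and "expected format" is reduced here to membership in a closed set of answer strings, which only suits tasks such as yes/no or multiple-choice questions.

```python
from typing import Iterable


def classify_response(prediction: str, reference: str,
                      allowed_answers: Iterable[str]) -> str:
    """Label one response to a previously learned task.

    Assumption (not from the paper): the expected format is modeled as a
    closed set of acceptable answer strings, compared case-insensitively.
    """
    pred = prediction.strip().lower()
    ref = reference.strip().lower()
    allowed = {a.strip().lower() for a in allowed_answers}

    if pred == ref:
        return "retained"
    if pred not in allowed:
        # Wrong format: the answer is unusable, and we cannot tell
        # whether the underlying knowledge is still there.
        return "superficial"
    # Correct format but wrong content: a genuine loss of knowledge.
    return "essential"


yes_no = ["yes", "no"]
print(classify_response("Yes", "yes", yes_no))
print(classify_response("The picture shows a dog on grass.", "yes", yes_no))
print(classify_response("no", "yes", yes_no))
```

The second call is the case the abstract warns about: a free-form answer in the style of a later task hides whether the model still knows the right answer, so format deviations have to be resolved before essential forgetting can be measured.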
Taxonomy
Topics: Speech and dialogue systems · Speech and Audio Processing · Speech Recognition and Synthesis
