Enhancing Multimodal Continual Instruction Tuning with BranchLoRA
Duzhen Zhang, Yong Ren, Zhong-Zhi Li, Yahan Yu, Jiahua Dong, Chenxing Li, Zhilong Ji, Jinfeng Bai

TL;DR
This paper introduces BranchLoRA, a framework for multimodal continual instruction tuning that uses task-specific branches and routers to improve parameter efficiency and mitigate catastrophic forgetting. It outperforms MoELoRA on MCIT benchmarks.
Contribution
BranchLoRA offers an asymmetric, tuning-freezing, and task-aware approach to enhance continual instruction tuning of multimodal models, addressing inefficiencies and forgetting issues in prior frameworks.
Findings
Outperforms MoELoRA on MCIT benchmarks across various model sizes.
Effectively mitigates catastrophic forgetting with task-specific routing.
Maintains high performance without requiring task identity during inference.
Abstract
Multimodal Continual Instruction Tuning (MCIT) aims to finetune Multimodal Large Language Models (MLLMs) to continually align with human intent across sequential tasks. Existing approaches often rely on the Mixture-of-Experts (MoE) LoRA framework to preserve previous instruction alignments. However, these methods are prone to Catastrophic Forgetting (CF), as they aggregate all LoRA blocks via simple summation, which compromises performance over time. In this paper, we identify a critical parameter inefficiency in the MoELoRA framework within the MCIT context. Based on this insight, we propose BranchLoRA, an asymmetric framework to enhance both efficiency and performance. To mitigate CF, we introduce a flexible tuning-freezing mechanism within BranchLoRA, enabling branches to specialize in intra-task knowledge while fostering inter-task collaboration. Moreover, we incrementally…
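To make the contrast in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The first function follows the abstract's description of the baseline, where every LoRA block is applied and the outputs are summed. The second function rests on an assumption the summary does not spell out: that "asymmetric" means one shared down-projection matrix with several up-projection branches mixed by a router. All names (`moelora_delta`, `branch_delta`, `shared_A`, `router_W`) and the toy sizes are hypothetical, and the tuning-freezing mechanism is not modeled.

```python
import math
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, standing in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def count_params(mats):
    """Total number of entries across a list of matrices."""
    return sum(len(M) * len(M[0]) for M in mats)


def moelora_delta(x, experts):
    """Baseline as described in the abstract: apply every (A_i, B_i) block and sum."""
    out = [0.0] * len(experts[0][1])
    for A, B in experts:
        h = matvec(B, matvec(A, x))
        out = [o + v for o, v in zip(out, h)]
    return out


def branch_delta(x, shared_A, branches, router_W):
    """Assumed asymmetric layout: one shared A, several B branches, router-weighted."""
    z = matvec(shared_A, x)               # shared low-rank projection
    gates = softmax(matvec(router_W, x))  # one gate per branch
    out = [0.0] * len(branches[0])
    for g, B in zip(gates, branches):
        h = matvec(B, z)
        out = [o + g * v for o, v in zip(out, h)]
    return out, gates


# Toy sizes (assumptions for illustration only).
d_in, d_out, rank, k = 8, 8, 2, 3
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(d_in)]

experts = [(rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng)) for _ in range(k)]
shared_A = rand_matrix(rank, d_in, rng)
branches = [rand_matrix(d_out, rank, rng) for _ in range(k)]
router_W = rand_matrix(k, d_in, rng)

summed = moelora_delta(x, experts)
delta, gates = branch_delta(x, shared_A, branches, router_W)

# Low-rank parameters only; the router's k * d_in weights are counted separately.
moe_params = count_params([M for pair in experts for M in pair])
branch_params = count_params([shared_A] + branches)
```

With these toy sizes the summed baseline holds k * (rank * d_in + d_out * rank) low-rank parameters, while the assumed shared-A layout holds rank * d_in + k * d_out * rank, which is where a parameter saving could come from if the assumption about the layout is right.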
Taxonomy
Topics: Speech and dialogue systems
Methods: ALIGN
