Enhancing Multimodal Continual Instruction Tuning with BranchLoRA

Duzhen Zhang; Yong Ren; Zhong-Zhi Li; Yahan Yu; Jiahua Dong; Chenxing Li; Zhilong Ji; Jinfeng Bai

arXiv:2506.02041·cs.CL·June 4, 2025

TL;DR

This paper introduces BranchLoRA, a novel framework for multimodal continual instruction tuning that improves efficiency and mitigates catastrophic forgetting by using task-specific branches and routers, outperforming previous methods.

Contribution

BranchLoRA offers an asymmetric, tuning-freezing, and task-aware approach to enhance continual instruction tuning of multimodal models, addressing inefficiencies and forgetting issues in prior frameworks.

Findings

01. Outperforms MoELoRA on MCIT benchmarks across various model sizes.

02. Effectively mitigates catastrophic forgetting with task-specific routing.

03. Maintains high performance without requiring task identity during inference.

Abstract

Multimodal Continual Instruction Tuning (MCIT) aims to finetune Multimodal Large Language Models (MLLMs) to continually align with human intent across sequential tasks. Existing approaches often rely on the Mixture-of-Experts (MoE) LoRA framework to preserve previous instruction alignments. However, these methods are prone to Catastrophic Forgetting (CF), as they aggregate all LoRA blocks via simple summation, which compromises performance over time. In this paper, we identify a critical parameter inefficiency in the MoELoRA framework within the MCIT context. Based on this insight, we propose BranchLoRA, an asymmetric framework to enhance both efficiency and performance. To mitigate CF, we introduce a flexible tuning-freezing mechanism within BranchLoRA, enabling branches to specialize in intra-task knowledge while fostering inter-task collaboration. Moreover, we incrementally…
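The aggregation step that the abstract attributes to existing MoE LoRA approaches can be illustrated with a small sketch. This shows only the baseline pattern (a frozen base projection plus a sum over all LoRA blocks), not BranchLoRA itself. The softmax router, the gate weighting, the matrix shapes, and every name below (`moe_lora_forward`, `experts`, `router`) are assumptions made for illustration; the abstract says only that the blocks are aggregated "via simple summation".

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    peak = max(z)
    exps = [math.exp(s - peak) for s in z]
    total = sum(exps)
    return [e / total for e in exps]


def moe_lora_forward(x, W, experts, router):
    """Frozen base output W x plus a gate-weighted sum of every LoRA update.

    experts: list of (A, B) pairs, A of shape (rank, d_in), B of shape (d_out, rank).
    router:  matrix of shape (n_experts, d_in); its softmax gating is an assumption.
    """
    gates = softmax(matvec(router, x))
    out = matvec(W, x)
    for gate, (A, B) in zip(gates, experts):
        delta = matvec(B, matvec(A, x))  # low-rank update B (A x)
        out = [o + gate * d for o, d in zip(out, delta)]
    return out, gates


def rand_matrix(rows, cols, rng):
    """Gaussian-initialised matrix as a list of rows (illustrative only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_out, rank, n_experts = 8, 6, 2, 3  # hypothetical sizes
W = rand_matrix(d_out, d_in, rng)
router = rand_matrix(n_experts, d_in, rng)
experts = [
    (rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
    for _ in range(n_experts)
]
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

y, gates = moe_lora_forward(x, W, experts, router)
```

Because every block contributes to every output in this sketch, training on a new task can shift the combined update that earlier tasks relied on, which is one way to read the forgetting problem the abstract describes.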

Taxonomy

Topics: Speech and dialogue systems

Methods: ALIGN