SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

Zhen-Hao Xie; Jun-Tao Tang; Yu-Cheng Shi; Han-Jia Ye; De-Chuan Zhan; Da-Wei Zhou

arXiv:2602.01990 · cs.LG · February 3, 2026

TL;DR

SAME introduces a stabilized mixture-of-experts approach for multimodal continual instruction tuning, addressing expert and router drift to improve task retention and model performance in evolving data environments.

Contribution

The paper proposes a method that stabilizes expert routing and regulates expert updates, improving continual learning in multimodal models without rehearsal.

Findings

01. Achieves state-of-the-art results on multimodal continual instruction tuning benchmarks.

02. Effectively mitigates expert and router drift in dynamic data scenarios.

03. Reduces redundant computation through adaptive expert activation.

Abstract

Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually expand their capabilities, making Multimodal Continual Instruction Tuning (MCIT) essential. Recent methods leverage sparse expert routing to promote task specialization, but we find that the expert routing process suffers from drift as the data distribution evolves. For example, a grounding query that previously activated localization experts may instead be routed to irrelevant experts after learning OCR tasks. Meanwhile, the grounding-related experts can be overwritten by new tasks and lose their original functionality. Such failure reflects two problems: router drift, where expert selection becomes inconsistent over time, and expert drift, where shared experts are overwritten across tasks. Therefore, we propose StAbilized…
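The router drift described in the abstract can be made concrete with a small sketch. This is not the SAME method, only an illustration of the symptom: it assumes a plain linear top-k router and measures how many of the experts selected for old-task inputs are still selected after the router weights change. All names here (`top_k_experts`, `routing_overlap`, the random "after a new task" perturbation) are hypothetical and do not come from the paper.

```python
import random


def top_k_experts(router_weights, x, k=2):
    """Return the set of indices of the k experts with the highest router logits."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return set(ranked[:k])


def routing_overlap(old_router, new_router, inputs, k=2):
    """Mean fraction of top-k experts shared between two router states.

    1.0 means every input is routed exactly as before; lower values
    indicate that expert selection has drifted.
    """
    total = 0.0
    for x in inputs:
        old_sel = top_k_experts(old_router, x, k)
        new_sel = top_k_experts(new_router, x, k)
        total += len(old_sel & new_sel) / k
    return total / len(inputs)


rng = random.Random(0)
dim, num_experts = 8, 4

# Router state after learning an earlier task (random weights, for illustration).
old_router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

# Assumption: training on a later task is modeled as a large random perturbation.
new_router = [[w + rng.gauss(0, 2) for w in row] for row in old_router]

# Stand-in features for queries from the earlier task.
old_task_inputs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(100)]

same = routing_overlap(old_router, old_router, old_task_inputs)
drifted = routing_overlap(old_router, new_router, old_task_inputs)
print(f"overlap with unchanged router: {same:.2f}")
print(f"overlap after perturbation:    {drifted:.2f}")
```

An overlap well below 1.0 on old-task inputs corresponds to the situation in the abstract's example, where a grounding query ends up routed to different experts after later tasks are learned. Expert drift, where the experts' own parameters are overwritten, is a separate effect that this sketch does not measure.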

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Advanced Neural Network Applications