Routing-Based Continual Learning for Multimodal Large Language Models

Jay Mohta; Kenan Emir Ak; Gwang Lee; Dimitrios Dimitriadis; Yan Xu; Mingwei Shen

arXiv:2511.01831 · cs.LG · April 8, 2026

TL;DR

This paper presents a routing-based architecture for multimodal large language models that mitigates catastrophic forgetting, keeps data and compute requirements fixed as the task sequence grows, and enables cross-modal transfer during continual learning.

Contribution

Introduces a routing approach that preserves knowledge, scales efficiently, and enhances cross-modal transfer in multimodal large language models during continual learning.
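
The efficiency claim can be made concrete with a toy cost model. The sketch below is illustrative only: the function name `training_cost`, the strategy labels, and the assumption that multi-task learning retrains on the data of every task seen so far are hypothetical and are not taken from the paper. It counts the samples processed at each step of a task sequence.

```python
def training_cost(num_tasks: int, samples_per_task: int, strategy: str) -> list[int]:
    """Samples processed at each step of a task sequence (hypothetical cost model)."""
    costs = []
    for step in range(1, num_tasks + 1):
        if strategy == "mtl":
            # Assumption: multi-task learning retrains on all tasks seen so far,
            # so the per-step cost grows linearly with the step index.
            costs.append(step * samples_per_task)
        elif strategy == "fixed":
            # Assumption: only the newly arrived task's data is used at each step,
            # so the per-step cost stays constant.
            costs.append(samples_per_task)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return costs
```

Under these assumptions, the per-step cost of the `"mtl"` strategy grows with every added task, while the `"fixed"` strategy costs the same at every step.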

Findings

1. Routing maintains performance comparable to multi-task learning.
2. Method scales well with large expert pools and task relatedness.
3. Larger models show minimal degradation with the proposed approach.

Abstract

Multimodal Large Language Models (MLLMs) struggle with continual learning, often suffering from catastrophic forgetting when adapting to sequential tasks. We introduce a routing-based architecture that integrates new capabilities while robustly preserving foundational knowledge. While Multi-Task Learning (MTL) offers a theoretical performance upper bound, it incurs a linearly scaling computational overhead as the number of tasks increases. In contrast, our method maintains fixed data and compute requirements regardless of the task sequence length. Across models ranging from 2B to 8B parameters, we demonstrate that our routing approach performs on par with MTL while retaining the training efficiency of sequential fine-tuning. Beyond merely mitigating forgetting, we observe that token-level routing facilitates cross-modal transfer, leveraging knowledge from one modality to bolster…
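
The abstract does not specify the router or the experts, so the following is only a generic sketch of what token-level routing can look like, not the authors' architecture. It assumes top-1 routing, a linear router, and linear experts; `route_tokens` and all parameter names are hypothetical.

```python
import numpy as np

def route_tokens(tokens, router_weights, expert_weights):
    """Generic top-1 token-level routing (illustrative, not the paper's method).

    tokens:         (num_tokens, dim) token representations
    router_weights: (dim, num_experts) linear router
    expert_weights: (num_experts, dim, dim) one linear map per expert
    """
    # Score every expert for every token.
    logits = tokens @ router_weights                  # (num_tokens, num_experts)
    # Each token independently picks its highest-scoring expert.
    choice = logits.argmax(axis=1)                    # (num_tokens,)
    # Gather the chosen expert matrix per token and apply it to that token.
    outputs = np.einsum("td,tde->te", tokens, expert_weights[choice])
    return outputs, choice
```

Because the expert is chosen per token rather than per input, tokens from different modalities in the same sequence can share an expert or be sent to different ones, which is one plausible way for knowledge to transfer across modalities.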
