Routing-Based Continual Learning for Multimodal Large Language Models
Jay Mohta, Kenan Emir Ak, Gwang Lee, Dimitrios Dimitriadis, Yan Xu, Mingwei Shen

TL;DR
This paper presents a routing-based architecture for multimodal large language models that mitigates catastrophic forgetting, maintains fixed computational costs, and enables cross-modal transfer during continual learning.
Contribution
Introduces a routing approach that preserves previously learned knowledge, scales efficiently, and enhances cross-modal transfer in multimodal large language models during continual learning.
Findings
Routing maintains performance comparable to multi-task learning.
The method scales well to large expert pools and benefits from task relatedness.
Larger models show minimal degradation with the proposed approach.
Abstract
Multimodal Large Language Models (MLLMs) struggle with continual learning, often suffering from catastrophic forgetting when adapting to sequential tasks. We introduce a routing-based architecture that integrates new capabilities while robustly preserving foundational knowledge. While Multi-Task Learning (MTL) offers a theoretical performance upper bound, it incurs a linearly scaling computational overhead as the number of tasks increases. In contrast, our method maintains fixed data and compute requirements regardless of the task sequence length. Across models ranging from 2B to 8B parameters, we demonstrate that our routing approach performs on par with MTL while retaining the training efficiency of sequential fine-tuning. Beyond merely mitigating forgetting, we observe that token-level routing facilitates cross-modal transfer, leveraging knowledge from one modality to bolster…
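The abstract does not spell out how the routing is implemented. The sketch below shows one common form of token-level routing, assumed here purely for illustration: a top-1 gate that sends each token to one low-rank adapter ("expert") added on top of a frozen base projection. It is not the authors' architecture. The helpers `route_tokens` and `routed_layer`, the adapter rank `r`, and the top-1 gating rule are all hypothetical. The point it illustrates is that each token activates exactly one expert, so per-token compute stays close to fixed as experts are added, apart from the router's scoring cost.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def route_tokens(hidden, router_w):
    """Hypothetical top-1 token-level router (an assumption, not the paper's).

    hidden:   (num_tokens, d) token representations; text and image tokens
              are scored by the same router.
    router_w: (d, num_experts) router projection.
    Returns the chosen expert index and its gate probability per token.
    """
    probs = softmax(hidden @ router_w, axis=-1)   # (num_tokens, num_experts)
    expert_idx = probs.argmax(axis=-1)            # exactly one expert per token
    gate = probs[np.arange(hidden.shape[0]), expert_idx]
    return expert_idx, gate


def routed_layer(hidden, base_w, experts, router_w):
    """Frozen base projection plus one gated low-rank expert per token.

    base_w:  (d, d) frozen weights shared across all tasks.
    experts: list of (A, B) pairs, A: (d, r), B: (r, d); in this sketch one
             expert would be added per new task while base_w stays untouched.
    """
    expert_idx, gate = route_tokens(hidden, router_w)
    out = hidden @ base_w                          # shared, frozen path
    for e, (A, B) in enumerate(experts):
        mask = expert_idx == e                     # tokens assigned to expert e
        if mask.any():
            out[mask] += gate[mask][:, None] * (hidden[mask] @ A @ B)
    return out, expert_idx


# Usage with random weights (toy sizes, chosen arbitrarily).
rng = np.random.default_rng(0)
d, r, num_experts, num_tokens = 16, 4, 3, 10
hidden = rng.standard_normal((num_tokens, d))
base_w = rng.standard_normal((d, d))
experts = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
           for _ in range(num_experts)]
router_w = rng.standard_normal((d, num_experts))

out, idx = routed_layer(hidden, base_w, experts, router_w)
print(out.shape)  # (10, 16)
```

In this sketch, growing the expert pool adds parameters and one router column per expert, but each token still passes through the base projection and a single adapter. Because text and image tokens share one router, an expert trained on data from one modality can in principle be selected for tokens of another; whether the paper's cross-modal transfer arises this way is not stated in the abstract.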
