MLLM-CL: Continual Learning for Multimodal Large Language Models

Hongbo Zhao; Fei Zhu; Haiyang Guo; Meng Wang; Rundong Wang; Gaofeng Meng; Zhaoxiang Zhang

arXiv:2506.05453·cs.CL·October 2, 2025

TL;DR

The paper introduces MLLM-CL, a benchmark for continual learning in multimodal large language models, together with a method that lets these models adapt to evolving domains and abilities in dynamic real-world scenarios with minimal forgetting.

Contribution

The paper presents a novel benchmark for domain and ability continual learning in MLLMs and proposes a parameter isolation and routing method to mitigate catastrophic interference.
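To make the idea of parameter isolation plus routing concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class `IsolatedAdapters`, the function `keyword_router`, and the task names are hypothetical, each adapter is a plain callable standing in for a task-specific parameter set (for example, a LoRA module), and the keyword matcher stands in for the MLLM-based router described in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IsolatedAdapters:
    """Holds one independent adapter per task (hypothetical illustration).

    Adding a task never modifies adapters registered earlier, which is the
    property that parameter isolation relies on to avoid interference.
    """

    # Hypothetical router signature: (query, known task names) -> chosen task name.
    router: Callable[[str, List[str]], str]
    adapters: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def add_task(self, name: str, adapter: Callable[[str], str]) -> None:
        # Refuse to overwrite: existing task parameters stay frozen.
        if name in self.adapters:
            raise ValueError(f"task {name!r} already has an adapter")
        self.adapters[name] = adapter

    def answer(self, query: str) -> str:
        if not self.adapters:
            raise RuntimeError("no adapters registered")
        # Stage 1: pick a task. Stage 2: run only that task's adapter.
        task = self.router(query, list(self.adapters))
        return self.adapters[task](query)


def keyword_router(query: str, tasks: List[str]) -> str:
    """Toy stand-in for a learned router: match a task name in the query."""
    lowered = query.lower()
    for task in tasks:
        if task in lowered:
            return task
    return tasks[0]  # fall back to the first registered task


model = IsolatedAdapters(router=keyword_router)
model.add_task("medical", lambda q: f"[medical adapter] {q}")
model.add_task("finance", lambda q: f"[finance adapter] {q}")
result = model.answer("finance: summarize this chart")
```

In this sketch the cost of isolation is visible directly: the `adapters` dictionary grows by one entry per task, and answer quality depends entirely on the router picking the right entry.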

Findings

01

The proposed method significantly reduces forgetting in continual learning scenarios.

02

The proposed method outperforms existing approaches on the MLLM-CL benchmark.

03

The approach effectively integrates new knowledge and skills with minimal performance loss.

Abstract

Recent Multimodal Large Language Models (MLLMs) excel in vision-language understanding but face challenges in adapting to dynamic real-world scenarios that require continuous integration of new knowledge and skills. While continual learning (CL) offers a potential solution, existing benchmarks and methods suffer from critical limitations. In this paper, we introduce MLLM-CL, a novel benchmark encompassing domain and ability continual learning, where the former focuses on independently and identically distributed (IID) evaluation across evolving mainstream domains, whereas the latter evaluates on non-IID scenarios with new model abilities. Methodologically, we propose preventing catastrophic interference through parameter isolation and an MLLM-based routing mechanism. Extensive experiments demonstrate that our approach can integrate domain-specific knowledge and functional abilities with…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 6·Confidence 3

Strengths

1. **Clear and Well-Motivated Problem Formulation:** The paper clearly identifies the dual challenges of stability and plasticity in MLLM continual learning.
2. **Simple yet Effective Core Idea:** The proposal to train a _fresh_ LoRA from scratch for each task is conceptually simple but effective. This design choice directly tackles the issue of weight interference caused by reusing previous adapters, leading to better new-task performance.
3. **Innovative Use of MLLM as a Router:**

Weaknesses

1. **Linear Parameter Growth and Scalability Concerns:** The most significant limitation is the linear increase in the number of stored LoRA modules with the number of tasks. While LoRA is parameter-efficient, storing hundreds or thousands of adapters could become impractical in long-term or open-ended continual learning scenarios. The paper does not discuss potential strategies to mitigate this.
2. **Limited Discussion on Task Similarity and Negative Transfer:** The paper assumes tasks a

Reviewer 02·Rating 4·Confidence 4

Strengths

1. It is good to extend the continual learning task from traditional deep learning to MLLMs.
2. The contributed dataset could be helpful to the community.
3. The proposed method is simple and works, as shown in the experiments.

Weaknesses

1. There are many ways to make an MLLM adapt to new tasks or domains, e.g., in-context learning or retrieval-augmented generation. Given a base MLLM with strong generalization capability, a training-free strategy could be more valuable.
2. The classification into DCL and ACL should be further justified. Some tasks in DCL could also be regarded as ACL, e.g., identifying whether acid is present in medical images. A fuzzy classification would reduce the value of the dataset.
3. Another concern is

Reviewer 03·Rating 4·Confidence 3

Strengths

1. The paper is well written and easy to follow.
2. The idea of two-stage inference using the router and ability-specific modules is interesting and, judging from the experiments, seems effective.
3. The experiments include hyperparameter studies, which provide helpful insight into the proposed method.
4.

Weaknesses

1. The MLLM-CL benchmark primarily consists of existing benchmarks that are combined. As a result, the contribution in terms of introducing a new benchmark is weak.
2. Comparisons are limited to a handful of recent methods. However, there are other CL methods for VLMs with public codebases that could be included to demonstrate that the proposed method is competitive.
3. The code and the benchmark are not provided, which makes it difficult to judge reproducibility. The authors have
