Fine-tuning MLLMs Without Forgetting Is Easier Than You Think

He Li; Yuhui Zhang; Xiaohan Wang; Kaifeng Lyu; Serena Yeung-Levy

arXiv:2603.14493 · cs.CV · March 17, 2026

Open Access

TL;DR

This paper shows that simple adjustments to the fine-tuning recipe can mitigate catastrophic forgetting in multimodal large language models, offering practical strategies for model adaptation and continual learning.

Contribution

It introduces straightforward fine-tuning techniques and a data-hybrid training strategy that mitigate forgetting and enhance continual learning in MLLMs.

Findings

01

Regularization prevents forgetting of out-of-distribution images.

02

Data-hybrid training addresses task-specific overfitting.

03

Appropriate fine-tuning improves continual learning performance.

Abstract

We demonstrate that simple adjustments to the fine-tuning recipes of multimodal large language models (MLLMs) are sufficient to mitigate catastrophic forgetting. On visual question answering, we design a 2x2 experimental framework to assess model performance across in-distribution and out-of-distribution image and text inputs. Our results show that appropriate regularization, such as constraining the number of trainable parameters or adopting a low learning rate, effectively prevents forgetting when dealing with out-of-distribution images. However, we uncover a distinct form of forgetting in settings with in-distribution images and out-of-distribution text. We attribute this forgetting to task-specific overfitting and address it by introducing a data-hybrid training strategy that combines datasets and tasks. Finally, we demonstrate that this approach naturally extends to…
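
The data-hybrid idea in the abstract (combining datasets and tasks during fine-tuning) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hybrid_mix`, the `general_fraction` parameter, and the default ratio are all hypothetical, since the abstract does not state how the datasets are combined or in what proportion.

```python
import random


def hybrid_mix(target_examples, general_examples, general_fraction=0.5, seed=0):
    """Return a shuffled training list combining target-task and general examples.

    general_fraction is the share of the final mix drawn from general_examples.
    It is a hypothetical knob for illustration, not a value from the paper.
    """
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_target = len(target_examples)
    # Number of general examples needed so they form general_fraction of the mix.
    n_general = round(n_target * general_fraction / (1.0 - general_fraction))
    # Never request more general examples than are available.
    n_general = min(n_general, len(general_examples))
    mixed = list(target_examples) + rng.sample(list(general_examples), n_general)
    rng.shuffle(mixed)
    return mixed


# Toy usage: 8 target-task examples mixed with an equal number of general ones.
target = [("target", i) for i in range(8)]
general = [("general", i) for i in range(20)]
train_set = hybrid_mix(target, general, general_fraction=0.5)
```

Every target-task example is kept and the general data is subsampled, so the mix ratio is controlled by a single parameter. Other schemes, such as per-batch sampling, would be equally compatible with the abstract's description.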

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis