Continual Learning in Vision-Language Models via Aligned Model Merging

Ghada Sokar; Gintare Karolina Dziugaite; Anurag Arnab; Ahmet Iscen; Pablo Samuel Castro; Cordelia Schmid

arXiv:2506.03189·cs.CV·June 5, 2025

TL;DR

This paper introduces a novel continual learning method for vision-language models that merges task-specific parameters to better balance stability and plasticity, reducing forgetting and enhancing robustness.

Contribution

It proposes a model merging approach with aligned weights to improve continual learning in vision-language models, addressing the bias of sequential fine-tuning toward recent tasks.

Findings

1. Reduces catastrophic forgetting in vision-language models
2. Enhances robustness across different task sequences
3. Improves generalization performance

Abstract

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and…
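The abstract describes merging newly trained task parameters with previously learned ones, but this excerpt does not state the merging rule. The following is a minimal sketch that assumes the simplest option, a uniform running average over task-specific weights, stored as dictionaries of NumPy arrays. The function `merge_task_params` and the parameter name `"w"` are hypothetical, and the paper's alignment mechanism is not modeled here.

```python
import numpy as np


def merge_task_params(prev_params, new_params, num_prev_tasks):
    """Merge freshly fine-tuned weights into the running merged model.

    Assumption (not from the paper): a uniform running average, so every
    task seen so far contributes equally to the merged weights.
    """
    merged = {}
    for name, prev in prev_params.items():
        new = new_params[name]
        # Weight the existing merge by how many tasks it already summarizes,
        # then add the new task's weights with weight 1.
        merged[name] = (num_prev_tasks * prev + new) / (num_prev_tasks + 1)
    return merged


# Hypothetical two-task sequence with a single parameter tensor "w".
merged = {"w": np.array([2.0, 4.0])}  # after task 1: just its own weights
task2 = {"w": np.array([4.0, 0.0])}   # weights fine-tuned on task 2
merged = merge_task_params(merged, task2, num_prev_tasks=1)
print(merged["w"])  # [3. 2.]
```

Under this assumption the merged model keeps an equal share of each earlier task instead of being overwritten by the latest one, which is the stability argument the abstract makes against purely sequential updates.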

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications