Mind the Gap: Preserving and Compensating for the Modality Gap in CLIP-Based Continual Learning

Linlan Huang; Xusheng Cao; Haori Lu; Yifan Meng; Fei Yang; Xialei Liu

arXiv:2507.09118 · cs.CV · July 15, 2025

TL;DR

This paper introduces MG-CLIP, a method for CLIP-based continual learning that uses the modality gap in two ways: preserving it to retain pre-trained knowledge, and compensating for it when adapting to new data. The paper reports that MG-CLIP outperforms existing methods on multiple benchmarks.

Contribution

The paper presents a modality-gap-based approach to CLIP-based continual learning. It addresses the often-overlooked modality gap in order to reduce forgetting and improve adaptation.

Findings

01. MG-CLIP outperforms existing methods on multiple benchmarks.

02. The modality gap effectively indicates knowledge preservation.

03. The approach does not require additional replay data.

Abstract

Continual learning aims to enable models to learn sequentially from continuously incoming data while retaining performance on previously learned tasks. With the Contrastive Language-Image Pre-trained model (CLIP) exhibiting strong capabilities across various downstream tasks, there has been growing interest in leveraging CLIP for continual learning in such scenarios. Most existing works overlook the inherent modality gap in CLIP, a key factor in its generalization and adaptability. In this paper, we analyze the variations in the modality gap during the fine-tuning of vision-language pre-trained models. Our observations reveal that the modality gap effectively reflects the extent to which pre-trained knowledge is preserved. Based on these insights, we propose a simple yet effective method, MG-CLIP, that improves CLIP's performance in class-incremental learning. Our approach leverages…
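The abstract does not say how the modality gap is measured. As a rough illustration only, the sketch below uses one common measure from the broader literature: the Euclidean distance between the centroid of L2-normalized image embeddings and the centroid of L2-normalized text embeddings. This definition, the function names, and the toy vectors are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length, as CLIP-style embeddings usually are."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def modality_gap(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
) -> float:
    """Assumed measure: distance between the image and text centroids."""
    img_center = centroid([l2_normalize(v) for v in image_embs])
    txt_center = centroid([l2_normalize(v) for v in text_embs])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img_center, txt_center)))


# Toy 2-D embeddings (hypothetical values, not real CLIP outputs).
toy_image_embs = [[1.0, 0.0], [2.0, 0.0]]
toy_text_embs = [[0.0, 1.0], [0.0, 3.0]]
gap = modality_gap(toy_image_embs, toy_text_embs)
```

Under this assumed measure, one could compute the gap on a fixed probe set before and after each fine-tuning stage and compare the values. How MG-CLIP itself tracks or constrains the gap is not specified in the text above.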
