Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models

Songlin Dong; Chenhao Ding; Jiangyang Li; Jizhou Han; Qiang Wang; Yuhang He; Yihong Gong

arXiv:2505.07690 · cs.CV · May 13, 2025

TL;DR

This paper introduces AFA, a novel framework for continual learning in vision-language models that enhances zero-shot recognition and few-shot learning capabilities, outperforming existing methods in multi-domain incremental tasks.

Contribution

The paper proposes a dual-adapter framework, AFA, that addresses forward and backward forgetting in continual learning of VLMs, improving zero-shot and few-shot performance.

Findings

01. AFA significantly outperforms state-of-the-art methods in few-shot MTIL tasks.

02. AFA surpasses CLIP's inherent zero-shot transferability.

03. Extensive experiments validate the effectiveness of the proposed framework.

Abstract

This study aims to address the problem of multi-domain task incremental learning (MTIL), which requires that vision-language models (VLMs) continuously acquire new knowledge while maintaining their inherent zero-shot recognition capability. Existing paradigms delegate the testing of unseen-domain samples to the original CLIP, which only prevents the degradation of the model's zero-shot capability but fails to enhance the generalization of the VLM further. To this end, we propose a novel MTIL framework, named AFA, which comprises two core modules: (1) an against forward-forgetting adapter that learns task-invariant information for each dataset in the incremental tasks to enhance the zero-shot recognition ability of VLMs; (2) an against backward-forgetting adapter that strengthens the few-shot learning capability of VLMs while supporting incremental learning. Extensive experiments…
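
To make the baseline paradigm mentioned in the abstract concrete, the following is a minimal sketch of routing unseen-domain samples to the original (frozen) zero-shot model while learned domains use a per-domain adapter. It is not AFA itself, whose adapter design is not described in this listing. Everything here is an illustrative assumption: the toy 2-D feature vectors stand in for CLIP embeddings, and `residual_adapter`, `alpha`, the `adapters` dictionary, and the domain names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_feat, text_feats):
    """CLIP-style zero-shot rule: choose the class whose text
    embedding is most similar to the image embedding."""
    scores = [cosine(image_feat, t) for t in text_feats]
    return max(range(len(scores)), key=scores.__getitem__)


def residual_adapter(feat, weight, alpha=0.5):
    """Hypothetical linear adapter blended with the frozen feature.
    This is a generic residual adapter, assumed for illustration only."""
    adapted = [sum(w * x for w, x in zip(row, feat)) for row in weight]
    return [alpha * a + (1 - alpha) * x for a, x in zip(adapted, feat)]


def predict(image_feat, text_feats, domain, adapters):
    """Learned domains use their adapter; unseen domains fall back
    to the frozen features, i.e. the original zero-shot model."""
    if domain in adapters:
        image_feat = residual_adapter(image_feat, adapters[domain])
    return zero_shot_predict(image_feat, text_feats)


# Toy stand-ins for class text embeddings and one image embedding.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feat = [0.6, 0.5]

# Hypothetical adapter weights for one already-learned domain.
adapters = {"seen_domain": [[0.0, 0.0], [0.0, 2.0]]}

pred_unseen = predict(image_feat, text_feats, "unseen_domain", adapters)
pred_seen = predict(image_feat, text_feats, "seen_domain", adapters)
```

In this sketch the unseen-domain prediction is exactly what the frozen model would give, which illustrates the abstract's point: such routing preserves zero-shot ability but cannot improve it.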

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Adapter · Contrastive Language-Image Pre-training