CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning

Emanuele Frascaroli; Aniello Panariello; Pietro Buzzega; Lorenzo Bonicelli; Angelo Porrello; Simone Calderara

arXiv:2407.15793 · cs.CV · October 29, 2024

TL;DR

This paper introduces a generative replay method using VAEs to enhance incremental learning in CLIP, effectively mitigating forgetting and improving zero-shot performance across diverse domains.

Contribution

It presents a novel generative replay approach with VAEs for CLIP, maintaining zero-shot abilities and adapting to new tasks without catastrophic forgetting.

Findings

01 Improves zero-shot capabilities in incremental learning
02 Bridges gap with joint prompt tuning
03 Effective across multiple domain shifts

Abstract

With the emergence of Transformers and Vision-Language Models (VLMs) such as CLIP, fine-tuning large pre-trained models has recently become a prevalent strategy in Continual Learning. This has led to the development of numerous prompting strategies to adapt transformer-based models without incurring catastrophic forgetting. However, these strategies often compromise the original zero-shot capabilities of the pre-trained CLIP model and struggle to adapt to domains that significantly deviate from the pre-training data. In this work, we propose Continual Generative training for Incremental prompt-Learning, a simple and novel approach to mitigate forgetting while adapting CLIP. Briefly, we employ Variational Autoencoders (VAEs) to learn class-conditioned distributions within the embedding space of the visual encoder. We then exploit these distributions to sample new synthetic visual…
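The replay idea in the abstract (learn a class-conditioned distribution over visual embeddings, then sample synthetic embeddings for earlier classes) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the VAE for a per-class diagonal Gaussian, uses only the Python standard library, and treats embeddings as plain lists of floats. The function names `fit_class_gaussians` and `sample_replay` are hypothetical.

```python
import random
import statistics


def fit_class_gaussians(features, labels):
    """Fit one diagonal Gaussian per class over embedding vectors.

    Assumption: a diagonal Gaussian stands in for the class-conditioned
    VAE described in the abstract; it is a simplification, not the method.
    """
    by_class = {}
    for vec, label in zip(features, labels):
        by_class.setdefault(label, []).append(vec)

    stats = {}
    for label, vecs in by_class.items():
        dims = list(zip(*vecs))  # transpose: one tuple of values per dimension
        mean = [statistics.fmean(d) for d in dims]
        std = [statistics.pstdev(d) for d in dims]
        stats[label] = (mean, std)
    return stats


def sample_replay(stats, per_class, rng):
    """Draw `per_class` synthetic embeddings for every stored class."""
    samples, sample_labels = [], []
    for label, (mean, std) in stats.items():
        for _ in range(per_class):
            samples.append([rng.gauss(m, s) for m, s in zip(mean, std)])
            sample_labels.append(label)
    return samples, sample_labels


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two classes seen in an earlier task.
    feats = [[0.0, 1.0], [0.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
    labs = ["cat", "cat", "dog", "dog"]
    class_stats = fit_class_gaussians(feats, labs)
    replay, replay_labels = sample_replay(class_stats, 3, random.Random(0))
    print(len(replay), replay_labels)
```

In a continual setting, only the per-class statistics (or, in the paper's case, the generative models) would need to be kept between tasks; the sampled embeddings can then be mixed with features from the current task when adapting the model, so that no raw images from earlier tasks have to be stored.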

Code & Models

Repositories

aimagelab/mammoth (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Contrastive Language-Image Pre-training · ALIGN