CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning
Emanuele Frascaroli, Aniello Panariello, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara

TL;DR
This paper introduces a generative replay method using VAEs to enhance incremental learning in CLIP, effectively mitigating forgetting and improving zero-shot performance across diverse domains.
Contribution
It presents a novel generative replay approach with VAEs for CLIP that preserves zero-shot abilities while adapting to new tasks without catastrophic forgetting.
Findings
Improves zero-shot capabilities in incremental learning
Bridges the gap with joint prompt tuning
Effective across multiple domain shifts
Abstract
With the emergence of Transformers and Vision-Language Models (VLMs) such as CLIP, fine-tuning large pre-trained models has recently become a prevalent strategy in Continual Learning. This has led to the development of numerous prompting strategies to adapt transformer-based models without incurring catastrophic forgetting. However, these strategies often compromise the original zero-shot capabilities of the pre-trained CLIP model and struggle to adapt to domains that significantly deviate from the pre-training data. In this work, we propose Continual Generative training for Incremental prompt-Learning, a simple and novel approach to mitigate forgetting while adapting CLIP. Briefly, we employ Variational Autoencoders (VAEs) to learn class-conditioned distributions within the embedding space of the visual encoder. We then exploit these distributions to sample new synthetic visual…
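The abstract describes fitting class-conditioned generators in the visual encoder's embedding space and sampling synthetic embeddings from them. The replay loop can be sketched as follows, assuming NumPy and random vectors standing in for real CLIP embeddings. To keep the sketch short, it swaps the paper's VAEs for a simpler generative model, a per-class diagonal Gaussian. `ClassConditionalGaussian`, `build_replay_batch`, and the embedding size are hypothetical illustrations, not the authors' code.

```python
import numpy as np


class ClassConditionalGaussian:
    """Hypothetical stand-in for a per-class VAE: models each class's
    embeddings with a diagonal Gaussian (mean and variance per dimension)."""

    def __init__(self, eps=1e-6):
        self.eps = eps    # variance floor for numerical stability
        self.stats = {}   # class id -> (mean, variance)

    def fit(self, embeddings, labels):
        # Store per-class statistics for every class seen in this task.
        for c in np.unique(labels):
            x = embeddings[labels == c]
            self.stats[int(c)] = (x.mean(axis=0), x.var(axis=0) + self.eps)

    def sample(self, c, n, rng):
        # Draw n synthetic embeddings for class c.
        mean, var = self.stats[c]
        return rng.normal(mean, np.sqrt(var), size=(n, mean.shape[0]))


def build_replay_batch(generator, current_x, current_y, n_per_class, rng):
    """Mix real embeddings of the current task with synthetic embeddings
    of every class the generator learned in previous tasks."""
    current = set(current_y.tolist())
    past = [c for c in generator.stats if c not in current]
    xs, ys = [current_x], [current_y]
    for c in past:
        xs.append(generator.sample(c, n_per_class, rng))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)


rng = np.random.default_rng(0)
dim = 8  # hypothetical embedding size, kept small for illustration
gen = ClassConditionalGaussian()

# Task 1: classes 0 and 1 (random vectors in place of visual-encoder embeddings).
x1 = rng.normal(size=(20, dim))
y1 = np.repeat([0, 1], 10)
gen.fit(x1, y1)

# Task 2: classes 2 and 3, plus 5 synthetic samples per past class.
x2 = rng.normal(size=(20, dim))
y2 = np.repeat([2, 3], 10)
bx, by = build_replay_batch(gen, x2, y2, n_per_class=5, rng=rng)
print(bx.shape, np.bincount(by))
```

Storing per-class generator parameters instead of raw images is what makes the replay generative. A VAE, as used in the paper, can capture multimodal class distributions that a single Gaussian cannot.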
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Contrastive Language-Image Pre-training · ALIGN
