CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
Saurav Jha, Dong Gong, Lina Yao

TL;DR
CLAP4CLIP introduces a probabilistic finetuning framework for vision-language models like CLIP, enhancing continual learning by improving uncertainty estimation and reducing forgetting, thus enabling safer and more reliable downstream tasks.
Contribution
The paper proposes CLAP, a novel probabilistic finetuning method for CLIP in continual learning, which improves uncertainty calibration and mitigates forgetting compared to deterministic approaches.
Findings
CLAP outperforms deterministic finetuning methods in continual learning tasks.
It provides better uncertainty estimation for high-risk applications.
Its uncertainty estimates support novel data detection and exemplar selection.
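To make the calibration claim above concrete, here is a minimal sketch of expected calibration error (ECE), a common way to measure how well predicted confidences match observed accuracy. It is an illustration only: the function name, the equal-width binning, and the bin count are assumptions, and this is not taken from the CLAP4CLIP code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted gap between confidence and accuracy per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction was right.
    n_bins:      number of equal-width confidence bins (an assumed default).
    """
    n = len(confidences)
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins

    for conf, ok in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means better calibration: a model that reports 0.9 confidence while being right only a quarter of the time scores far worse than one whose confidence tracks its accuracy.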
Abstract
Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned. Owing to their powerful generalizability, pre-trained vision-language models such as Contrastive Language-Image Pre-training (CLIP) have lately gained traction as practical CL candidates. However, the domain mismatch between the pre-training and the downstream CL tasks often calls for finetuning CLIP on the latter. Most existing finetuning methods are deterministic in nature. This makes them overlook the many possible interactions across the input modalities and renders them unsafe for high-risk tasks requiring reliable uncertainty estimation. To address these issues, our work proposes Continual LeArning with Probabilistic finetuning (CLAP), a probabilistic modeling framework over visual-guided text features per task, thus providing more calibrated CL finetuning. Unlike…
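The idea of probabilistic modeling over text features can be illustrated with a small sketch. It assumes each class's text feature follows a diagonal Gaussian, draws several samples, scores each against an image feature by cosine similarity, and averages the resulting softmax outputs. Everything here is hypothetical: the function names, the toy vectors standing in for CLIP encoder outputs, the temperature of 0.07, and the use of predictive entropy as the uncertainty score. It is not the authors' implementation.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predict(image_feat, text_means, text_stds,
               n_samples=20, temperature=0.07, seed=0):
    """Average class probabilities over sampled text features.

    text_means / text_stds: one mean and one std vector per class
    (an assumed diagonal Gaussian over that class's text feature).
    """
    rng = random.Random(seed)
    img = normalize(image_feat)
    avg = [0.0] * len(text_means)
    for _ in range(n_samples):
        logits = []
        for mu, sigma in zip(text_means, text_stds):
            # Reparameterized sample: mean + std * standard normal noise.
            sample = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            txt = normalize(sample)
            cosine = sum(a * b for a, b in zip(img, txt))
            logits.append(cosine / temperature)
        probs = softmax(logits)
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


def entropy(probs):
    """Predictive entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A deterministic model corresponds to zero standard deviation, where every sample gives the same prediction. With non-zero spread the averaged probabilities flatten when samples disagree, and the entropy of that average gives a simple uncertainty signal.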
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training
