CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models

Saurav Jha; Dong Gong; Lina Yao

arXiv:2403.19137 · cs.CV · November 1, 2024 · 1 citation

TL;DR

CLAP4CLIP introduces a probabilistic finetuning framework for vision-language models such as CLIP. It improves uncertainty estimation and reduces forgetting in continual learning, which makes the finetuned models more reliable on high-risk downstream tasks.

Contribution

The paper proposes CLAP, a novel probabilistic finetuning method for CLIP in continual learning, which improves uncertainty calibration and mitigates forgetting compared to deterministic approaches.

Findings

1. CLAP outperforms deterministic finetuning methods in continual learning tasks.
2. It provides more reliable uncertainty estimates than deterministic finetuning, which matters for high-risk applications.
3. Its uncertainty estimates enable novel data detection and exemplar selection.

Abstract

Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned. Owing to their powerful generalizability, pre-trained vision-language models such as Contrastive Language-Image Pre-training (CLIP) have lately gained traction as practical CL candidates. However, the domain mismatch between the pre-training and the downstream CL tasks often calls for finetuning CLIP on the latter. Most existing finetuning methods are deterministic in nature. This makes them overlook the many possible interactions across the input modalities and renders them unsafe for high-risk tasks requiring reliable uncertainty estimation. To address these, our work proposes Continual LeArning with Probabilistic finetuning (CLAP) - a probabilistic modeling framework over visual-guided text features per task, thus providing more calibrated CL finetuning. Unlike…
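To make the contrast between deterministic and probabilistic finetuning concrete, the following is a minimal sketch, not the authors' implementation. It assumes each class's text feature is modeled as a diagonal Gaussian (an assumption made here for illustration), draws samples from it, and averages the resulting softmax predictions, so that the entropy of the averaged prediction can serve as an uncertainty score. The function names, the toy 3-dimensional features, and the `scale` temperature are all hypothetical, and the visual guidance of the text features described in the abstract is not modeled.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm, as done before cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sample_text_feature(mean, std, rng):
    """Reparameterized draw: mean + std * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def mc_predict(image_feat, class_means, class_stds,
               n_samples=50, scale=10.0, seed=0):
    """Average class probabilities over sampled text features.

    Assumption: one diagonal Gaussian per class; `scale` is a
    hypothetical logit temperature, not a value from the paper.
    """
    rng = random.Random(seed)
    img = normalize(image_feat)
    avg = [0.0] * len(class_means)
    for _ in range(n_samples):
        logits = []
        for mean, std in zip(class_means, class_stds):
            txt = normalize(sample_text_feature(mean, std, rng))
            logits.append(scale * sum(a * b for a, b in zip(img, txt)))
        probs = softmax(logits)
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


def entropy(probs):
    """Predictive entropy in nats; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Toy stand-ins for encoder outputs (hypothetical, not real CLIP features).
class_means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
class_stds = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]

confident = mc_predict([0.9, 0.1, 0.0], class_means, class_stds)
ambiguous = mc_predict([0.5, 0.5, 0.0], class_means, class_stds)

print("confident:", confident, "entropy:", entropy(confident))
print("ambiguous:", ambiguous, "entropy:", entropy(ambiguous))
```

In this toy setup the image lying between the two class means receives the higher predictive entropy. A score of this kind is one plausible way to flag unfamiliar inputs or rank candidate exemplars, though the paper's own criteria may differ.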

Code & Models

Repositories

srvcodes/clap4clip (PyTorch, official)

Videos

CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training