Active Learning for Vision-Language Models

Bardia Safaei; Vishal M. Patel

arXiv:2410.22187·cs.CV·October 30, 2024

Open Access

TL;DR

This paper introduces an active learning framework that improves the zero-shot classification performance of vision-language models such as CLIP by selecting a small number of informative unlabeled samples for annotation, with the aim of narrowing the performance gap with supervised models.

Contribution

The paper proposes an active learning method that calibrates the predicted entropy of VLMs and combines self-uncertainty with neighbor-aware uncertainty to select informative samples, improving VLM performance with fewer labeled samples.

Findings

01 Outperforms existing active learning methods on several image classification datasets

02 Significantly improves zero-shot classification accuracy of VLMs

03 Reduces the amount of labeled data needed for high performance

Abstract

Pre-trained vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot performance on a wide range of downstream computer vision tasks. However, there still exists a considerable performance gap between these models and a supervised deep model trained on a downstream dataset. To bridge this gap, we propose a novel active learning (AL) framework that enhances the zero-shot classification performance of VLMs by selecting only a few informative samples from the unlabeled data for annotation during training. To achieve this, our approach first calibrates the predicted entropy of VLMs and then utilizes a combination of self-uncertainty and neighbor-aware uncertainty to calculate a reliable uncertainty measure for active sample selection. Our extensive experiments show that the proposed approach outperforms existing AL approaches on several image classification datasets,…
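The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: the calibration is a simple prior correction (dividing each class probability by its mean predicted probability), the neighbor-aware term is the mean entropy of the k most similar images, and `select_for_annotation`, `alpha`, `k`, and `temperature` are hypothetical names and parameters, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_for_annotation(image_feats, text_feats, budget,
                          k=3, temperature=0.01, alpha=0.5):
    """Return indices of the `budget` most uncertain unlabeled images.

    Assumes at least two images and k >= 1.
    """
    n, c = len(image_feats), len(text_feats)

    # CLIP-style zero-shot probabilities: softmax over scaled cosine
    # similarities between each image and each class text embedding.
    probs = [softmax([cosine(x, t) / temperature for t in text_feats])
             for x in image_feats]

    # ASSUMED calibration (not taken from the paper): divide by the mean
    # predicted probability of each class, then renormalize.
    prior = [sum(p[j] for p in probs) / n for j in range(c)]
    calibrated = []
    for p in probs:
        w = [p[j] / (prior[j] + 1e-12) for j in range(c)]
        z = sum(w)
        calibrated.append([v / z for v in w])

    # Self-uncertainty: entropy of each sample's calibrated prediction.
    self_unc = [entropy(p) for p in calibrated]

    scores = []
    for i in range(n):
        # ASSUMED neighbor-aware term: mean self-uncertainty of the k
        # images most similar to image i in feature space.
        sims = sorted(((cosine(image_feats[i], image_feats[j]), j)
                       for j in range(n) if j != i), reverse=True)
        nbrs = [j for _, j in sims[:k]]
        nbr_unc = sum(self_unc[j] for j in nbrs) / len(nbrs)
        scores.append(alpha * self_unc[i] + (1 - alpha) * nbr_unc)

    # Highest combined uncertainty first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]
```

In an active learning loop, the returned indices would be sent for annotation and the model adapted on the newly labeled samples before the next round. The weight `alpha` is a placeholder for however the two uncertainty terms are actually combined.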

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Machine Learning and Algorithms

Methods: Contrastive Language-Image Pre-training