Active Learning for Vision-Language Models
Bardia Safaei, Vishal M. Patel

TL;DR
This paper introduces an active learning framework that improves the zero-shot classification performance of vision-language models such as CLIP by selecting a small number of informative unlabeled samples for annotation, with the aim of narrowing the performance gap with supervised models trained on the downstream dataset.
Contribution
The paper proposes an active learning method that first calibrates the predicted entropy of the VLM and then combines self-uncertainty with neighbor-aware uncertainty to select informative samples, improving VLM performance with fewer labeled samples.
Findings
Outperforms existing active learning methods on multiple datasets
Significantly improves zero-shot classification accuracy of VLMs
Reduces the amount of labeled data needed for high performance
Abstract
Pre-trained vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot performance on a wide range of downstream computer vision tasks. However, there still exists a considerable performance gap between these models and a supervised deep model trained on a downstream dataset. To bridge this gap, we propose a novel active learning (AL) framework that enhances the zero-shot classification performance of VLMs by selecting only a few informative samples from the unlabeled data for annotation during training. To achieve this, our approach first calibrates the predicted entropy of VLMs and then utilizes a combination of self-uncertainty and neighbor-aware uncertainty to calculate a reliable uncertainty measure for active sample selection. Our extensive experiments show that the proposed approach outperforms existing AL approaches on several image classification datasets,…
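The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the entropy is calibrated or how the two uncertainties are combined, so this example assumes temperature scaling as a stand-in for calibration, the mean entropy of the k nearest neighbors in image-feature space as the neighbor-aware term, and a simple weighted sum. The function name `select_for_annotation` and the parameters `k`, `temperature`, and `alpha` are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the class axis.
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def entropy(probs):
    # Shannon entropy of each row of class probabilities.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_for_annotation(image_feats, text_feats, budget,
                          k=5, temperature=0.01, alpha=0.5):
    """Rank unlabeled images by a combined uncertainty score.

    Assumptions (not taken from the paper): temperature scaling stands in
    for entropy calibration, and the neighbor-aware term is the mean
    entropy of the k most similar images. Requires more than k images.
    """
    # CLIP-style zero-shot logits: cosine similarity of image and text features.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = img @ txt.T

    # Self-uncertainty: entropy of each sample's own prediction.
    self_unc = entropy(softmax(logits, temperature))

    # Neighbor-aware uncertainty: average entropy of the k nearest neighbors.
    sim = img @ img.T
    np.fill_diagonal(sim, -np.inf)  # exclude each sample from its own neighbors
    nbrs = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    neighbor_unc = self_unc[nbrs].mean(axis=1)

    # Weighted combination; the highest scores are sent for annotation.
    score = alpha * self_unc + (1 - alpha) * neighbor_unc
    return np.argsort(-score)[:budget], score

# Usage with random stand-in features (real use would pass CLIP embeddings).
rng = np.random.default_rng(0)
picked, scores = select_for_annotation(
    rng.normal(size=(50, 16)), rng.normal(size=(4, 16)), budget=5)
```

In an active learning loop, the returned indices would be labeled by an annotator and added to the training set before the next selection round.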
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Machine Learning and Algorithms
Methods: Contrastive Language-Image Pre-training
