Active Prompt Learning in Vision Language Models
Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee

TL;DR
This paper introduces PCB, a novel active learning framework for Vision Language Models that addresses class imbalance issues and leverages VLM knowledge to improve task adaptation, outperforming traditional methods.
Contribution
The paper proposes a new active learning method, PCB, tailored for VLMs, which effectively balances classes and utilizes model knowledge to enhance performance.
Findings
PCB outperforms conventional active learning methods.
Applying standard active learning may degrade VLM performance.
Knowledge-guided sampling improves label efficiency.
Abstract
Pre-trained Vision Language Models (VLMs) have demonstrated notable progress in various zero-shot tasks, such as classification and retrieval. Despite this performance, adapting them remains essential, because improving performance on new tasks requires task-specific knowledge. Adaptation needs labels, but acquiring them is typically expensive. To overcome this challenge, active learning, which aims for high performance by obtaining expert labels for only a small number of samples, has been studied. Active learning primarily focuses on selecting unlabeled samples for labeling and leveraging them to train models. In this study, we pose the question, "How can pre-trained VLMs be adapted under the active learning framework?" In response to this inquiry, we observe that (1) simply applying a conventional active learning framework to pre-trained VLMs may even…
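To make the idea of class-balanced sample selection concrete, here is a minimal Python sketch. It is not the authors' PCB algorithm; it only illustrates one plausible way to combine an informativeness score with pseudo-labels (for example, zero-shot predictions from the VLM) so that a labeling budget is spread across classes. The function name `select_balanced`, the tuple layout of `candidates`, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def select_balanced(candidates, labeled_counts, budget):
    """Greedy class-balanced selection (illustrative sketch, not PCB itself).

    candidates: list of (sample_id, pseudo_label, score) tuples, where a
        higher score means the sample is assumed to be more informative.
    labeled_counts: dict mapping class -> number of labels already held.
    budget: maximum number of samples to send for labeling.
    """
    counts = Counter(labeled_counts)

    # Group candidate ids by pseudo-label, most informative first.
    pools = {}
    for sample_id, pseudo_label, _score in sorted(candidates, key=lambda c: -c[2]):
        pools.setdefault(pseudo_label, []).append(sample_id)

    chosen = []
    while len(chosen) < budget and any(pools.values()):
        # Pick the pseudo-class with the fewest labels so far, considering
        # only classes that still have candidates. Ties are broken by
        # class name (an arbitrary choice made for determinism).
        cls = min((c for c in pools if pools[c]), key=lambda c: (counts[c], str(c)))
        chosen.append(pools[cls].pop(0))
        counts[cls] += 1
    return chosen


# Hypothetical pool: "cat" dominates both the candidates and the existing labels.
pool = [
    ("a", "cat", 0.9),
    ("b", "cat", 0.8),
    ("c", "cat", 0.7),
    ("d", "dog", 0.2),
    ("e", "bird", 0.1),
]
picked = select_balanced(pool, {"cat": 2, "dog": 0, "bird": 0}, budget=3)
print(picked)
```

A purely score-based selector would pick three "cat" samples from this pool. The balanced variant first fills the under-represented pseudo-classes and only then falls back to the highest-scoring remaining candidate.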
Taxonomy
Topics: Multimodal Machine Learning Applications
Methods: Part-based Convolutional Baseline
