Active Prompt Learning in Vision Language Models

Jihwan Bang; Sumyeong Ahn; Jae-Gil Lee

arXiv:2311.11178 · cs.CV · March 22, 2024 · 1 citation

TL;DR

This paper introduces PCB, a novel active learning framework for Vision Language Models that addresses class imbalance issues and leverages VLM knowledge to improve task adaptation, outperforming traditional methods.

Contribution

The paper proposes a new active learning method, PCB, tailored for VLMs, which effectively balances classes and utilizes model knowledge to enhance performance.

Findings

1. PCB outperforms conventional active learning methods.
2. Applying standard active learning may degrade VLM performance.
3. Knowledge-guided sampling improves label efficiency.

Abstract

Pre-trained Vision Language Models (VLMs) have demonstrated notable progress in various zero-shot tasks, such as classification and retrieval. Despite this performance, they still need to be adapted to new tasks, because improving performance on a new task requires task-specific knowledge. Adaptation requires labels, but acquiring them is typically expensive. To overcome this challenge, active learning, which aims to achieve high performance by obtaining labels from experts for only a small number of samples, has been studied. Active learning primarily focuses on selecting unlabeled samples for labeling and leveraging them to train models. In this study, we pose the question, "How can pre-trained VLMs be adapted under the active learning framework?" In response to this inquiry, we observe that (1) simply applying a conventional active learning framework to pre-trained VLMs may even…
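To make the selection step concrete, here is a minimal sketch of uncertainty-based sample selection with a simple class-balancing rule. It is an illustration only and not the paper's PCB algorithm, whose details are not given in this excerpt. The function names, the entropy criterion, and the "rarest pseudo-class first" rule are all assumptions chosen for the example; the probabilities stand in for a VLM's zero-shot predictions.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def balanced_uncertainty_select(probs, labeled_counts, budget):
    """Pick up to `budget` unlabeled indices, favoring rare pseudo-classes.

    Hypothetical helper for illustration, not the authors' method.
    probs: list of per-sample class-probability lists (e.g. zero-shot outputs).
    labeled_counts: dict mapping class index -> number of labels already held.
    """
    # Pseudo-label: the model's most likely class for each unlabeled sample.
    pseudo = [max(range(len(p)), key=p.__getitem__) for p in probs]
    # Candidates ordered from most to least uncertain.
    remaining = sorted(range(len(probs)),
                       key=lambda i: entropy(probs[i]), reverse=True)
    counts = Counter(labeled_counts)
    selected = []
    while remaining and len(selected) < budget:
        # Among candidates whose pseudo-class is currently rarest, min()
        # returns the first one, which is the most uncertain.
        pick = min(remaining, key=lambda i: counts[pseudo[i]])
        remaining.remove(pick)
        selected.append(pick)
        counts[pseudo[pick]] += 1
    return selected


if __name__ == "__main__":
    demo_probs = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
    print(balanced_uncertainty_select(demo_probs, {}, budget=2))
```

On the demo input, selecting purely by entropy would take samples 2 and 1, both pseudo-labeled as class 0. The balancing rule instead returns `[2, 3]`, so the queried set covers both classes.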

Code & Models

Repositories

kaist-dmlab/pcb (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Part-based Convolutional Baseline