Active Learning via Vision-Language Model Adaptation with Open Data

Tong Wang; Jiaqi Wang; Shu Kong

arXiv:2506.01724·cs.CV·June 3, 2025

TL;DR

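This paper introduces ALOR (Active Learning with Open Resources), an active learning approach that augments task-specific labeled data with task-relevant examples retrieved from open data and adapts an open-source vision-language model to them. It reports gains from retrieval, from contrastive tuning, and from a sampling strategy that addresses class imbalance.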

Contribution

It proposes using retrieved open data to enhance active learning with VLMs, compares model adaptation methods, and introduces TFS, a sampling strategy that addresses class imbalance when selecting data to label.

Findings

01 Contrastive tuning outperforms other adaptation methods.

02 Incorporating retrieved open data improves active learning.

03 TFS effectively addresses class imbalance in data sampling.

Abstract

Pretrained on web-scale open data, vision-language models (VLMs) offer powerful capabilities for solving downstream tasks after being adapted to task-specific labeled data. Yet, data labeling can be expensive and may demand domain expertise. Active Learning (AL) aims to reduce this expense by strategically selecting the most informative data for labeling and model training. Recent AL methods have explored VLMs but have not leveraged publicly available open data, such as VLMs' pretraining data. In this work, we leverage such data by retrieving task-relevant examples to augment the task-specific examples. As expected, incorporating them significantly improves AL. Given that our method exploits an open-source VLM and open data, we refer to it as Active Learning with Open Resources (ALOR). Additionally, most VLM-based AL methods use prompt tuning (PT) for model adaptation, likely due to its ability to directly utilize…
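
The two ideas described in the abstract, selecting informative unlabeled examples and retrieving task-relevant open data, can be illustrated with a generic sketch. It assumes predictive entropy as the informativeness score and cosine similarity between embeddings as the retrieval criterion. The abstract does not confirm either as ALOR's actual choice, and every function name here is hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled examples.

    Assumption: uncertainty is measured by entropy; other AL criteria exist.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_open_data(query_emb, open_embs, k):
    """Return indices of the k open-data embeddings most similar to the query.

    Assumption: the query is an embedding of a class name or task example,
    and the open-data embeddings were produced by the same encoder.
    """
    ranked = sorted(
        range(len(open_embs)),
        key=lambda i: cosine(query_emb, open_embs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy predictions for three unlabeled images over two classes.
    pool = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
    print(select_for_labeling(pool, budget=2))

    # Toy 2-D embeddings standing in for encoder outputs.
    print(retrieve_open_data([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]], k=2))
```

In a full pipeline the retrieved examples would be added to the labeled set before adapting the model, and the selection step would be repeated each labeling round. How ALOR scores, filters, or balances these examples is not specified in the text above.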

Taxonomy

Topics: Multimodal Machine Learning Applications