Conformal Cross-Modal Active Learning
Huy Hoang Nguyen, Cédric Jung, Shirin Salehi, Tobias Glück, Anke Schmeink, Andreas Kugi

TL;DR
This paper presents CCMA, a novel active learning framework that leverages vision-language models to improve data efficiency by using conformal calibration and multimodal uncertainty estimates for sample selection.
Contribution
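The paper introduces Conformal Cross-Modal Acquisition (CCMA), an active learning method in which a pretrained vision-language model acts as a teacher: its conformally calibrated multimodal uncertainty estimates are combined with diversity-aware selection to choose samples for a vision-only student model.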
Findings
CCMA outperforms existing active learning methods on multiple benchmarks.
The approach effectively reduces annotation costs while maintaining high accuracy.
Multimodal conformal scoring enhances sample selection quality.
Abstract
Foundation models for vision have transformed visual recognition with powerful pretrained representations and strong zero-shot capabilities, yet their potential for data-efficient learning remains largely untapped. Active Learning (AL) aims to minimize annotation costs by strategically selecting the most informative samples for labeling, but existing methods largely overlook the rich multimodal knowledge embedded in modern vision-language models (VLMs). We introduce Conformal Cross-Modal Acquisition (CCMA), a novel AL framework that bridges vision and language modalities through a teacher-student architecture. CCMA employs a pretrained VLM as a teacher to provide semantically grounded uncertainty estimates, conformally calibrated to guide sample selection for a vision-only student model. By integrating multimodal conformal scoring with diversity-aware selection strategies, CCMA achieves…
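To make the idea of conformally calibrated uncertainty for sample selection concrete, here is a minimal sketch using generic split conformal prediction. It is not the authors' CCMA implementation: the abstract does not give the scoring rule, so the nonconformity score (1 minus the probability of the true class), the use of prediction-set size as the acquisition signal, and all function names are assumptions made for illustration. The class-probability arrays stand in for a VLM teacher's zero-shot outputs, and the diversity-aware component is omitted.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold on the score 1 - p(true class).

    cal_probs: (n, num_classes) class probabilities on a labeled calibration set
               (assumed here to come from the teacher model).
    cal_labels: (n,) integer labels.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return np.inf
    return scores[k - 1]

def prediction_set_sizes(probs, qhat):
    """Count the classes whose score 1 - p falls within the threshold."""
    probs = np.asarray(probs, dtype=float)
    return np.sum(1.0 - probs <= qhat, axis=1)

def select_for_labeling(pool_probs, qhat, budget):
    """Return indices of the pool samples with the largest prediction sets."""
    sizes = prediction_set_sizes(pool_probs, qhat)
    # Stable sort on negated sizes: larger sets first, ties keep pool order.
    order = np.argsort(-sizes, kind="stable")
    return order[:budget]

if __name__ == "__main__":
    # Toy data, invented for illustration only.
    cal_probs = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
    cal_labels = [0, 0, 1, 0]
    pool_probs = [[0.95, 0.05], [0.5, 0.5], [0.55, 0.45], [0.99, 0.01]]

    qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
    print("set sizes:", prediction_set_sizes(pool_probs, qhat))
    print("selected:", select_for_labeling(pool_probs, qhat, budget=2))
```

In this sketch a larger prediction set means the calibrated teacher cannot rule out as many classes, so the sample is treated as more informative to label. A practical pipeline would also need a diversity criterion and a rule for samples whose prediction set is empty, both of which are left out here.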
Taxonomy
Topics: Machine Learning and Algorithms · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
