Conformal Cross-Modal Active Learning

Huy Hoang Nguyen; Cédric Jung; Shirin Salehi; Tobias Glück; Anke Schmeink; Andreas Kugi

arXiv:2603.23159 · cs.CV · March 27, 2026

Open Access

TL;DR

This paper presents CCMA, a novel active learning framework that leverages vision-language models to improve data efficiency by using conformal calibration and multimodal uncertainty estimates for sample selection.

Contribution

Introduces Conformal Cross-Modal Acquisition (CCMA), an active learning method that combines multimodal uncertainty estimation with diversity strategies using pretrained vision-language models.
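As a rough illustration of how an uncertainty signal can be combined with a diversity strategy, the sketch below greedily picks samples whose uncertainty is weighted by their distance to the nearest already-selected sample. This is not the paper's acquisition rule, which this summary does not specify; the function `select_batch`, the multiplicative weighting, and the Euclidean distance on feature vectors are all assumptions made for illustration.

```python
import numpy as np

def select_batch(features, uncertainty, budget):
    """Hypothetical greedy acquisition: uncertainty times distance to nearest pick.

    features:    (n, d) array of sample embeddings (assumed input).
    uncertainty: (n,) array of non-negative uncertainty scores (assumed input).
    budget:      number of samples to select.
    """
    n = features.shape[0]
    min_dist = np.full(n, np.inf)       # distance to the nearest selected sample
    taken = np.zeros(n, dtype=bool)     # marks samples that are already selected
    selected = []
    for _ in range(min(budget, n)):
        if selected:
            score = uncertainty * min_dist
        else:
            # Nothing selected yet, so the first pick uses uncertainty alone.
            score = uncertainty.astype(float).copy()
        score[taken] = -np.inf          # never pick the same sample twice
        idx = int(np.argmax(score))
        selected.append(idx)
        taken[idx] = True
        dist = np.linalg.norm(features - features[idx], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return selected

# Two tight clusters: ranking by uncertainty alone would pick indices 0 and 1,
# which are near-duplicates; the distance weighting favors the other cluster.
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
uncertainty = np.array([1.0, 0.9, 0.8, 0.1])
picked = select_batch(features, uncertainty, budget=2)
```

The multiplicative form is only one possible trade-off; an additive or clustering-based variant would serve the same illustrative purpose.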

Findings

01. CCMA outperforms existing active learning methods on multiple benchmarks.

02. The approach effectively reduces annotation costs while maintaining high accuracy.

03. Multimodal conformal scoring enhances sample selection quality.

Abstract

Foundation models for vision have transformed visual recognition with powerful pretrained representations and strong zero-shot capabilities, yet their potential for data-efficient learning remains largely untapped. Active Learning (AL) aims to minimize annotation costs by strategically selecting the most informative samples for labeling, but existing methods largely overlook the rich multimodal knowledge embedded in modern vision-language models (VLMs). We introduce Conformal Cross-Modal Acquisition (CCMA), a novel AL framework that bridges vision and language modalities through a teacher-student architecture. CCMA employs a pretrained VLM as a teacher to provide semantically grounded uncertainty estimates, conformally calibrated to guide sample selection for a vision-only student model. By integrating multimodal conformal scoring with diversity-aware selection strategies, CCMA achieves…
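To make the idea of conformally calibrated uncertainty concrete, here is a minimal sketch of standard split conformal prediction applied to a teacher's class probabilities. It is not the paper's scoring rule, which the truncated abstract does not give; the nonconformity score 1 - p(true class), the names `conformal_threshold` and `prediction_set_sizes`, the miscoverage level `alpha`, and the use of prediction-set size as the acquisition signal are all assumptions for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from a labeled calibration set.

    cal_probs:  (n, K) teacher class probabilities (assumed input).
    cal_labels: (n,) integer true labels.
    alpha:      target miscoverage level.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability of the true class.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank (1-indexed) of the calibration quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return float(scores[k - 1])

def prediction_set_sizes(probs, qhat):
    """Count the classes whose score 1 - p falls within the threshold."""
    return np.sum(1.0 - probs <= qhat, axis=1)

# Toy calibration data: 4 samples, 3 classes.
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])
cal_labels = np.array([0, 1, 2, 0])
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

# Unlabeled pool: larger prediction sets indicate more ambiguous samples.
pool_probs = np.array([[0.90, 0.05, 0.05],
                       [0.45, 0.45, 0.10]])
sizes = prediction_set_sizes(pool_probs, qhat)
ranking = np.argsort(-sizes, kind="stable")  # most ambiguous first
```

Under this sketch, samples with larger prediction sets would be sent for labeling first; a real pipeline would obtain `cal_probs` and `pool_probs` from the VLM teacher rather than from hand-written arrays.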

Taxonomy

Topics: Machine Learning and Algorithms · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning