Mind the Gap: A Framework for Assessing Pitfalls in Multimodal Active Learning

Dustin Eisenhardt; Yunhee Jeong; Florian Buettner

arXiv:2603.29677·cs.LG·April 1, 2026

TL;DR

This paper introduces a benchmarking framework for multimodal active learning. It shows that models develop imbalanced modality representations and that current query strategies do not mitigate this modality neglect.

Contribution

The authors propose a synthetic dataset-based framework to evaluate pitfalls in multimodal active learning and compare strategies, highlighting limitations of existing methods.

Findings

01. Models develop imbalanced modality representations.

02. Existing query strategies do not mitigate modality neglect.

03. Multimodal strategies do not outperform unimodal ones.

Abstract

Multimodal learning enables neural networks to integrate information from heterogeneous sources, but active learning in this setting faces distinct challenges. These include missing modalities, differences in modality difficulty, and varying interaction structures, none of which arise in the unimodal case. While the behavior of active learning strategies in unimodal settings is well characterized, their behavior under such multimodal conditions remains poorly understood. We introduce a new framework for benchmarking multimodal active learning that isolates these pitfalls using synthetic datasets, allowing systematic evaluation without confounding noise. Using this framework, we compare unimodal and multimodal query strategies and validate our findings on two real-world datasets. Our results show that models consistently develop imbalanced representations, relying primarily on one…
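
To make two of the pitfalls named in the abstract concrete (differences in modality difficulty and missing modalities), here is a minimal illustrative sketch of a synthetic two-modality dataset. It is not the authors' framework: the function names `make_synthetic_multimodal` and `sign_accuracy`, the scalar "modalities", the Gaussian noise model, and all parameter values are assumptions chosen only for illustration.

```python
import random


def make_synthetic_multimodal(n, noise_a=0.5, noise_b=2.0, p_missing_b=0.3, seed=0):
    """Generate n samples with a binary label and two scalar 'modalities'.

    Hypothetical setup: both modalities carry the same label signal, but
    modality B is noisier (harder) and is sometimes missing (None).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        signal = 1.0 if y == 1 else -1.0
        a = signal + rng.gauss(0.0, noise_a)
        # Modality B is dropped with probability p_missing_b.
        if rng.random() < p_missing_b:
            b = None
        else:
            b = signal + rng.gauss(0.0, noise_b)
        data.append({"a": a, "b": b, "y": y})
    return data


def sign_accuracy(data, key):
    """Accuracy of a threshold-at-zero rule on one modality, skipping missing values."""
    present = [d for d in data if d[key] is not None]
    correct = sum((d[key] > 0) == (d["y"] == 1) for d in present)
    return correct / len(present)


dataset = make_synthetic_multimodal(2000)
acc_a = sign_accuracy(dataset, "a")
acc_b = sign_accuracy(dataset, "b")
missing_rate = sum(d["b"] is None for d in dataset) / len(dataset)
print(f"acc A={acc_a:.2f}  acc B={acc_b:.2f}  missing B={missing_rate:.2f}")
```

Because the noise level and missingness rate of each modality are set explicitly, a gap between the two per-modality accuracies can be attributed to the chosen difficulty difference rather than to uncontrolled noise, which is the general appeal of synthetic data for this kind of benchmark.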
