Label What Matters: Modality-Balanced and Difficulty-Aware Multimodal Active Learning

Yuqiao Zeng; Xu Wang; Tengfei Liang; Yiqing Hao; Yi Jin; Hui Yu

arXiv:2603.25107 · cs.CV · March 27, 2026

TL;DR

This paper introduces RL-MBA, a reinforcement-learning framework for multimodal active learning that dynamically balances modality contributions and accounts for sample difficulty, improving classification accuracy and modality fairness under limited labeling budgets.

Contribution

The paper proposes an adaptive framework that models sample selection as a Markov Decision Process and combines modality contribution balancing with difficulty-aware policy adjustment.
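To make the modality-balancing idea concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it reduces the Markov Decision Process to a bandit-style update, leaves out the difficulty-aware component, and every name in it (`select_batch`, `update_logits`, the per-modality reward signal) is a hypothetical choice made for illustration. The "state" is one logit per modality, the "action" is the batch of samples sent for labeling, and the "reward" is assumed to be some per-modality measure of how much the last labeled batch helped.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_batch(pool, modality_logits, budget):
    """Pick the `budget` unlabeled samples with the highest weighted uncertainty.

    pool: list of dicts mapping modality name -> predicted class probabilities.
    modality_logits: dict mapping modality name -> float (hypothetical policy state).
    Returns the indices of the selected samples, most uncertain first.
    """
    names = sorted(modality_logits)
    weights = dict(zip(names, softmax([modality_logits[n] for n in names])))
    scored = []
    for index, sample in enumerate(pool):
        score = sum(weights[n] * entropy(sample[n]) for n in names)
        scored.append((score, index))
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:budget]]


def update_logits(modality_logits, rewards, lr=0.5):
    """Raise the logit of modalities whose reward beats the mean, lower the rest.

    rewards: dict mapping modality name -> float (an assumed feedback signal,
    for example the validation gain attributed to that modality).
    """
    baseline = sum(rewards.values()) / len(rewards)
    return {
        name: logit + lr * (rewards[name] - baseline)
        for name, logit in modality_logits.items()
    }
```

In a labeling loop, one would call `select_batch` each round, retrain on the newly labeled samples, measure a reward per modality, and pass it to `update_logits` so that the next round's selection leans toward the modality that is currently more useful. A fixed selection rule corresponds to never calling `update_logits`.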

Findings

01. RL-MBA outperforms baseline methods on multiple datasets.

02. It improves classification accuracy under limited labeling budgets.

03. It enhances modality fairness in multimodal learning.

Abstract

Multimodal learning integrates complementary information from different modalities such as image, text, and audio to improve model performance, but its success relies on large-scale labeled data, which is costly to obtain. Active learning (AL) mitigates this challenge by selectively annotating informative samples. In multimodal settings, many approaches implicitly assume that modality importance is stable across rounds and keep selection rules fixed at the fusion stage, which leaves them insensitive to the dynamic nature of multimodal learning, where the relative value of modalities and the difficulty of instances shift as training proceeds. To address this issue, we propose RL-MBA, a reinforcement-learning framework for modality-balanced, difficulty-aware multimodal active learning. RL-MBA models sample selection as a Markov Decision Process, where the policy adapts to modality…

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications