Beyond Forced Modality Balance: Intrinsic Information Budgets for Multimodal Learning
Zechang Xiong, Da Li, Kexin Tang, Pengyuan Li, Wenkang Kong, Yulan Hu

TL;DR
This paper introduces IIBalance, a novel multimodal learning framework that aligns modality contributions with their intrinsic information capacities, leading to improved performance over existing balancing methods.
Contribution
The work proposes a task-grounded estimator of intrinsic information budgets and a prototype-based alignment mechanism to better utilize modality capacities.
Findings
Outperforms state-of-the-art modality-balancing methods on the reported benchmarks.
Effectively utilizes complementary modality cues.
Achieves more calibrated fusion weights during inference.
Abstract
Multimodal models often converge to a dominant-modality solution, in which a stronger, faster-converging modality overshadows weaker ones. This modality imbalance causes suboptimal performance. Existing methods attempt to balance different modalities by reweighting gradients or losses. However, they overlook the fact that each modality has finite information capacity. In this work, we propose IIBalance, a multimodal learning framework that aligns the modality contributions with Intrinsic Information Budgets (IIB). We propose a task-grounded estimator of each modality's IIB, transforming its capacity into a global prior over modality contributions. Anchored by the highest-budget modality, we design a prototype-based relative alignment mechanism that corrects semantic drift only when weaker modalities deviate from their budgeted potential, rather than forcing imitation. During inference,…
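To make the idea of turning per-modality budgets into "a global prior over modality contributions" concrete, here is a minimal sketch. It is not the authors' estimator or fusion rule: the abstract does not specify either, so the softmax mapping, the `temperature` parameter, the function names `budget_prior` and `fuse_logits`, and the budget values are all assumptions made for illustration.

```python
import math

def budget_prior(budgets, temperature=1.0):
    """Map per-modality budget scores to weights that sum to 1.

    ASSUMPTION: a softmax is used here only as a simple, common way to
    normalize scores; the paper's actual mapping is not given in the abstract.
    """
    names = list(budgets)
    scaled = [budgets[n] / temperature for n in names]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}

def fuse_logits(logits, prior):
    """Weighted sum of per-modality class logits using the prior weights."""
    n_classes = len(next(iter(logits.values())))
    return [
        sum(prior[m] * logits[m][c] for m in logits)
        for c in range(n_classes)
    ]

# Hypothetical budget scores: audio is assumed to carry more task information.
prior = budget_prior({"audio": 2.0, "video": 1.0})
fused = fuse_logits({"audio": [1.0, 0.0], "video": [0.0, 1.0]}, prior)
print(prior, fused)
```

In this sketch the higher-budget modality receives the larger weight but the weaker one is never zeroed out, which mirrors the abstract's framing of contributions proportional to capacity rather than forced equality.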
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
