Improving Multimodal Learning via Imbalanced Learning
Shicai Wei, Chunbo Luo, Yang Luo

TL;DR
This paper introduces Asymmetric Representation Learning (ARL), a multimodal learning strategy that deliberately imbalances optimization across modalities according to a variance analysis, improving performance without adding extra parameters.
Contribution
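It proposes ARL, which re-weights each modality's optimization by a coefficient derived from that modality's prediction variance, so that the model's dependence on each modality follows the inverse ratio of their variances. This challenges the conventional view that multimodal systems should learn all modalities in a balanced way.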
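A minimal sketch of variance-based re-weighting, assuming each modality produces a scalar prediction per sample. The helper names `residual_variance`, `inverse_variance_weights`, and `reweighted_loss`, the use of prediction-error variance as the variance estimate, the `eps` guard, and the choice to scale unimodal losses are all hypothetical illustrations; this summary does not give ARL's exact coefficient formula or where the coefficients are applied.

```python
from statistics import pvariance
from typing import List, Sequence


def residual_variance(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Population variance of one modality's prediction errors on a batch.

    Hypothetical stand-in for the per-modality variance that ARL obtains
    from its auxiliary regularizers; the paper's estimator may differ.
    """
    residuals = [p - t for p, t in zip(preds, targets)]
    return pvariance(residuals)


def inverse_variance_weights(variances: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Weights proportional to 1 / variance, normalized to sum to 1.

    `eps` (an assumption) keeps the division finite if a variance is zero.
    """
    inv = [1.0 / (v + eps) for v in variances]
    total = sum(inv)
    return [x / total for x in inv]


def reweighted_loss(unimodal_losses: Sequence[float], variances: Sequence[float]) -> float:
    """Combine per-modality losses so lower-variance modalities get larger weight."""
    weights = inverse_variance_weights(variances)
    return sum(w * loss for w, loss in zip(weights, unimodal_losses))


if __name__ == "__main__":
    # Two modalities: the first is four times less noisy than the second.
    print(inverse_variance_weights([0.5, 2.0]))
    print(reweighted_loss([1.0, 2.0], [0.5, 2.0]))
```

For example, modalities with error variances 0.5 and 2.0 receive weights of about 0.8 and 0.2, a 4:1 ratio that is the inverse of their 1:4 variance ratio.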
Findings
ARL improves performance across various datasets.
ARL is parameter-efficient and model-agnostic.
Experimental results validate ARL's effectiveness and versatility.
Abstract
Multimodal learning often suffers from under-optimization and may perform worse than unimodal learning. Existing approaches attribute this issue to imbalanced learning across modalities and tend to address it through gradient balancing. However, this paper argues that balanced learning is not the optimal setting for multimodal learning. Through a bias-variance analysis, we prove that an imbalanced dependency on each modality, following the inverse ratio of their variances, contributes to optimal performance. To this end, we propose the Asymmetric Representation Learning (ARL) strategy to assist multimodal learning via imbalanced optimization. ARL introduces auxiliary regularizers for each modality encoder to calculate their prediction variance. ARL then calculates coefficients via the unimodal variance to re-weight the optimization of each modality, forcing the modality dependence ratio to be…
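A minimal worked derivation of why inverse-variance dependence can be optimal, under simplifying assumptions that are not stated in the abstract: two modalities, each giving an unbiased prediction f_m with error variance σ_m², uncorrelated errors, and a linear fusion whose weights sum to one. This is the classical inverse-variance weighting argument, offered as intuition rather than as the paper's own proof, which also accounts for bias.

```latex
% Simplified two-modality illustration (assumes unbiased, uncorrelated unimodal predictions)
\begin{align}
\hat{y} &= w_1 f_1 + w_2 f_2, \qquad w_1 + w_2 = 1 \\
\operatorname{Var}(\hat{y}) &= w_1^2 \sigma_1^2 + (1 - w_1)^2 \sigma_2^2 \\
\frac{\partial \operatorname{Var}(\hat{y})}{\partial w_1} &= 2 w_1 \sigma_1^2 - 2 (1 - w_1) \sigma_2^2 = 0 \\
\Rightarrow\; w_1 &= \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \quad
w_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \quad
\frac{w_1}{w_2} = \frac{\sigma_2^2}{\sigma_1^2} \\
\operatorname{Var}_{\min}(\hat{y}) &= \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
\end{align}
```

Because the minimum variance is smaller than both σ_1² and σ_2², the weighted fusion beats either modality alone, and the balanced choice w_1 = w_2 = 1/2 is optimal only when σ_1² = σ_2².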
