BALM: A Model-Agnostic Framework for Balanced Multimodal Learning under Imbalanced Missing Rates
Phuong-Anh Nguyen, Tien Anh Pham, Duc-Trong Le, Cam-Van Thi Nguyen

TL;DR
BALM is a model-agnostic plug-in framework that makes multimodal learning more robust and accurate when modalities are missing at imbalanced rates. It does this by calibrating unimodal features and rebalancing gradients across modalities.
Contribution
This paper introduces BALM, a novel, model-agnostic framework with feature calibration and gradient rebalancing modules for balanced learning under imbalanced missing rates.
Findings
Enhances the robustness of multimodal models when modalities are missing at different rates.
Improves performance across multiple benchmarks.
Integrates into existing models without architectural changes.
Abstract
Learning from multiple modalities often suffers from imbalance, where information-rich modalities dominate optimization while weaker or partially missing modalities contribute less. This imbalance becomes severe in realistic settings with imbalanced missing rates (IMR), where each modality is absent with different probabilities, distorting representation learning and gradient dynamics. We revisit this issue from a training-process perspective and propose BALM, a model-agnostic plug-in framework to achieve balanced multimodal learning under IMR. The framework comprises two complementary modules: the Feature Calibration Module (FCM), which recalibrates unimodal features using global context to establish a shared representation basis across heterogeneous missing patterns; the Gradient Rebalancing Module (GRM), which balances learning dynamics across modalities by modulating gradient…
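To make the imbalanced missing rates (IMR) setting concrete, the following minimal sketch simulates per-sample availability masks in which each modality is dropped with its own probability. It illustrates the problem setting only and is not the authors' method or data pipeline. The modality names, the rates, the helper names, and the rule that every sample keeps at least one modality are all assumptions made for illustration.

```python
import random


def sample_missing_masks(num_samples, missing_rates, seed=0):
    """Return one dict per sample mapping modality name -> True if present.

    missing_rates maps each modality to its (hypothetical) probability
    of being absent, so different modalities go missing at different rates.
    """
    rng = random.Random(seed)
    masks = []
    for _ in range(num_samples):
        mask = {m: rng.random() >= rate for m, rate in missing_rates.items()}
        # Assumption: a sample with every modality missing is unusable, so
        # restore the modality with the lowest missing rate. This slightly
        # lowers that modality's effective missing rate.
        if not any(mask.values()):
            keep = min(missing_rates, key=missing_rates.get)
            mask[keep] = True
        masks.append(mask)
    return masks


def empirical_missing_rates(masks):
    """Fraction of samples in which each modality is absent."""
    n = len(masks)
    return {m: sum(not mask[m] for mask in masks) / n for m in masks[0]}


# Hypothetical rates: text is rarely missing, vision is usually missing.
rates = {"text": 0.1, "audio": 0.5, "vision": 0.8}
masks = sample_missing_masks(10_000, rates)
observed = empirical_missing_rates(masks)
print(observed)
```

Under such masks, a rarely missing modality is seen far more often during training than a frequently missing one, which is the kind of imbalance the abstract describes.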
Taxonomy
Topics: Emotion and Mood Recognition · Imbalanced Data Classification Techniques · Face and Expression Recognition
