Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement
Zhe Yang, Wenrui Li, Hongtao Chen, Penghong Wang, Ruiqin Xiong, Xiaopeng Fan

TL;DR
This paper introduces RedReg, an adaptive method for balancing multimodal learning by regulating redundancy and modality dominance, leading to improved performance and representation quality.
Contribution
RedReg employs a redundancy monitor and co-information gating to dynamically balance modalities, addressing long-term dominance and redundancy issues in multimodal training.
Findings
RedReg outperforms existing methods in most scenarios.
Ablation studies confirm the effectiveness of each component.
The method adapts to modality reliance, preserving modality-specific information.
Abstract
Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multimodal training, modality bias often lets the advantaged modality dominate backpropagation, leading to imbalanced optimization. Existing methods still face two problems. First, the long-term dominance of the advantaged modality weakens representation-output coupling in the late stages of training, resulting in the accumulation of redundant information. Second, previous methods often adjust the gradients of the advantaged modality directly and uniformly, ignoring the semantics and directionality between modalities. To address these limitations, we propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement (RedReg), which is inspired by the information bottleneck principle. Specifically, we construct a redundancy phase monitor that uses a joint criterion of…
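The uniform gradient adjustment that the abstract attributes to previous methods can be illustrated with a minimal sketch. It is not RedReg and is not taken from the paper: the function names, the confidence-ratio rule, the tanh damping, and the `alpha` parameter are all assumptions chosen only to show what "directly and uniformly" scaling the advantaged modality's gradients might look like.

```python
import math


def dominance_coefficients(score_a, score_b, alpha=1.0):
    """Return (k_a, k_b), gradient scale factors in [0, 1] for two modalities.

    Hypothetical rule: the modality with the higher confidence score is
    treated as advantaged and damped; the other modality is left unchanged.
    """
    if score_a <= 0 or score_b <= 0:
        raise ValueError("scores must be positive")
    ratio = score_a / score_b
    if ratio > 1.0:
        # Modality A is advantaged: shrink its gradients.
        return 1.0 - math.tanh(alpha * (ratio - 1.0)), 1.0
    if ratio < 1.0:
        # Modality B is advantaged: shrink its gradients.
        return 1.0, 1.0 - math.tanh(alpha * (1.0 / ratio - 1.0))
    return 1.0, 1.0


def scale_uniformly(grads, k):
    """Multiply every gradient entry by the same factor k.

    One scalar is applied to all entries regardless of direction, which is
    the behaviour the abstract describes as ignoring directionality.
    """
    return [k * g for g in grads]


# Assumed example: modality A is twice as confident as modality B.
k_a, k_b = dominance_coefficients(2.0, 1.0)
damped = scale_uniformly([0.4, -0.2, 0.1], k_a)
```

In this sketch every gradient entry of the advantaged modality shrinks by the same factor, so the update direction is unchanged and only its magnitude drops. That is the limitation the abstract sets out to address.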
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
