On-the-fly Modulation for Balanced Multimodal Learning
Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen

TL;DR
This paper introduces on-the-fly modulation strategies to balance the optimization of different modalities in multimodal learning, improving performance by dynamically adjusting the influence of dominant modalities during training.
Contribution
It proposes novel On-the-fly Prediction and Gradient Modulation techniques to address modality imbalance in joint training, enhancing multimodal model performance.
Findings
The proposed strategies yield significant performance improvements across various multimodal tasks.
Effective balancing of modalities leads to better representation learning.
Strategies are flexible and applicable to complex multimodal models.
Abstract
Multimodal learning is expected to boost model performance by integrating information from different modalities. However, its potential is not fully exploited because the widely used joint training strategy, which has a uniform objective for all modalities, leads to imbalanced and under-optimized uni-modal representations. Specifically, we point out that there often exists a modality with more discriminative information, e.g., the vision of playing football or the sound of blowing wind. Such a modality can dominate the joint training process, leaving the other modalities significantly under-optimized. To alleviate this problem, we first analyze the under-optimization phenomenon from both the feed-forward and the back-propagation stages during optimization. Then, On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies are proposed to modulate the optimization of each…
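The abstract is cut off before it says how the gradient modulation is computed, so the following is only a rough sketch of one way the idea could work, not the authors' implementation. It assumes two modalities, measures each one's confidence as the summed softmax probability of the true class over a batch, and damps the gradient of whichever modality is ahead. The function names, the `alpha` hyperparameter, and the `1 - tanh(...)` damping form are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def modality_score(batch_logits, labels):
    # Hypothetical confidence measure: summed probability of the true class.
    return sum(softmax(l)[y] for l, y in zip(batch_logits, labels))

def modulation_coefficients(score_a, score_b, alpha=1.0, eps=1e-12):
    # Assumed damping rule: the modality whose score ratio exceeds 1 gets a
    # coefficient in [0, 1); the other modality keeps a coefficient of 1.
    ratio_a = score_a / (score_b + eps)
    ratio_b = score_b / (score_a + eps)
    k_a = 1.0 - math.tanh(alpha * ratio_a) if ratio_a > 1.0 else 1.0
    k_b = 1.0 - math.tanh(alpha * ratio_b) if ratio_b > 1.0 else 1.0
    return k_a, k_b

def scale_gradients(grads, k):
    # Apply the coefficient to one modality encoder's gradients.
    return [g * k for g in grads]

# Toy batch: modality "a" is confident, modality "b" is uninformative.
labels = [0, 1]
logits_a = [[5.0, 0.0], [0.0, 5.0]]
logits_b = [[0.0, 0.0], [0.0, 0.0]]

k_a, k_b = modulation_coefficients(
    modality_score(logits_a, labels),
    modality_score(logits_b, labels),
)
damped = scale_gradients([1.0, -2.0], k_a)
```

In this toy batch the confident modality receives a coefficient well below 1 while the weaker one is left untouched, which mirrors the abstract's goal of reducing the influence of the dominant modality during training. In a real training loop the coefficient would be recomputed at every iteration from the current batch.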
Taxonomy
Topics: Target Tracking and Data Fusion in Sensor Networks · Speech and Audio Processing · Indoor and Outdoor Localization Technologies
