Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

TL;DR
This paper introduces a Diagnosing & Re-learning approach that estimates each modality's learning state and adaptively re-initializes encoders to balance and enhance multimodal learning, addressing modality capacity limitations.
Contribution
It proposes a novel method that diagnoses modality learning states and re-initializes encoders to improve balance and performance in multimodal learning.
Findings
Outperforms existing methods across multiple modalities and frameworks.
Effectively balances learning by re-initializing encoders based on modality separability.
Demonstrates superior results on diverse multimodal datasets.
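The diagnose-then-re-initialize idea summarized above can be illustrated with a small sketch. This page does not say how separability is measured or how the re-initialization is applied, so everything below is an assumption made for illustration: the nearest-centroid accuracy used as a separability score, the `tanh` mapping from the train/validation gap to a strength, the `scale` parameter, and the linear interpolation toward the initial weights are all hypothetical choices, not the authors' confirmed procedure.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def centroid_separability(
    fit_x: Sequence[Vector],
    fit_y: Sequence[Hashable],
    eval_x: Sequence[Vector],
    eval_y: Sequence[Hashable],
) -> float:
    """Hypothetical separability score: nearest-class-centroid accuracy.

    Centroids are fitted on (fit_x, fit_y) and scored on (eval_x, eval_y).
    """
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for x, y in zip(fit_x, fit_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    correct = 0
    for x, y in zip(eval_x, eval_y):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        correct += int(pred == y)
    return correct / len(eval_y)


def reinit_strength(train_sep: float, val_sep: float, scale: float = 5.0) -> float:
    """Assumed diagnosis rule: a larger train/validation gap gives a stronger reset.

    The result lies in [0, 1); a non-positive gap gives 0 (no reset).
    """
    gap = max(0.0, train_sep - val_sep)
    return math.tanh(scale * gap)


def soft_reinit(current: Vector, initial: Vector, alpha: float) -> List[float]:
    """Assumed re-learning rule: interpolate encoder weights toward their initial values."""
    return [(1.0 - alpha) * c + alpha * i for c, i in zip(current, initial)]
```

In this sketch, each modality's encoder would be scored separately on its own features, so a modality that separates training data well but validation data poorly receives a stronger pull back toward initialization, while one with no gap is left untouched. No modality is singled out or suppressed; each gets its own strength.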
Abstract
To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as "worse-learnt" ones, which could force the model to memorize more noise, counterproductively affecting the multimodal model ability. Moreover, the current modality modulation methods narrowly concentrate on selected worse-learnt modalities, even suppressing the training of others. Hence, it is essential to consider the intrinsic limitation of modality capacity and take all modalities into account during balancing. To this end, we propose the Diagnosing & Re-learning method. The learning state…
Taxonomy
Topics: Innovative Teaching and Learning Methods
