Diagnosing and Re-learning for Balanced Multimodal Learning

Yake Wei; Siwei Li; Ruoxuan Feng; Di Hu

arXiv:2407.09705·cs.CV·July 16, 2024·2 cites

TL;DR

This paper introduces a Diagnosing & Re-learning approach that estimates each modality's learning state and adaptively re-initializes encoders to balance and enhance multimodal learning, addressing modality capacity limitations.

Contribution

It proposes a novel method that diagnoses modality learning states and re-initializes encoders to improve balance and performance in multimodal learning.
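The diagnose-then-re-initialize idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the function names, the use of a train/validation separability gap as the diagnostic, the tanh mapping with a `scale` parameter, and the linear interpolation toward freshly sampled weights. Encoders are modelled as flat lists of floats so the example runs with the standard library alone.

```python
import math
import random


def reinit_strength(train_sep, val_sep, scale=10.0):
    """Map a separability gap to a re-initialization strength in [0, 1).

    Hypothetical diagnostic: a large gap between training and validation
    separability is treated as a sign that the encoder has over-fit.
    """
    gap = max(0.0, train_sep - val_sep)
    return math.tanh(scale * gap)


def soft_reinit(current, fresh, alpha):
    """Interpolate current weights toward fresh ones with strength alpha."""
    return [(1.0 - alpha) * c + alpha * f for c, f in zip(current, fresh)]


def relearn_step(encoders, separability, rng, scale=10.0):
    """Apply a soft re-initialization to every modality's encoder.

    encoders:     dict of modality name -> list of weights (toy stand-in)
    separability: dict of modality name -> (train_sep, val_sep)
    """
    updated = {}
    for name, weights in encoders.items():
        train_sep, val_sep = separability[name]
        alpha = reinit_strength(train_sep, val_sep, scale)
        # Fresh weights drawn from an assumed Gaussian initializer.
        fresh = [rng.gauss(0.0, 0.1) for _ in weights]
        updated[name] = soft_reinit(weights, fresh, alpha)
    return updated


# Toy usage: the audio encoder shows a large gap, the video encoder a small one.
rng = random.Random(0)
encoders = {"audio": [0.5, -0.3], "video": [0.2, 0.1]}
separability = {"audio": (0.95, 0.60), "video": (0.70, 0.68)}
encoders = relearn_step(encoders, separability, rng)
```

In this sketch every modality is adjusted rather than only the weakest one, and an encoder with no gap is left unchanged because its strength is zero.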

Findings

01. Outperforms existing methods across multiple modalities and frameworks.

02. Effectively balances learning by re-initializing encoders based on modality separability.

03. Demonstrates superior results on diverse multimodal datasets.

Abstract

To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as "worse-learnt" ones, which could force the model to memorize more noise, counterproductively affecting the multimodal model ability. Moreover, the current modality modulation methods narrowly concentrate on selected worse-learnt modalities, even suppressing the training of others. Hence, it is essential to consider the intrinsic limitation of modality capacity and take all modalities into account during balancing. To this end, we propose the Diagnosing & Re-learning method. The learning state…

Code & Models

Repositories

gewu-lab/diagnosing_relearning_eccv2024
PyTorch · Official

Taxonomy

Topics: Innovative Teaching and Learning Methods