Improving Multimodal Learning Balance and Sufficiency through Data Remixing
Xiaoyu Ma, Hao Chen, Yongjian Deng

TL;DR
This paper introduces multimodal Data Remixing, a novel approach that improves both the balance and the sufficiency of multimodal learning by filtering and reassembling training data, yielding significant accuracy gains at no extra inference cost.
Contribution
It is the first work to address modality imbalance and unimodal sufficiency simultaneously, using data remixing techniques in multimodal learning.
Findings
Improves accuracy by approximately 6.50% on CREMA-D
Enhances performance by 3.41% on Kinetics-Sounds
Seamlessly integrates with existing methods without extra inference overhead
Abstract
Different modalities exhibit considerable gaps in their optimization trajectories, including speed and path, which lead to modality laziness and modality clash when multimodal models are trained jointly, resulting in insufficient and imbalanced multimodal learning. Existing methods focus on strengthening the weak modality by adding modality-specific optimization objectives, aligning the modalities' optimization speeds, or decomposing multimodal learning to enhance unimodal learning. These methods fail to achieve both unimodal sufficiency and multimodal balance. In this paper, we, for the first time, address both concerns by proposing multimodal Data Remixing, which decouples multimodal data and filters hard samples for each modality to mitigate modality imbalance, and then reassembles the data at the batch level to align gradient directions and avoid cross-modal interference, thus enhancing unimodal learning…
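The decouple, filter, and reassemble steps described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: per-sample unimodal losses are assumed to be precomputed, the rule "keep the `keep_ratio` fraction of samples with the highest loss for each modality" is a hypothetical stand-in for the paper's actual hard-sample criterion, and the names `UnimodalSample`, `filter_hard_samples`, and `reassemble_batches` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UnimodalSample:
    """One modality's view of a multimodal sample (hypothetical type)."""
    sample_id: int
    modality: str


def filter_hard_samples(losses: Dict[str, List[float]],
                        keep_ratio: float) -> List[UnimodalSample]:
    """Decouple each multimodal sample into per-modality views, then keep,
    for every modality, the keep_ratio fraction with the highest loss.
    ASSUMPTION: "highest unimodal loss" is a placeholder hardness criterion."""
    kept: List[UnimodalSample] = []
    for modality, per_sample in losses.items():
        k = max(1, round(keep_ratio * len(per_sample)))
        # Rank sample indices from hardest (highest loss) to easiest.
        ranked = sorted(range(len(per_sample)),
                        key=lambda i: per_sample[i], reverse=True)
        kept.extend(UnimodalSample(i, modality) for i in sorted(ranked[:k]))
    return kept


def reassemble_batches(samples: List[UnimodalSample], batch_size: int,
                       seed: int = 0) -> List[List[UnimodalSample]]:
    """Group the kept views into batches that each contain a single modality,
    so every optimization step receives gradients from one modality only."""
    rng = random.Random(seed)
    by_modality: Dict[str, List[UnimodalSample]] = {}
    for s in samples:
        by_modality.setdefault(s.modality, []).append(s)
    batches: List[List[UnimodalSample]] = []
    for group in by_modality.values():
        group = list(group)  # copy before shuffling in place
        rng.shuffle(group)
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    # Interleave modalities across training steps.
    rng.shuffle(batches)
    return batches


if __name__ == "__main__":
    # Toy per-sample losses for 4 audio-visual samples (made-up numbers).
    losses = {"audio": [0.2, 1.5, 0.9, 0.1], "visual": [1.2, 0.3, 1.8, 0.7]}
    kept = filter_hard_samples(losses, keep_ratio=0.5)
    for batch in reassemble_batches(kept, batch_size=2):
        print(batch[0].modality, sorted(s.sample_id for s in batch))
```

Keeping every batch unimodal is one plausible reading of "batch-level reassembling to avoid cross-modal interference": within a step, gradients cannot conflict across modalities, while shuffling the batch order still exposes the shared model to all modalities over an epoch.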
