Improving Multimodal Learning Balance and Sufficiency through Data Remixing

Xiaoyu Ma; Hao Chen; Yongjian Deng

arXiv:2506.11550 · cs.LG · June 17, 2025

TL;DR

This paper introduces multimodal Data Remixing, an approach that improves both the balance and the sufficiency of multimodal learning by filtering and reassembling training data, yielding significant accuracy gains without extra inference cost.

Contribution

The paper is the first to address modality imbalance and unimodal insufficiency simultaneously, using data remixing techniques in multimodal learning.

Findings

1. Improves accuracy by approximately 6.50% on CREMA-D
2. Improves performance by 3.41% on Kinetics-Sounds
3. Integrates seamlessly with existing methods without extra inference overhead

Abstract

Different modalities exhibit considerable gaps in their optimization trajectories, in both speed and path. When multimodal models are trained jointly, these gaps cause modality laziness and modality clash, resulting in insufficient and imbalanced multimodal learning. Existing methods strengthen the weak modality by adding modality-specific optimization objectives, aligning the modalities' optimization speeds, or decomposing multimodal learning to enhance unimodal learning, but they fail to achieve both unimodal sufficiency and multimodal balance. In this paper, we address both concerns for the first time by proposing multimodal Data Remixing. It first decouples multimodal data and filters hard samples for each modality to mitigate modality imbalance, and then reassembles the data at the batch level to align gradient directions and avoid cross-modal interference, thus enhancing unimodal learning…
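As a rough illustration of the three steps named in the abstract (decoupling, per-modality hard-sample filtering, batch-level reassembling), here is a minimal Python sketch that works on sample IDs only. It is not the authors' implementation. The hardness score (each modality's current per-sample loss), the `keep_ratio` parameter, and all names (`UnimodalSample`, `decouple`, `filter_hard`, `reassemble`) are hypothetical assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class UnimodalSample:
    """One modality's view of a multimodal training sample (hypothetical container)."""
    sample_id: int
    modality: str
    label: int


def decouple(samples: Sequence[dict], modalities: Sequence[str]) -> Dict[str, List[UnimodalSample]]:
    """Step 1: split every multimodal sample into one unimodal sample per modality."""
    return {
        m: [UnimodalSample(s["id"], m, s["label"]) for s in samples]
        for m in modalities
    }


def filter_hard(
    views: Dict[str, List[UnimodalSample]],
    losses: Dict[str, Dict[int, float]],
    keep_ratio: float,
) -> Dict[str, List[UnimodalSample]]:
    """Step 2: for each modality, keep the hardest `keep_ratio` fraction of samples.

    ASSUMPTION (not from the paper): hardness = the modality's current
    per-sample loss, higher meaning harder.
    """
    kept = {}
    for m, items in views.items():
        ranked = sorted(items, key=lambda u: losses[m][u.sample_id], reverse=True)
        kept[m] = ranked[: math.ceil(keep_ratio * len(ranked))]
    return kept


def reassemble(
    kept: Dict[str, List[UnimodalSample]], batch_size: int
) -> List[List[UnimodalSample]]:
    """Step 3: build batches that each hold a single modality, so a training
    step on one batch only involves gradients from that modality."""
    batches = []
    for items in kept.values():
        for start in range(0, len(items), batch_size):
            batches.append(items[start:start + batch_size])
    return batches


if __name__ == "__main__":
    data = [{"id": i, "label": i % 2} for i in range(4)]
    # Made-up per-sample losses, for illustration only.
    losses = {
        "audio": {0: 0.1, 1: 0.9, 2: 0.4, 3: 0.7},
        "visual": {0: 1.2, 1: 0.3, 2: 0.8, 3: 0.2},
    }
    kept = filter_hard(decouple(data, ["audio", "visual"]), losses, keep_ratio=0.5)
    for batch in reassemble(kept, batch_size=2):
        print(batch[0].modality, [u.sample_id for u in batch])
    # audio [1, 3]
    # visual [0, 2]
```

Keeping each batch unimodal is one simple way to keep different modalities' gradients out of the same update, in the spirit of the abstract's "avoid cross-modal interference". A real training loop would likely also shuffle the batches and recompute the hardness scores as training progresses.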
