Detached and Interactive Multimodal Learning
Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junhong Liu, and Song Guo

TL;DR
This paper introduces DI-MML, a detached multimodal learning framework that avoids modality competition by training each modality encoder separately while still encouraging cross-modal interaction. The authors report improved performance across multiple datasets.
Contribution
The paper proposes a detached multimodal learning framework in which each modality encoder is trained in isolation and all modalities share a single classifier, so that complementary information is learned without modality competition.
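As a rough illustration of the idea of isolated per-modality objectives with a shared classifier (not the authors' implementation), the following minimal sketch trains two toy linear encoders, each on its own loss, while both pass through one shared logistic head. The modality names, dimensions, toy data, and the helper names `encode`, `predict`, `unimodal_step`, and `train` are all assumptions made for illustration. The additional cross-modal component mentioned in the abstract is not modeled here.

```python
import math
import random


def encode(W, x):
    # Linear encoder: one output feature per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(v, b, z):
    # Shared classifier head: logistic regression on the feature vector.
    logit = sum(vi * zi for vi, zi in zip(v, z)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def unimodal_step(W, v, b, x, y, lr):
    """One SGD step on a single modality's own binary cross-entropy loss.

    Updates that modality's encoder W and the shared weights v in place.
    Returns the updated shared bias and the loss measured before the update.
    """
    z = encode(W, x)
    p = predict(v, b, z)
    loss = -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
    g = p - y  # derivative of the loss with respect to the logit
    v_old = list(v)
    for k in range(len(v)):
        v[k] -= lr * g * z[k]
        for j in range(len(x)):
            W[k][j] -= lr * g * v_old[k] * x[j]
    return b - lr * g, loss


def train(data, dims, feat_dim=2, epochs=200, lr=0.1, seed=0):
    # Hypothetical training loop: no fused multimodal loss is ever computed.
    rng = random.Random(seed)
    encoders = {
        m: [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(feat_dim)]
        for m, d in dims.items()
    }
    v = [rng.uniform(-0.5, 0.5) for _ in range(feat_dim)]
    b = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for sample, y in data:
            for m, x in sample.items():
                b, loss = unimodal_step(encoders[m], v, b, x, y, lr)
                total += loss
        history.append(total / (len(data) * len(dims)))
    return encoders, v, b, history


# Toy two-modality data (made up): each sample maps modality name -> input.
toy_data = [
    ({"audio": [1.0, 0.2], "visual": [0.9, 0.1, 0.0]}, 1),
    ({"audio": [0.8, -0.1], "visual": [1.1, 0.0, 0.2]}, 1),
    ({"audio": [-1.0, 0.1], "visual": [-0.9, 0.2, 0.1]}, 0),
    ({"audio": [-0.7, -0.2], "visual": [-1.2, -0.1, 0.0]}, 0),
]
encoders, head, bias, history = train(toy_data, {"audio": 2, "visual": 3})
```

There is no fused loss in this sketch: a step on the audio loss changes only the audio encoder and the shared head, which is the sense in which training is "detached" here. Because both encoders feed the same head, their features are pushed toward a common space.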
Findings
Outperforms existing methods on multiple datasets.
Effectively leverages complementary information at the instance level.
Demonstrates robustness across diverse multimodal tasks.
Abstract
Recently, Multimodal Learning (MML) has gained significant interest as it compensates for single-modality limitations through comprehensive complementary information within multimodal data. However, traditional MML methods generally use the joint learning framework with a uniform learning objective that can lead to the modality competition issue, where feedback predominantly comes from certain modalities, limiting the full potential of others. In response to this challenge, this paper introduces DI-MML, a novel detached MML framework designed to learn complementary information across modalities under the premise of avoiding modality competition. Specifically, DI-MML addresses competition by separately training each modality encoder with isolated learning objectives. It further encourages cross-modal interaction via a shared classifier that defines a common feature space and employing a…
