Detached and Interactive Multimodal Learning

Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junhong Liu, and Song Guo

arXiv:2407.19514 · cs.CV · July 30, 2024 · 1 citation

TL;DR

This paper introduces DI-MML, a detached multimodal learning framework that avoids modality competition by separately training modality encoders and encouraging cross-modal interaction, leading to improved performance across various datasets.

Contribution

The paper proposes a novel detached multimodal learning framework with isolated modality training and a shared classifier, enhancing complementary information learning without modality competition.

Findings

1. Outperforms existing methods on multiple datasets.
2. Effectively leverages complementary information at the instance level.
3. Demonstrates robustness across diverse multimodal tasks.

Abstract

Recently, Multimodal Learning (MML) has gained significant interest as it compensates for single-modality limitations through comprehensive complementary information within multimodal data. However, traditional MML methods generally use the joint learning framework with a uniform learning objective that can lead to the modality competition issue, where feedback predominantly comes from certain modalities, limiting the full potential of others. In response to this challenge, this paper introduces DI-MML, a novel detached MML framework designed to learn complementary information across modalities under the premise of avoiding modality competition. Specifically, DI-MML addresses competition by separately training each modality encoder with isolated learning objectives. It further encourages cross-modal interaction via a shared classifier that defines a common feature space and employing a…
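The detached idea stated in the abstract (each modality encoder trained on its own objective, with one classifier shared across modalities to define a common feature space) can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoders, the two-sample dataset, the learning rate, the choice to update the shared classifier from both modalities, and all function names (`unimodal_step`, `train_detached`, and the helpers) are assumptions made for illustration. The part of the method cut off at the end of the abstract is not covered.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]

def sgd_update(M, G, lr):
    """In-place gradient step M <- M - lr * G."""
    for row, grad_row in zip(M, G):
        for j, g in enumerate(grad_row):
            row[j] -= lr * g

def unimodal_step(W, C, x, y, lr=0.1):
    """One SGD step on a single modality's own cross-entropy loss.

    W is that modality's (hypothetical, linear) encoder and C the shared
    linear classifier. Only W and C are touched; the other modality's
    encoder never enters this computation.
    """
    f = matvec(W, x)             # feature in the common space
    p = softmax(matvec(C, f))    # class probabilities from the shared head
    loss = -math.log(p[y])
    dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    # Backpropagate through C before C itself is updated.
    df = [sum(C[k][i] * dz[k] for k in range(len(C))) for i in range(len(f))]
    sgd_update(C, [[d * fi for fi in f] for d in dz], lr)
    sgd_update(W, [[d * xj for xj in x] for d in df], lr)
    return loss

def rand_matrix(rng, rows, cols):
    """Small random initial weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def train_detached(epochs=300, seed=0):
    """Train two toy encoders with isolated losses and one shared classifier."""
    rng = random.Random(seed)
    W_a, W_b = rand_matrix(rng, 3, 2), rand_matrix(rng, 3, 2)
    C = rand_matrix(rng, 2, 3)
    # Toy paired data (assumed): (modality-a input, modality-b input, label).
    data = [([1.0, 0.0], [0.0, 1.0], 0), ([0.0, 1.0], [1.0, 0.0], 1)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x_a, x_b, y in data:
            total += unimodal_step(W_a, C, x_a, y)  # objective for modality a only
            total += unimodal_step(W_b, C, x_b, y)  # objective for modality b only
        history.append(total / (2 * len(data)))
    return W_a, W_b, C, history

if __name__ == "__main__":
    _, _, _, history = train_detached()
    print(f"mean loss: first epoch {history[0]:.3f}, last epoch {history[-1]:.3f}")
```

In this sketch no fused multimodal prediction is ever formed, so one modality's loss cannot dominate the gradient reaching the other modality's encoder. The two encoders interact only through the shared classifier weights.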

Code & Models

Repositories

fanyunfeng-bit/di-mml
PyTorch · Official
