Self-MI: Efficient Multimodal Fusion via Self-Supervised Multi-Task Learning with Auxiliary Mutual Information Maximization
Cam-Van Thi Nguyen, Ngoc-Hoa Thi Nguyen, Duc-Trong Le, Quang-Thuy Ha

TL;DR
Self-MI introduces a self-supervised multimodal fusion method that maximizes mutual information between unimodal inputs and fused representations, improving performance on benchmark datasets.
Contribution
The paper proposes a self-supervised multi-task learning framework with auxiliary MI maximization and a label generation module that produces per-modality labels for better multimodal fusion.
Findings
Self-MI improves fusion performance on the CMU-MOSI, CMU-MOSEI, and SIMS datasets.
Maximizing mutual information improves alignment between the fused representation and the individual modalities.
Self-MI is reported to outperform existing multimodal fusion methods.
Abstract
Multimodal representation learning poses significant challenges in capturing informative and distinct features from multiple modalities. Existing methods often struggle to exploit the unique characteristics of each modality because they rely on unified multimodal annotations. In this study, we propose Self-MI, a self-supervised learning approach that leverages Contrastive Predictive Coding (CPC) as an auxiliary technique to maximize the Mutual Information (MI) both between unimodal input pairs and between the multimodal fusion result and the unimodal inputs. Moreover, we design a label generation module that enables us to create meaningful and informative labels for each modality in a self-supervised manner. By maximizing the Mutual Information, we encourage better alignment between the multimodal fusion and the individual modalities, facilitating improved multimodal fusion. Extensive…
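To make the MI-maximization idea concrete, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss, the objective commonly used with CPC. It is not the authors' implementation. The function name `info_nce`, the cosine-similarity score, and the `temperature` value are illustrative assumptions, and the sketch omits any learned predictor network. Row i of the fused batch and row i of the unimodal batch are treated as the positive pair, and all other rows in the batch serve as negatives. Minimizing this loss raises the standard InfoNCE lower bound on mutual information, log N minus the loss, where N is the batch size.

```python
import numpy as np


def info_nce(fused, unimodal, temperature=0.1):
    """InfoNCE-style loss between fused and unimodal representations.

    Hypothetical helper for illustration only. Both inputs have shape
    (N, D); row i of `fused` is paired with row i of `unimodal`.
    """
    # L2-normalise so the dot product below is a cosine similarity
    # (an assumed choice of score function).
    f = fused / np.linalg.norm(fused, axis=1, keepdims=True)
    u = unimodal / np.linalg.norm(unimodal, axis=1, keepdims=True)

    # Pairwise similarity scores, shape (N, N), scaled by the temperature.
    logits = f @ u.T / temperature

    # Row-wise log-softmax, shifted by the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the log-probability assigned to each positive pair.
    return float(-np.mean(np.diag(log_prob)))


# Toy usage: "fused" vectors that stay close to their own text features
# should score a lower loss than the same vectors with mismatched pairing.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
fused = text + 0.05 * rng.normal(size=(8, 16))

loss_aligned = info_nce(fused, text)
loss_mismatched = info_nce(fused, text[::-1])  # reversed rows: no pair matches
```

In a training setup, a term like this would typically be added to the main task loss with a weighting coefficient, once for each unimodal input paired with the fusion result.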
Taxonomy
Topics: Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
Methods: InfoNCE · Contrastive Predictive Coding
