TL;DR
This paper introduces the multimodal information bottleneck (MIB), a framework that learns minimal yet sufficient multimodal and unimodal representations with less redundancy, improving performance on cross-modal learning tasks.
Contribution
The paper proposes the MIB framework, which extends the information bottleneck principle to regularize both unimodal and multimodal representations. This makes the fused representation more effective and filters noise out of the unimodal ones.
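For reference, the first line below is the general information bottleneck objective that MIB builds on. The second line is one illustrative way to apply it to both unimodal and fused representations. The symbols used there (a modality set $\mathcal{M}$, one bottleneck $Z_m$ per modality, a shared trade-off weight $\beta$) are assumptions for exposition, not necessarily the paper's exact loss.

```latex
% General IB: keep what predicts Y, compress away the rest of X
\max_{p(z \mid x)} \; I(Z;Y) - \beta\, I(Z;X), \qquad \beta > 0

% Illustrative multimodal extension (assumption): one bottleneck Z_m per modality
% m \in \mathcal{M} (e.g. text, audio, vision) plus one on the fused representation Z,
% where X_{\mathcal{M}} denotes all modality inputs taken jointly
\min \; \sum_{m \in \mathcal{M}} \big[ -I(Z_m;Y) + \beta\, I(Z_m;X_m) \big]
      \; + \; \big[ -I(Z;Y) + \beta\, I(Z;X_{\mathcal{M}}) \big]
```

Larger $\beta$ pushes each representation to discard more of its input, trading predictive sufficiency for minimality.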
Findings
Achieves state-of-the-art results on multimodal sentiment analysis.
Effectively filters out noisy unimodal information.
Demonstrates flexibility with different fusion methods.
Abstract
Learning effective joint embedding for cross-modal data has always been a focus in the field of multimodal machine learning. We argue that during multimodal fusion, the generated multimodal embedding may be redundant, and the discriminative unimodal information may be ignored, which often interferes with accurate prediction and leads to a higher risk of overfitting. Moreover, unimodal representations also contain noisy information that negatively influences the learning of cross-modal dynamics. To this end, we introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation that is free of redundancy and to filter out noisy information in unimodal representations. Specifically, inheriting from the general information bottleneck (IB), MIB aims to learn the minimal sufficient representation for a given task by maximizing the mutual…
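Mutual information is intractable to compute directly, so IB objectives are commonly optimized through variational bounds. A cross-entropy term acts as a surrogate for $-I(Z;Y)$, and a KL divergence to a standard normal prior upper-bounds $I(Z;X)$ (the deep variational IB approach). The sketch below applies that recipe to two hypothetical modality feature matrices and to their early-fused concatenation. The linear encoders, the `vib_terms` and `init` helpers, the feature sizes, and the choice of concatenation as the fusion step are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), one value per row."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def sample_z(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def vib_terms(x, enc, head, labels, beta, rng):
    """Variational IB loss for one bottleneck (hypothetical helper).

    enc = (A, B): linear encoder giving mean and log-variance of p(z|x).
    head = (W, b): linear classifier q(y|z).
    Returns (loss, prediction_term, compression_term).
    """
    A, B = enc
    W, b = head
    mu, logvar = x @ A, x @ B
    z = sample_z(mu, logvar, rng)
    pred = cross_entropy(z @ W + b, labels)  # surrogate for -I(Z;Y), up to a constant
    comp = gaussian_kl(mu, logvar).mean()    # variational upper bound on I(Z;X)
    return pred + beta * comp, pred, comp

rng = np.random.default_rng(0)
n, k, n_classes, beta = 16, 4, 2, 1e-3
x_text = rng.standard_normal((n, 6))   # hypothetical text features
x_audio = rng.standard_normal((n, 3))  # hypothetical audio features
labels = rng.integers(0, n_classes, size=n)

def init(d_in):
    """Small random encoder and classifier weights for one bottleneck (hypothetical)."""
    enc = (0.1 * rng.standard_normal((d_in, k)), 0.1 * rng.standard_normal((d_in, k)))
    head = (0.1 * rng.standard_normal((k, n_classes)), np.zeros(n_classes))
    return enc, head

# One bottleneck per modality, plus one on the early-fused (concatenated) input.
x_fused = np.concatenate([x_text, x_audio], axis=1)
total = 0.0
for x in (x_text, x_audio, x_fused):
    enc, head = init(x.shape[1])
    loss, pred, comp = vib_terms(x, enc, head, labels, beta, rng)
    total += loss
print(f"total MIB-style loss: {total:.4f}")
```

In a real model the encoders would be trained networks and `total` would be minimized by gradient descent. Swapping the concatenation for another fusion operator only changes how `x_fused` (or its encoding) is built, which is one way to read the claim that the bottleneck is agnostic to the fusion method.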
