MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
Haofei Yu, Zhengyang Qi, Lawrence Jang, Ruslan Salakhutdinov, Louis-Philippe Morency, Paul Pu Liang

TL;DR
This paper introduces MMoE, an approach that trains a specialized expert for each type of multimodal interaction (redundancy, uniqueness, and synergy), improving sarcasm and humor detection in cases where meaning depends on how the modalities relate to each other.
Contribution
The paper proposes Multimodal Mixtures of Experts (MMoE), a method that enhances multimodal models by training a separate expert for each interaction type, achieving state-of-the-art results on sarcasm detection (MUStARD) and humor detection (URFUNNY).
Findings
State-of-the-art performance on sarcasm detection (MUStARD)
State-of-the-art performance on humor detection (URFUNNY)
Applicable to various multimodal models for improved interaction understanding
Abstract
Advances in multimodal models have greatly improved how interactions relevant to various tasks are modeled. Today's multimodal models mainly focus on the correspondence between images and text, using this for tasks like image-text matching. However, this covers only a subset of real-world interactions. Novel interactions, such as sarcasm expressed through opposing spoken words and gestures or humor expressed through utterances and tone of voice, remain challenging. In this paper, we introduce an approach to enhance multimodal models, which we call Multimodal Mixtures of Experts (MMoE). The key idea in MMoE is to train separate expert models for each type of multimodal interaction, such as redundancy present in both modalities, uniqueness in one modality, or synergy that emerges when both modalities are fused. On a sarcasm detection task (MUStARD) and a humor detection task (URFUNNY), we…
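The key idea in the abstract, one expert per interaction type, can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names (`interaction_type`, `partition_by_interaction`), the sample fields (`pred_a`, `pred_b`, `label`), and the rule that compares two unimodal predictions against the label are all hypothetical choices made for this example.

```python
from collections import defaultdict


def interaction_type(pred_a, pred_b, label):
    """Assign a sample to an interaction type (illustrative rule, an assumption).

    pred_a, pred_b: predictions from two hypothetical unimodal models.
    label: the ground-truth label for the sample.
    """
    if pred_a == pred_b == label:
        # Both modalities alone already give the right answer.
        return "redundancy"
    if pred_a == label or pred_b == label:
        # Exactly one modality alone gives the right answer.
        return "uniqueness"
    # Neither modality alone is right; the answer needs both fused.
    return "synergy"


def partition_by_interaction(samples):
    """Group samples so each group could train its own expert."""
    groups = defaultdict(list)
    for sample in samples:
        key = interaction_type(sample["pred_a"], sample["pred_b"], sample["label"])
        groups[key].append(sample)
    return dict(groups)


# Toy data: binary labels, e.g. 1 = sarcastic, 0 = not sarcastic.
samples = [
    {"id": 0, "pred_a": 1, "pred_b": 1, "label": 1},
    {"id": 1, "pred_a": 1, "pred_b": 0, "label": 1},
    {"id": 2, "pred_a": 0, "pred_b": 0, "label": 1},
]
groups = partition_by_interaction(samples)
for name, members in groups.items():
    print(name, [m["id"] for m in members])
```

Each resulting group would then be used to train its own expert model. How the experts' outputs are combined at inference time, when no label is available, is not covered by this sketch.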
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Language, Metaphor, and Cognition
Methods: Focus
