Improving Multimodal fusion via Mutual Dependency Maximisation

Pierre Colombo; Emile Chapuis; Matthieu Labeau; Chloe Clavel

arXiv:2109.00922 · cs.LG · September 10, 2021

TL;DR

This paper introduces new dependency-based loss functions for multimodal fusion in sentiment analysis, improving accuracy (by up to 4.3 points) and robustness across multiple models and datasets.

Contribution

It proposes novel dependency maximisation penalties for multimodal fusion, leading to state-of-the-art results and more interpretable high-dimensional representations.

Findings

1. Improved accuracy by up to 4.3 points
2. Set a new state of the art on the CMU-MOSI and CMU-MOSEI datasets
3. Produced more robust and interpretable multimodal representations

Abstract

Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics. Since humans communicate through a variety of channels (i.e., visual, acoustic, and linguistic), multimodal systems aim to integrate different unimodal representations into a single synthetic one. So far, considerable effort has gone into developing complex architectures that fuse these modalities. However, such systems are mainly trained by minimising simple losses such as L1 or cross-entropy. In this work, we investigate previously unexplored penalties and propose a set of new objectives that measure the dependency between modalities. We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 points of accuracy) across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: CMU-MOSI and…
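The abstract does not say how the dependency between modalities is estimated. As one illustration only, the sketch below uses the Donsker-Varadhan lower bound on mutual information (the bound behind MINE), which bounds I(X; Y) from below by the mean critic score T(x, y) on aligned pairs minus the log of the mean of exp T(x, y') on mismatched pairs. This choice of dependency measure is an assumption, not necessarily one of the paper's objectives. The helpers dot_critic, dv_lower_bound, l1_loss and penalised_loss and the weight lam are hypothetical names, and the fixed dot-product critic stands in for what would normally be a trained network. Subtracting the weighted estimate from an L1 task loss means that minimising the total loss pushes the two modality representations to become more dependent.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Critic = Callable[[Vector, Vector], float]


def dot_critic(x: Vector, y: Vector) -> float:
    """Hypothetical fixed critic T(x, y) = <x, y>; in practice T is a trained network."""
    return sum(a * b for a, b in zip(x, y))


def dv_lower_bound(critic: Critic, xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
    """Donsker-Varadhan lower bound on I(X; Y) from a batch of aligned pairs.

    I(X; Y) >= E_joint[T(x, y)] - log E_marginals[exp T(x, y')]
    """
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two aligned pairs of equal count")
    # Joint term: critic scores on aligned pairs (same sample, two modalities).
    joint = sum(critic(x, y) for x, y in zip(xs, ys)) / n
    # Marginal term: shift ys by one position so each x meets a y from another
    # sample, a cheap stand-in for drawing from the product of marginals.
    shifted = list(ys[1:]) + list(ys[:1])
    scores = [critic(x, y) for x, y in zip(xs, shifted)]
    # Numerically stable log-mean-exp of the mismatched scores.
    m = max(scores)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scores) / n)
    return joint - log_mean_exp


def l1_loss(preds: Vector, labels: Vector) -> float:
    """Mean absolute error, the kind of simple task loss the abstract mentions."""
    return sum(abs(p - t) for p, t in zip(preds, labels)) / len(preds)


def penalised_loss(
    preds: Vector,
    labels: Vector,
    z_a: Sequence[Vector],
    z_b: Sequence[Vector],
    lam: float = 0.1,
    critic: Critic = dot_critic,
) -> float:
    """Task loss minus lam times the dependency estimate between z_a and z_b,
    so minimising this total maximises the estimated dependency."""
    return l1_loss(preds, labels) - lam * dv_lower_bound(critic, z_a, z_b)
```

For example, with xs = [[1.0], [-1.0], [2.0], [-2.0]] paired with itself, dv_lower_bound(dot_critic, xs, xs) evaluates to about 4.3066. Shifting the batch by one position keeps the sketch deterministic; random permutations within the batch are a common alternative for building mismatched pairs.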

Taxonomy

Topics: Music and Audio Processing · Sentiment Analysis and Opinion Mining · Speech and Audio Processing