Improving Multimodal fusion via Mutual Dependency Maximisation
Pierre Colombo, Emile Chapuis, Matthieu Labeau, Chloe Clavel

TL;DR
This paper introduces new dependency-based loss functions for multimodal fusion in sentiment analysis, significantly improving accuracy and robustness across multiple models and datasets.
Contribution
It proposes novel dependency maximisation penalties for multimodal fusion, leading to state-of-the-art results and more interpretable high-dimensional representations.
Findings
Achieved up to 4.3% accuracy improvement
Set new SOTA on CMU-MOSI and CMU-MOSEI datasets
Produced more robust and interpretable multimodal representations
Abstract
Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics. Acknowledging that humans communicate through a variety of channels (i.e., visual, acoustic, linguistic), multimodal systems aim at integrating different unimodal representations into a synthetic one. So far, considerable effort has been made on developing complex architectures allowing the fusion of these modalities. However, such systems are mainly trained by minimising simple losses such as L1 or cross-entropy. In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities. We demonstrate that our new penalties lead to a consistent improvement (up to 4.3% on accuracy) across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: CMU-MOSI and…
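As a rough illustration of the idea in the abstract, the sketch below adds a dependency term to a simple task loss, so that training rewards representations of two modalities that are statistically dependent. The abstract shown here does not spell out the estimators used, so an InfoNCE-style contrastive lower bound is swapped in purely as a stand-in. The function names (`l1_loss`, `infonce_dependency`, `penalised_loss`), the `temperature` and `weight` parameters, and the NumPy setting are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error, the 'simple loss' baseline."""
    return float(np.mean(np.abs(pred - target)))


def infonce_dependency(x, y, temperature=0.1):
    """InfoNCE-style estimate of the dependency between paired rows of x and y.

    Illustrative stand-in estimator (an assumption, not taken from the paper).
    x, y: arrays of shape (n, d); row i of x is paired with row i of y.
    The returned value is at most log(n).
    """
    # Cosine similarity between every row of x and every row of y.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    logits = (x @ y.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # True pairs sit on the diagonal; add log(n) to obtain the bound.
    n = x.shape[0]
    return float(np.mean(np.diag(log_probs)) + np.log(n))


def penalised_loss(pred, target, x, y, weight=0.1):
    """Task loss minus a weighted dependency estimate (hypothetical combination)."""
    return l1_loss(pred, target) - weight * infonce_dependency(x, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(64, 8))                  # hypothetical text features
    audio_dep = text + 0.01 * rng.normal(size=(64, 8))  # strongly dependent on text
    audio_ind = rng.normal(size=(64, 8))             # independent of text
    print(infonce_dependency(text, audio_dep))
    print(infonce_dependency(text, audio_ind))
```

Under these assumptions, the dependent pair yields a larger estimate than the independent pair, so subtracting the weighted term lowers the total loss when the modalities carry shared information. In a real training loop the same quantity would be computed with a differentiable framework and learned encoders.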
Taxonomy
Topics: Music and Audio Processing · Sentiment Analysis and Opinion Mining · Speech and Audio Processing
