M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues

Trisha Mittal; Uttaran Bhattacharya; Rohan Chandra; Aniket Bera; Dinesh Manocha

arXiv:1911.05659·eess.SP·November 25, 2019

TL;DR

M3ER is a multimodal emotion recognition method that combines face, text, and speech cues through multiplicative fusion and adds a check step, based on Canonical Correlation Analysis (CCA), that separates effective modalities from ineffective ones, improving both robustness and accuracy.

Contribution

M3ER introduces a data-driven multiplicative fusion technique and a modality effectiveness check, which together make multimodal emotion recognition more robust to sensor noise in any individual modality.
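To make the idea of multiplicative fusion concrete, here is a minimal sketch of one common form of multiplicative combination loss, in which each modality's cross-entropy term is down-weighted when the other modalities are already confident about the correct class. This page does not give M3ER's exact loss, so the formula, the function name `multiplicative_fusion_loss`, and the `beta` parameter are all assumptions made for illustration.

```python
import numpy as np

def multiplicative_fusion_loss(p_true, beta=1.0, eps=1e-8):
    """Generic multiplicative combination loss (illustrative, not M3ER's exact loss).

    p_true: one probability per modality, each being that modality's
            predicted probability for the correct class (needs >= 2 modalities).
    beta:   hypothetical strength of the down-weighting; beta=0 reduces to
            a plain sum of per-modality cross-entropy terms.
    """
    p = np.clip(np.asarray(p_true, dtype=float), eps, 1.0 - eps)
    m = p.size
    if m < 2:
        raise ValueError("need at least two modalities")
    loss = 0.0
    for i in range(m):
        others = np.delete(p, i)
        # Weight shrinks toward 0 when the other modalities are confident.
        weight = np.prod(1.0 - others) ** (beta / (m - 1))
        loss += -weight * np.log(p[i])
    return float(loss)

# One uncertain modality paired with a confident one vs. two uncertain ones.
loss_confident_partner = multiplicative_fusion_loss([0.5, 0.99])
loss_uncertain_pair = multiplicative_fusion_loss([0.5, 0.5])
```

In this sketch the uncertain modality contributes far less to the loss when its partner is confident, which is the per-sample "emphasize reliable cues, suppress others" behavior the summary describes.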

Findings

01 Achieved 82.7% accuracy on IEMOCAP

02 Achieved 89.0% accuracy on CMU-MOSEI

03 Improved performance by approximately 5% over prior methods

Abstract

We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is also more robust than other methods to sensor noise in any of the individual modalities. M3ER models a novel, data-driven multiplicative fusion method to combine the modalities, which learns to emphasize the more reliable cues and suppress others on a per-sample basis. By introducing a check step which uses Canonical Correlation Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experimentation on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on…
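The check step described above can be illustrated with a small sketch that computes canonical correlations between two modalities' feature matrices and flags a pair as ineffective when the top correlation falls below a threshold. This is only an assumed reading of the check: the function names, the threshold value of 0.5, and the synthetic features are hypothetical, and the canonical correlations are computed with the standard QR-plus-SVD approach rather than any procedure confirmed by the paper.

```python
import numpy as np

def canonical_correlations(x, y):
    """Canonical correlations between x (n, p) and y (n, q), largest first.

    Assumes n exceeds both p and q and that the centered matrices have
    full column rank.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)  # orthonormal basis for the column space of x
    qy, _ = np.linalg.qr(y)  # orthonormal basis for the column space of y
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def is_effective(x, y, threshold=0.5):
    """Hypothetical check: does y correlate with x strongly enough?"""
    return bool(canonical_correlations(x, y)[0] >= threshold)

# Synthetic stand-ins for per-sample features from three modalities.
rng = np.random.default_rng(0)
face = rng.normal(size=(200, 3))
speech = face @ rng.normal(size=(3, 3)) + 0.01 * rng.normal(size=(200, 3))
noisy_text = rng.normal(size=(200, 3))  # unrelated to face: simulates sensor noise

speech_ok = is_effective(face, speech)
text_ok = is_effective(face, noisy_text)
```

A modality flagged as ineffective in this way would then be the candidate for replacement by a proxy feature, a step this sketch does not attempt.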
