M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation
Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe

TL;DR
This paper introduces M2FNet, a multi-modal fusion network that combines audio, visual, and text features through multi-head attention-based fusion and is trained with an adaptive triplet loss, improving emotion recognition in conversation on the MELD and IEMOCAP datasets.
Contribution
The paper presents a new multi-modal fusion architecture with a specialized feature extractor and adaptive triplet loss, advancing emotion recognition in conversations.
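For readers unfamiliar with triplet losses, the following is a minimal sketch of the standard fixed-margin form, which pulls an anchor embedding toward a same-class (positive) example and pushes it away from a different-class (negative) one. It is not the paper's adaptive variant, whose margin rule is not described in this summary. The function names, the Euclidean distance, and the default margin of 1.0 are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard fixed-margin triplet loss (illustrative, not the paper's
    adaptive version): max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = euclidean(anchor, positive)  # distance to same-class example
    d_an = euclidean(anchor, negative)  # distance to different-class example
    return max(0.0, d_ap - d_an + margin)


# Hypothetical 2-D embeddings, for illustration only.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative is far away
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])  # negative is close
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate the margin contribute to training.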
Findings
Outperforms existing methods on MELD and IEMOCAP datasets
Achieves state-of-the-art weighted F1 scores in ERC
Effectively integrates multi-modal data for emotion recognition
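The weighted F1 score mentioned above averages per-class F1 scores, weighting each class by its share of the true labels, which matters when emotion classes are imbalanced. A minimal sketch of the general metric follows; the function name and the toy labels are assumptions for illustration and are not taken from the paper's evaluation code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight by class frequency
    return score


# Toy labels, for illustration only.
y_true = ["joy", "joy", "anger", "neutral"]
y_pred = ["joy", "anger", "anger", "neutral"]
result = weighted_f1(y_true, y_pred)
```

Classes that never appear in the true labels receive zero weight here, so spurious predictions of such a class only lower the score through the false positives they cause in other classes' counts.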
Abstract
Emotion Recognition in Conversations (ERC) is crucial in developing sympathetic human-machine interaction. In conversational videos, emotion can be present in multiple modalities, i.e., audio, video, and transcript. However, due to the inherent characteristics of these modalities, multi-modal ERC has always been considered a challenging undertaking. Existing ERC research focuses mainly on using text information in a discussion, ignoring the other two modalities. We anticipate that emotion recognition accuracy can be improved by employing a multi-modal approach. Thus, in this study, we propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from visual, audio, and text modality. It employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data. We introduce a new feature extractor to extract latent…
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
Methods: Triplet Loss
