Leveraging Label Information for Multimodal Emotion Recognition
Peiying Wang, Sunlu Zeng, Junqing Chen, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He

TL;DR
This paper introduces a novel multimodal emotion recognition method that leverages label information to enhance text and speech representations, leading to improved accuracy and state-of-the-art results on the IEMOCAP dataset.
Contribution
It proposes a label-guided approach for multimodal emotion recognition that effectively integrates label information into text and speech representations.
Findings
Outperforms existing baselines on the IEMOCAP dataset
Achieves new state-of-the-art performance
Demonstrates the effectiveness of label-guided fusion
Abstract
Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information. Intuitively, label information should be capable of helping the model locate the salient tokens/frames relevant to the specific emotion, which finally facilitates the MER task. Inspired by this, we propose a novel approach for MER by leveraging label information. Specifically, we first obtain the representative label embeddings for both text and speech modalities, then learn the label-enhanced text/speech representations for each utterance via label-token and label-frame interactions. Finally, we devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification. Extensive experiments were conducted on the public IEMOCAP dataset, and experimental results demonstrate that our…
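The pipeline the abstract outlines (label embeddings per modality, label-token and label-frame interactions, then a label-guided fusion) can be illustrated with a small sketch. This is not the authors' implementation: the scaled dot-product scoring, the max-over-labels saliency, the softmax pooling, and the function names `label_aware_pool` and `label_guided_fusion` are all assumptions chosen for illustration, and the vectors are toy values rather than learned embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_label_score(vec, label_embeddings):
    """Scaled dot-product score of `vec` against its closest label."""
    scale = math.sqrt(len(vec))
    return max(dot(vec, e) for e in label_embeddings) / scale


def label_aware_pool(features, label_embeddings):
    """Pool token/frame vectors into one vector (assumed scheme).

    Each position is scored by its best match to any label embedding,
    the scores are softmax-normalised over positions, and the features
    are summed with those weights.
    """
    dim = len(features[0])
    saliency = [best_label_score(f, label_embeddings) for f in features]
    weights = softmax(saliency)
    pooled = [sum(w * f[i] for w, f in zip(weights, features))
              for i in range(dim)]
    return pooled, weights


def label_guided_fusion(text_vec, speech_vec, text_labels, speech_labels):
    """Hypothetical fusion: weight each modality by its best label match."""
    scores = [best_label_score(text_vec, text_labels),
              best_label_score(speech_vec, speech_labels)]
    w_text, w_speech = softmax(scores)
    fused = [w_text * t + w_speech * s
             for t, s in zip(text_vec, speech_vec)]
    return fused, (w_text, w_speech)


# Toy inputs: 2 emotion labels, 2-dimensional features (illustrative only).
text_labels = [[1.0, 0.0], [0.0, 1.0]]
speech_labels = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.1, 0.0], [2.0, 0.1], [0.0, 0.2]]   # 3 text tokens
frames = [[0.3, 0.1], [0.2, 0.9]]               # 2 speech frames

text_vec, token_weights = label_aware_pool(tokens, text_labels)
speech_vec, frame_weights = label_aware_pool(frames, speech_labels)
fused, modality_weights = label_guided_fusion(
    text_vec, speech_vec, text_labels, speech_labels)
```

In this toy setup the second token aligns most strongly with a label, so it receives the largest pooling weight, which is the "locate the salient tokens/frames" intuition from the abstract. A trained model would learn the label embeddings and the interaction parameters rather than fixing them by hand.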
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
