Learning Alignment for Multimodal Emotion Recognition from Speech

Haiyang Xu; Hui Zhang; Kun Han; Yun Wang; Yiping Peng; Xiangang Li

arXiv:1909.05645 · cs.CL · April 6, 2020 · 6 cites

TL;DR

This paper introduces an attention-based alignment method for multimodal emotion recognition from speech and text, improving feature integration and achieving state-of-the-art results on the IEMOCAP dataset.

Contribution

It proposes a novel attention mechanism to align speech frames and text words, enhancing multimodal emotion recognition accuracy.
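The alignment idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain dot-product scoring between word vectors and speech-frame vectors, and the function names (`align_speech_to_words`, `softmax`, `dot`) and the toy vectors are hypothetical. Each word attends over all speech frames, and the attention-weighted frame average gives a speech vector aligned to that word.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def align_speech_to_words(word_vecs, frame_vecs):
    """For each word, attend over all speech frames.

    Returns one (weights, context) pair per word, where weights is a
    distribution over frames and context is the weighted frame average.
    Dot-product scoring is an assumption made for this sketch.
    """
    dim = len(frame_vecs[0])
    aligned = []
    for word in word_vecs:
        weights = softmax([dot(word, frame) for frame in frame_vecs])
        context = [
            sum(w * frame[d] for w, frame in zip(weights, frame_vecs))
            for d in range(dim)
        ]
        aligned.append((weights, context))
    return aligned


# Toy data (hypothetical): 2 words and 3 speech frames, all 2-dimensional.
word_vecs = [[1.0, 0.0], [0.0, 1.0]]
frame_vecs = [[4.0, 0.0], [3.0, 0.0], [0.0, 4.0]]

aligned = align_speech_to_words(word_vecs, frame_vecs)

# Fuse each word vector with its aligned speech vector by concatenation.
fused = [word + context for word, (_, context) in zip(word_vecs, aligned)]
```

In this toy setup the first word places most of its attention on the first two frames and the second word on the last frame, so each fused vector pairs a word with the portion of speech most similar to it.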

Findings

01 Achieves state-of-the-art performance on the IEMOCAP dataset
02 Demonstrates the effectiveness of learned alignment in multimodal emotion recognition
03 Outperforms previous decision-level fusion methods

Abstract

Speech emotion recognition is a challenging problem because humans convey emotions in subtle and complex ways. For emotion recognition on human speech, one can either extract emotion-related features from audio signals or employ speech recognition techniques to generate text from speech and then apply natural language processing to analyze the sentiment. Further, although emotion recognition benefits from audio-textual multimodal information, it is not trivial to build a system that learns from multiple modalities. One can build models for the two input sources separately and combine them at the decision level, but this method ignores the interaction between speech and text in the temporal domain. In this paper, we propose to use an attention mechanism to learn the alignment between speech frames and text words, aiming to produce more accurate multimodal feature representations. The aligned…
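For contrast, the decision-level combination that the abstract describes as the simpler alternative might look like the sketch below. It assumes each modality's model already outputs a probability distribution over the same emotion classes and that the two are merged by a weighted average; the function name `late_fusion`, the `audio_weight` parameter, the label set, and the probabilities are all hypothetical and not taken from the paper.

```python
def late_fusion(audio_probs, text_probs, audio_weight=0.5):
    """Combine per-class probabilities from two separately trained models.

    A weighted average is one common form of decision-level fusion; it is
    an assumption here, not the specific baseline used in the paper.
    """
    if len(audio_probs) != len(text_probs):
        raise ValueError("both models must score the same set of classes")
    text_weight = 1.0 - audio_weight
    return [
        audio_weight * a + text_weight * t
        for a, t in zip(audio_probs, text_probs)
    ]


# Hypothetical label set and model outputs for a single utterance.
LABELS = ["angry", "happy", "neutral", "sad"]
audio_probs = [0.7, 0.1, 0.1, 0.1]
text_probs = [0.2, 0.1, 0.6, 0.1]

fused = late_fusion(audio_probs, text_probs, audio_weight=0.5)
predicted = LABELS[max(range(len(fused)), key=fused.__getitem__)]
```

Because the two models are only combined after each has produced an utterance-level decision, nothing in this scheme can relate a particular word to the stretch of audio in which it was spoken, which is the limitation the abstract points out.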

Code & Models

Repositories

ZhiqiWang12-hash/text_audio_classification
tf

Models

🤗 dmdoy/Emotion_Recognition_From_Speech
model · ♡ 3

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis