Multimodal Speech Emotion Recognition Using Audio and Text

Seunghyun Yoon; Seokhyun Byun; Kyomin Jung

arXiv:1810.04635 · cs.CL · October 11, 2018

TL;DR

This paper introduces a dual recurrent neural network model that combines audio and text data for improved speech emotion recognition, outperforming previous methods on the IEMOCAP dataset.

Contribution

The novel deep dual recurrent encoder effectively integrates audio and text information for emotion recognition, providing a more comprehensive analysis than prior audio-only models.

Findings

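1. Outperforms previous state-of-the-art methods on the IEMOCAP dataset, with accuracies ranging from 68.8% to 71.8%.

2. Uses dual RNNs to encode the audio and text sequences, then combines the two encodings to predict the emotion class.

3. Improves emotion classification by combining signal-level (audio) and language-level (text) information.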

Abstract

Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our…
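
The encode-then-combine idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the plain tanh recurrent cell, the concatenation-based fusion, the untrained random weights, and every name and dimension (`SimpleRNNEncoder`, `DualEncoderClassifier`, `hidden_dim=16`, `num_classes=4`) are assumptions made for the example. The text above only states that two RNNs encode the audio and text sequences and that their information is combined to predict the emotion class.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRNNEncoder:
    """Plain tanh RNN (an assumed cell type) that returns its final hidden state."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.w_in = make_matrix(hidden_dim, input_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)

    def encode(self, sequence):
        hidden = [0.0] * self.hidden_dim
        for step in sequence:
            from_input = matvec(self.w_in, step)
            from_state = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        return hidden


class DualEncoderClassifier:
    """One encoder per modality; final states are concatenated and classified."""

    def __init__(self, audio_dim, text_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.audio_encoder = SimpleRNNEncoder(audio_dim, hidden_dim, rng)
        self.text_encoder = SimpleRNNEncoder(text_dim, hidden_dim, rng)
        self.w_out = make_matrix(num_classes, 2 * hidden_dim, rng)

    def fuse(self, audio_frames, token_vectors):
        # Concatenation is an assumed fusion choice for this sketch.
        return (self.audio_encoder.encode(audio_frames)
                + self.text_encoder.encode(token_vectors))

    def predict_proba(self, audio_frames, token_vectors):
        return softmax(matvec(self.w_out, self.fuse(audio_frames, token_vectors)))


# Toy inputs: 12 audio frames of 5 features, 6 token vectors of 8 features.
data_rng = random.Random(1)
audio_frames = [[data_rng.random() for _ in range(5)] for _ in range(12)]
token_vectors = [[data_rng.random() for _ in range(8)] for _ in range(6)]

# num_classes=4 is an illustrative value, not taken from the text above.
model = DualEncoderClassifier(audio_dim=5, text_dim=8, hidden_dim=16, num_classes=4)
fused = model.fuse(audio_frames, token_vectors)
probs = model.predict_proba(audio_frames, token_vectors)
print(len(fused), probs)
```

The two sequences may have different lengths and feature sizes because each modality has its own encoder; only the fixed-size final states are combined. A trained model would learn the weights from labeled utterances rather than drawing them at random.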
