Emotion Recognition From Speech With Recurrent Neural Networks

Vladimir Chernykh; Pavel Prikhodko

arXiv:1701.08071 · cs.CL · July 6, 2018 · 127 citations

TL;DR

This paper presents a deep recurrent neural network approach to emotion recognition from speech. A probabilistic CTC loss lets the model handle long utterances that contain both emotional and neutral parts, and the method is evaluated against recent approaches and against measured human performance.

Contribution

The paper introduces an RNN-based method trained with CTC loss for emotion recognition that processes long utterances containing both emotional and neutral parts.

Findings

01 Compares favorably with recent methods on emotion recognition.

02 Achieves results comparable to human performance on the same task.

03 Handles long utterances that contain both emotional and neutral parts.

Abstract

This paper considers the task of emotion recognition from speech. The proposed approach uses a deep recurrent neural network trained on a sequence of acoustic features computed over short speech intervals. A probabilistic CTC loss function makes it possible to handle long utterances that contain both emotional and neutral parts. The effectiveness of the approach is demonstrated in two ways. First, it is compared with recent advances in the field. Second, human performance on the same task is measured. Both comparisons indicate the high quality of the proposed method.
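To make the role of the CTC loss more concrete, here is a minimal sketch of the standard CTC forward recursion in plain Python. It is not taken from the authors' repository. The function name, the choice of class 0 as the blank symbol, and the toy probabilities are all assumptions made for illustration, and how the paper maps emotions to CTC labels is not stated in the abstract. The sketch sums the probability of every frame-level alignment that collapses to a short label sequence, which is what lets a network be trained without per-frame labels.

```python
import math

BLANK = 0  # assumption of this sketch: class 0 is the CTC blank symbol


def ctc_neg_log_likelihood(probs, target):
    """Negative log-likelihood of `target` under CTC (forward recursion).

    probs  : list of T frames, each a list of per-class probabilities,
             e.g. softmax outputs of a recurrent network.
    target : list of non-blank class indices, e.g. [2] for a single label.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all alignments of the frames seen
    # so far that end at position s of the extended sequence.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping over a blank is allowed only between different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)


# Toy example: two frames, two classes (blank and one label), uniform outputs.
# The alignments "label label", "blank label" and "label blank" all collapse
# to [1], so the likelihood is 3 * 0.25 = 0.75.
loss = ctc_neg_log_likelihood([[0.5, 0.5], [0.5, 0.5]], [1])
print(round(math.exp(-loss), 6))
```

In this toy setup, frames assigned to the blank contribute no label, which is one way to picture how an utterance with non-emotional stretches can still be scored against a short label sequence. Practical implementations work in log space for numerical stability; this version uses raw probabilities to keep the recursion easy to read.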

Code & Models

Repositories

vladimir-chernykh/emotion_recognition (official)

Models

dmdoy/Emotion_Recognition_From_Speech (Hugging Face model)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Connectionist Temporal Classification Loss