Emotion Recognition From Speech With Recurrent Neural Networks
Vladimir Chernykh, Pavel Prikhodko

TL;DR
This paper presents a deep recurrent neural network for emotion recognition from speech. A probabilistic CTC loss lets the model handle long utterances that contain both emotional and neutral parts, and the method is evaluated against recent approaches and against human performance on the same task.
Contribution
The paper introduces an RNN-based method, trained with CTC loss on sequences of acoustic features, that recognizes emotion in long utterances containing both emotional and neutral parts.
Findings
Compares favorably with recent methods for emotion recognition from speech.
Is evaluated against human performance measured on the same task.
Handles long utterances that contain both emotional and neutral parts.
Abstract
This paper considers the task of emotion recognition from speech. The proposed approach uses a deep recurrent neural network trained on a sequence of acoustic features calculated over small speech intervals. A probabilistic CTC loss function makes it possible to handle long utterances containing both emotional and neutral parts. The effectiveness of the approach is shown in two ways. First, it is compared with recent advances in the field. Second, human performance on the same task is measured. Both criteria indicate the high quality of the proposed method.
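To make the role of the CTC loss more concrete, the sketch below computes the CTC probability of a short emotion label sequence from per-frame class probabilities, using the standard forward algorithm. It is only an illustration under stated assumptions, not the authors' implementation: the label set (`angry`, `sad`), the toy probabilities in `frames`, and the reading of the blank symbol as frames that carry no emotion label are all hypothetical choices made for this example. In training, the loss would be the negative logarithm of this probability, with `frames` produced by the recurrent network.

```python
from itertools import product

BLANK = 0  # index of the CTC blank symbol (assumed here to mean "no emotion label")
ANGRY, SAD = 1, 2  # hypothetical emotion classes for this illustration


def collapse(path):
    """CTC collapsing map: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def ctc_probability(frame_probs, target):
    """Forward algorithm: total probability of all alignments that collapse to target."""
    # Interleave blanks: (a, b) becomes (blank, a, blank, b, blank).
    ext = [BLANK]
    for s in target:
        ext += [s, BLANK]
    n = len(ext)

    # alpha[i] = probability of all path prefixes that end in ext[i].
    alpha = [0.0] * n
    alpha[0] = frame_probs[0][BLANK]
    if n > 1:
        alpha[1] = frame_probs[0][ext[1]]

    for probs in frame_probs[1:]:
        new = [0.0] * n
        for i in range(n):
            total = alpha[i]              # stay on the same symbol
            if i >= 1:
                total += alpha[i - 1]     # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if i >= 2 and ext[i] != BLANK and ext[i] != ext[i - 2]:
                total += alpha[i - 2]
            new[i] = total * probs[ext[i]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if n > 1 else 0.0)


def brute_force(frame_probs, target):
    """Reference check: enumerate every frame-level path explicitly."""
    num_classes = len(frame_probs[0])
    total = 0.0
    for path in product(range(num_classes), repeat=len(frame_probs)):
        if collapse(path) == tuple(target):
            p = 1.0
            for t, s in enumerate(path):
                p *= frame_probs[t][s]
            total += p
    return total


# Toy per-frame probabilities over (blank, angry, sad); each row sums to 1.
frames = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.2, 0.3],
    [0.3, 0.1, 0.6],
]
target = (ANGRY, SAD)

print("forward algorithm:", ctc_probability(frames, target))
print("brute force      :", brute_force(frames, target))
```

Because the probability sums over every alignment, the utterance-level label sequence does not need to say which frames are emotional and which are not. That is the property the abstract relies on for long utterances.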
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Connectionist Temporal Classification Loss
