Attention Based Fully Convolutional Network for Speech Emotion Recognition

Yuanyuan Zhang; Jun Du; Zirui Wang; Jianshu Zhang

arXiv:1806.01506·cs.SD·May 3, 2019

TL;DR

This paper introduces an attention-based fully convolutional network that effectively recognizes speech emotions by focusing on emotion-relevant regions, handling variable-length speech, and leveraging transfer learning to improve accuracy.

Contribution

The paper proposes a novel attention mechanism within a fully convolutional network for speech emotion recognition, utilizing transfer learning with pre-trained models to enhance performance.

Findings

01 Achieved 70.4% weighted accuracy on IEMOCAP
02 Outperformed state-of-the-art methods
03 Demonstrated effectiveness of attention mechanism and transfer learning
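
For readers unfamiliar with the metric in the first finding, the sketch below shows how weighted accuracy is usually computed in speech emotion recognition work, next to the class-averaged (unweighted) variant it is often reported with. It assumes the common convention that weighted accuracy is overall accuracy across all utterances; the summary above does not define the metric, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct utterances divided by all utterances (assumed convention)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels (not IEMOCAP data): an imbalanced three-class example.
y_true = ["neutral"] * 6 + ["angry"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["angry", "neutral"] + ["neutral", "neutral"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"weighted accuracy:   {wa:.2f}")
print(f"unweighted accuracy: {ua:.2f}")
```

On imbalanced data the two numbers can diverge noticeably, which is why it matters which one a reported figure refers to.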

Abstract

Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which makes it hard to distinguish; 2) in general, human emotion can only be detected at specific moments during a long utterance; 3) speech data with emotion labels is usually limited. In this paper, we present a novel attention-based fully convolutional network for speech emotion recognition. We employ a fully convolutional network because it can handle variable-length speech without segmentation, so critical information is not lost. The proposed attention mechanism makes the model aware of which time-frequency regions of the speech spectrogram are more emotion-relevant. Given the limited data, transfer learning is also adopted to improve accuracy. In particular, it is interesting to observe the obvious improvement obtained with natural scene image based…
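
To illustrate how attention over time-frequency regions can turn a variable-length input into a fixed-size representation, here is a minimal sketch in plain Python. It assumes an additive attention form (a score per region from a small tanh layer, a softmax over regions, then a weighted sum); the abstract does not give the exact formulation, so the function `attention_pool`, the parameters `W`, `b`, `u`, and `scale`, and the random feature maps are all hypothetical stand-ins for a convolutional network's output.

```python
import math
import random


def attention_pool(features, W, b, u, scale=1.0):
    """Pool L region vectors (each of length C) into one C-dim vector.

    Assumed form: e_i = u . tanh(W a_i + b), alpha = softmax(scale * e),
    context = sum_i alpha_i * a_i. This is a sketch, not the authors' code.
    """
    scores = []
    for a in features:
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, a)) + bias)
            for row, bias in zip(W, b)
        ]
        scores.append(scale * sum(uj * hj for uj, hj in zip(u, hidden)))

    # Softmax over regions, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    channels = len(features[0])
    context = [
        sum(wt * a[k] for wt, a in zip(weights, features))
        for k in range(channels)
    ]
    return context, weights


def random_feature_map(freq_bins, time_steps, channels, rng):
    """Hypothetical stand-in for a conv feature map, flattened to F*T regions."""
    return [
        [rng.uniform(-1, 1) for _ in range(channels)]
        for _ in range(freq_bins * time_steps)
    ]


rng = random.Random(0)
C, H = 4, 3  # channel count and attention hidden size (arbitrary choices)
W = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(H)]
b = [0.0] * H
u = [rng.uniform(-1, 1) for _ in range(H)]

# Two "utterances" of different length give different numbers of regions.
short_map = random_feature_map(2, 5, C, rng)
long_map = random_feature_map(2, 12, C, rng)

short_ctx, short_w = attention_pool(short_map, W, b, u)
long_ctx, long_w = attention_pool(long_map, W, b, u)
print(len(short_w), len(long_w))      # one weight per region
print(len(short_ctx), len(long_ctx))  # same output size for both inputs
```

Because the weighted sum collapses the region axis, the output length depends only on the channel count, which is what lets a model of this kind accept utterances of any duration without fixed-length segmentation.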

Code & Models

Repositories

aris-ai/Audio-and-text-based-emotion-recognition (PyTorch)
