Speaker Attentive Speech Emotion Recognition

Clément Le Moine; Nicolas Obin; Axel Roebel

arXiv:2104.07288 · eess.AS · April 16, 2021

TL;DR

This paper introduces a speaker-attentive speech emotion recognition system that leverages speaker identity information through a novel Self Speaker Attention mechanism, significantly improving emotion recognition accuracy.

Contribution

The work proposes a dual-classifier system in which a speaker recognition network informs an emotion recognition network through Self Speaker Attention (SSA), incorporating speaker identity and achieving state-of-the-art results.

Findings

01. Improved emotion recognition accuracy on Att-HACK and IEMOCAP datasets.

02. SSA mechanism effectively focuses on emotional speech features.

03. Achieved state-of-the-art unweighted average recall scores.
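
Unweighted average recall (UAR), the metric named in the last finding, is the mean of the per-class recalls, so each emotion class counts equally no matter how many samples it has. The sketch below shows the computation on made-up labels; the function name `unweighted_average_recall` and the toy data are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels (not results from the paper): 4 "angry", 2 "sad".
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data the two numbers differ: here the recall is 3/4 for "angry" and 1/2 for "sad", so UAR is 0.625 while plain accuracy is 4/6.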

Abstract

The Speech Emotion Recognition (SER) task has seen significant improvements in recent years with the advent of Deep Neural Networks (DNNs). However, even the most successful methods still tend to fail when adaptation to specific speakers and scenarios is needed, inevitably leading to poorer performance than humans achieve. In this paper, we present novel work based on the idea of teaching the emotion recognition network about speaker identity. Our system combines two ACRNN classifiers, dedicated to speaker recognition and emotion recognition respectively. The former informs the latter through a Self Speaker Attention (SSA) mechanism that is shown to help considerably in focusing on the emotional information in the speech signal. Experiments on the social attitudes database Att-HACK and the IEMOCAP corpus demonstrate the effectiveness of the proposed method and achieve the state-of-the-art…
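
The abstract does not spell out how the speaker classifier informs the emotion classifier, so the following is only a generic illustration of attention driven by a second feature stream, not the paper's SSA formulation. It assumes frame-level features from both networks with matching dimensions and a dot-product score; the names `speaker_attentive_pooling`, `emotion_frames`, and `speaker_frames` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def speaker_attentive_pooling(emotion_frames, speaker_frames):
    """Pool emotion features over time with weights derived from speaker features.

    Assumption (not from the paper): the score of each frame is the dot
    product of its emotion feature vector and its speaker feature vector.
    """
    scores = [
        sum(e * s for e, s in zip(emo, spk))
        for emo, spk in zip(emotion_frames, speaker_frames)
    ]
    weights = softmax(scores)  # one attention weight per time frame
    dim = len(emotion_frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, emotion_frames))
        for d in range(dim)
    ]
    return pooled, weights


# Three time frames with two-dimensional toy features (made-up values).
emotion_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
speaker_frames = [[0.5, 0.5], [0.5, 0.5], [2.0, 2.0]]

pooled, weights = speaker_attentive_pooling(emotion_frames, speaker_frames)
print("attention weights:", [round(w, 3) for w in weights])
print("pooled emotion vector:", [round(v, 3) for v in pooled])
```

In this toy setup the third frame has the highest score, so it dominates the pooled vector; a trained system would learn the scoring function rather than use a fixed dot product.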

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Emotion and Mood Recognition