Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing

Sofoklis Kakouros; Themos Stafylakis; Ladislav Mosner; Lukas Burget

arXiv:2211.01756 · eess.AS · November 4, 2022

Open Access

TL;DR

This paper introduces a method for recognizing emotion from speech that combines self-supervised speech representations with attentive channel-wise correlation pooling and label smoothing, and reports state-of-the-art results on the IEMOCAP dataset.

Contribution

It proposes a new attentive pooling technique based on correlations and incorporates label smoothing to handle noisy annotations in speech emotion recognition.
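To make the label-smoothing part concrete, here is a minimal NumPy sketch of the standard formulation: each one-hot target is mixed with a uniform distribution so the classifier is not pushed toward full confidence on a possibly noisy label. The function names and the value `alpha=0.1` are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def smooth_labels(labels, num_classes, alpha=0.1):
    """Turn integer class labels into smoothed target distributions.

    alpha is the smoothing strength (an illustrative default, not the
    paper's setting): the true class gets 1 - alpha + alpha / K and
    every other class gets alpha / K, where K is num_classes.
    """
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1.0 - alpha) + alpha / num_classes

def cross_entropy(logits, targets):
    """Mean cross-entropy between logits and (possibly soft) targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

# Four emotion classes, two utterances labelled as class 0 and class 2.
targets = smooth_labels(np.array([0, 2]), num_classes=4, alpha=0.1)
logits = np.array([[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]])
loss = cross_entropy(logits, targets)
```

With smoothed targets the loss still penalizes a prediction that is already correct but very confident, which is the intended effect when annotations are subjective.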

Findings

01 Outperforms existing methods on the IEMOCAP dataset

02 Demonstrates the effectiveness of correlation-based attentive pooling

03 Shows robustness to label noise with label smoothing

Abstract

When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels. Self-supervised pre-trained representations can robustly capture information from speech enabling state-of-the-art results in many downstream tasks including emotion recognition. However, better ways of aggregating the information across time need to be considered as the relevant emotion information is likely to appear piecewise and not uniformly across the signal. For the labels, we need to take into account that there is a substantial degree of noise that comes from the subjective human annotations. In this paper, we propose a novel approach to attentive pooling based on correlations between the representations' coefficients combined with label smoothing, a…
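The idea of pooling through attention-weighted correlations between representation coefficients can be sketched as follows. This is a single-head NumPy illustration under stated assumptions, not the authors' implementation: the function name, the externally supplied attention scores (which would be learned in practice), and the choice to keep only the upper triangle of the correlation matrix are all hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def attentive_correlation_pooling(frames, scores, eps=1e-8):
    """Pool a (T, C) sequence of frame features into one fixed-size vector.

    frames: (T, C) frame-level features, e.g. from a self-supervised model.
    scores: (T,) unnormalised attention scores; here they are an input,
            whereas a real model would predict them from the frames.
    Returns the C * (C - 1) / 2 pairwise channel correlations.
    """
    # Softmax over time so the frame weights are positive and sum to one.
    w = np.exp(scores - scores.max())
    w = w / w.sum()

    # Attention-weighted mean and covariance across time.
    mean = w @ frames                                # (C,)
    centered = frames - mean                         # (T, C)
    cov = (centered * w[:, None]).T @ centered       # (C, C)

    # Normalise the covariance to a correlation matrix.
    std = np.sqrt(np.diag(cov) + eps)
    corr = cov / np.outer(std, std)

    # The matrix is symmetric with a unit diagonal, so keep the upper triangle.
    upper = np.triu_indices(frames.shape[1], k=1)
    return corr[upper]

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 4))   # 50 frames, 4 channels (toy sizes)
scores = rng.normal(size=50)        # stand-in for learned attention scores
pooled = attentive_correlation_pooling(frames, scores)
```

Because the weights come from attention rather than being uniform, frames judged more emotion-relevant contribute more to the statistics, which matches the motivation that emotional cues appear piecewise rather than uniformly over the utterance.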

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing