Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition

Anant Singh; Akshat Gupta

arXiv:2308.08713 · cs.CL · August 21, 2023

TL;DR

This study benchmarks eight transformer-based speech models for speech emotion recognition across six languages. It finds that features from a single well-chosen layer outperform features drawn from all layers, and it achieves state-of-the-art results in German and Persian.

Contribution

The paper provides a comprehensive multilingual benchmark for speech emotion recognition, along with probing-based insights into which model layers best capture emotional information.
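To make the layer-wise probing idea concrete, here is a minimal sketch. It assumes each utterance has already been pooled into one feature vector per layer, and it uses a nearest-centroid classifier as a stand-in probe. The function names, the data layout, and the choice of probe are illustrative assumptions, not the paper's actual setup.

```python
import math
from collections import defaultdict


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Fit one centroid per emotion label, then label each test vector by its closest centroid."""
    by_label = defaultdict(list)
    for x, y in zip(train_x, train_y):
        by_label[y].append(x)
    # Mean vector of each label's training examples.
    centroids = {
        label: [sum(col) / len(xs) for col in zip(*xs)]
        for label, xs in by_label.items()
    }
    correct = 0
    for x, y in zip(test_x, test_y):
        pred = min(centroids, key=lambda label: math.dist(x, centroids[label]))
        correct += int(pred == y)
    return correct / len(test_y)


def probe_layers(train_layers, train_y, test_layers, test_y):
    """train_layers[i][j] is the pooled feature vector of utterance j at layer i.

    Trains a separate probe on each layer and returns the per-layer
    accuracies together with the index of the best-scoring layer.
    """
    scores = [
        nearest_centroid_accuracy(train_x, train_y, test_x, test_y)
        for train_x, test_x in zip(train_layers, test_layers)
    ]
    best_layer = max(range(len(scores)), key=lambda i: scores[i])
    return scores, best_layer
```

A real probing study would use the model's actual hidden states and a trained classifier, but the loop structure is the same: score every layer independently, then compare.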

Findings

01

Using features from a single optimal layer reduces the error rate by 32% on average across seven datasets, compared with using features from all layers.

02

Achieved state-of-the-art results for German and Persian.

03

The models' middle layers carry the most emotional information.
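Assuming the 32% figure is a relative reduction (the drop in error divided by the all-layer error) averaged over the seven datasets, it can be computed as below. Whether the paper averages per dataset or aggregates differently is an assumption here, and the helper names are hypothetical.

```python
def relative_error_reduction(err_all_layers, err_single_layer):
    """Fraction by which the single-layer error rate undercuts the all-layer error rate."""
    if err_all_layers <= 0:
        raise ValueError("baseline error rate must be positive")
    return (err_all_layers - err_single_layer) / err_all_layers


def mean_relative_reduction(pairs):
    """Macro-average the per-dataset relative reductions.

    pairs: iterable of (err_all_layers, err_single_layer), one per dataset.
    """
    reductions = [relative_error_reduction(a, s) for a, s in pairs]
    return sum(reductions) / len(reductions)
```

For a hypothetical dataset whose error rate falls from 25% to 17%, the relative reduction is 0.08 / 0.25 = 0.32, even though the absolute drop is only 8 percentage points. A relative figure should therefore not be read as percentage points.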

Abstract

Recent advancements in transformer-based speech representation models have greatly transformed speech processing. However, there has been limited research conducted on evaluating these models for speech emotion recognition (SER) across multiple languages and examining their internal representations. This article addresses these gaps by presenting a comprehensive benchmark for SER with eight speech representation models and six different languages. We conducted probing experiments to gain insights into the inner workings of these models for SER. We find that using features from a single optimal layer of a speech model reduces the error rate by 32% on average across seven datasets when compared to systems where features from all layers of speech models are used. We also achieve state-of-the-art results for German and Persian languages. Our probing results indicate that the middle layers of…
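The abstract contrasts features from a single layer with features from all layers. The sketch below shows both options for one utterance whose per-layer vectors are already pooled. How the paper's all-layer systems combine layers is not stated here; a softmax-weighted sum with learnable per-layer logits, as used in the SUPERB benchmark, is assumed purely for illustration, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of per-layer logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum_of_layers(layer_feats, layer_logits):
    """All-layer features (assumed SUPERB-style): mix every layer with softmax weights.

    layer_feats[i] is the pooled feature vector of layer i; all have the same length.
    """
    weights = softmax(layer_logits)
    dim = len(layer_feats[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, layer_feats))
        for d in range(dim)
    ]


def single_layer(layer_feats, layer_index):
    """Single-layer features: keep only the chosen layer's vector."""
    return list(layer_feats[layer_index])
```

In a trained system the logits would be learned jointly with the classifier. The single-layer option drops them and instead fixes the layer index, chosen for example by per-layer validation performance.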

Code & Models

Repositories

95anantsingh/decoding-emotions
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and Audio Processing