How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

Haoran Sun; Lantian Li; Thomas Fang Zheng; Dong Wang

arXiv:2111.12324 · cs.SD · November 25, 2021

TL;DR

This study investigates how different speech information factors contribute to emotion recognition, revealing rhythm as most important and highlighting the challenges of cross-corpus generalization in current models.

Contribution

The paper introduces a decomposition-based analysis of speech signals to identify key emotional factors and assesses their impact on emotion recognition performance.

Findings

01. Rhythm is the most crucial component for emotional expression.

02. Cross-corpus emotion recognition performance is poor, often worse than random guessing.

03. Removing unimportant components can improve cross-corpus results.

Abstract

The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the SpeechFlow model presented recently, by which we are able to decompose speech signals into separate information factors (content, pitch, rhythm). Based on this decomposition, we carefully studied the performance of each information component and their combinations. We conducted the study on three different speech emotion corpora and chose an attention-based convolutional RNN as the emotion classifier. Our results show that rhythm is the most important component for emotional expression. Moreover, the cross-corpus results are very bad (even…
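The "each component and their combinations" design described in the abstract can be pictured by enumerating the non-empty subsets of the three factors. This is only a minimal sketch and not the authors' code: the function name `factor_combinations` is hypothetical, and the abstract does not say whether every subset was actually evaluated.

```python
from itertools import combinations

# The three information factors named in the abstract.
FACTORS = ("content", "pitch", "rhythm")


def factor_combinations(factors=FACTORS):
    """Return every non-empty subset of the factors, smallest subsets first.

    Hypothetical helper for illustration; not part of SpeechFlow.
    """
    return [
        combo
        for size in range(1, len(factors) + 1)
        for combo in combinations(factors, size)
    ]


if __name__ == "__main__":
    # One line per candidate input configuration for the emotion classifier.
    for combo in factor_combinations():
        print(" + ".join(combo))
```

With three factors this yields seven configurations: three single factors, three pairs, and the full set.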

Code & Models

Repositories

fantsun/speechflow
PyTorch · Official

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Speech and Audio Processing