Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures

Jia Qi Yip; Dianwen Ng; Bin Ma; Chng Eng Siong

arXiv:2309.07458 · cs.SD · September 15, 2023

Open Access

TL;DR

This paper investigates how emotional content in speech mixtures degrades the performance of speech separation models, highlighting the need to consider emotions for real-world applications.

Contribution

It introduces Emo2Mix, a test dataset balanced across all emotional combinations, and uses it to separate emotion-induced degradation from out-of-domain effects, revealing significant degradation even in strong models such as Sepformer.

Findings

01

Mixtures with strong emotions cause a degradation of up to 5.1 dB SI-SDRi, even for Sepformer.

02

Models trained on neutral data still degrade with emotional speech.

03

Emotional content degrades separation quality beyond the effect of out-of-domain inference, so real-world systems need to account for it.

Abstract

Despite recent strides made in Speech Separation, most models are trained on datasets with neutral emotions. Emotional speech has been known to degrade performance of models in a variety of speech tasks, which reduces the effectiveness of these models when deployed in real-world scenarios. In this paper we perform analysis to differentiate the performance degradation arising from the emotions in speech from the impact of out-of-domain inference. This is measured using a carefully designed test dataset, Emo2Mix, consisting of balanced data across all emotional combinations. We show that even models with strong out-of-domain performance such as Sepformer can still suffer significant degradation of up to 5.1 dB SI-SDRi on mixtures with strong emotions. This demonstrates the importance of accounting for emotions in real-world speech separation applications.
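
The degradation above is reported in SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio of the separated output over the unprocessed mixture. As a rough illustration of how that metric is commonly computed, here is a minimal NumPy sketch. It follows the standard SI-SDR definition rather than the authors' evaluation code, which is not shown here; the function names `si_sdr` and `si_sdr_improvement` and the synthetic signals are assumptions made for this example.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between a 1-D estimate and its reference."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; the rescaled target is the
    # "signal" part, the residual is the "distortion" part.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Synthetic stand-ins for two speakers (white noise, not real speech).
rng = np.random.default_rng(0)
speaker_a = rng.standard_normal(16000)
speaker_b = rng.standard_normal(16000)
mixture = speaker_a + speaker_b
# Hypothetical separator output: speaker A with 10% leakage from speaker B.
estimate_a = speaker_a + 0.1 * speaker_b

print(f"SI-SDRi: {si_sdr_improvement(estimate_a, mixture, speaker_a):.1f} dB")
```

Under this definition, a 5.1 dB drop in SI-SDRi means the separator recovers roughly 5 dB less improvement over the raw mixture on strongly emotional speech than it does on the comparison condition.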

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques