Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models

Anna Derington; Hagen Wierstorf; Ali Özkil; Florian Eyben; Felix Burkhardt; Björn W. Schuller

arXiv:2312.06270·eess.AS·February 13, 2025·1 cites

Open Access

TL;DR

This paper presents a comprehensive testing framework for speech emotion recognition models, evaluating their correctness, fairness, and robustness across various metrics and datasets to identify biases and shortcut dependencies.

Contribution

The paper introduces a testing framework with automatic threshold setting for fairness tests and applies it to multiple models, revealing issues such as shortcut reliance and fairness disparities.

Findings

1. Models with high correlation may rely on shortcuts such as text sentiment.

2. Models differ significantly in fairness.

3. The testing framework effectively uncovers biases and robustness issues.

Abstract

Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated based on a few available datasets per task. Tasks could include arousal, valence, dominance, emotional categories, or tone of voice. Those models are mainly evaluated in terms of correlation or recall, and always show some errors in their predictions. The errors manifest themselves in model behaviour, which can be very different along different dimensions even if the same recall or correlation is achieved by the model. This paper introduces a testing framework to investigate behaviour of speech emotion recognition models, by requiring different metrics to reach a certain threshold in order to pass a test. The test metrics can be grouped in terms of correctness, fairness, and robustness. It also provides a method for automatically specifying test thresholds for…
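
The idea of passing a test only when a metric reaches a threshold can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy labels, the speaker groups, and the threshold values 0.6 and 0.1 are hypothetical and are not taken from the paper, and the paper's method for setting thresholds automatically is not reproduced here. The sketch uses unweighted average recall (the mean of per-class recall) as the metric for both a correctness test and a simple fairness test based on the recall gap between two groups.

```python
from collections import defaultdict


def unweighted_average_recall(truth, prediction):
    """Mean of per-class recall over the classes present in truth."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(truth, prediction):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def run_test(name, value, threshold, higher_is_better=True):
    """Compare a metric value against a threshold and report pass/fail."""
    if higher_is_better:
        passed = value >= threshold
    else:
        passed = value <= threshold
    return {"test": name, "value": value, "threshold": threshold, "passed": passed}


def select(values, groups, wanted):
    """Keep only the entries whose group label equals `wanted`."""
    return [v for v, g in zip(values, groups) if g == wanted]


# Toy data (hypothetical): labels, model predictions, and a group per sample.
truth = ["happy", "happy", "sad", "sad", "angry", "angry"]
prediction = ["happy", "sad", "sad", "sad", "angry", "happy"]
group = ["f", "m", "f", "m", "f", "m"]

# Correctness: overall recall must reach an illustrative threshold.
uar = unweighted_average_recall(truth, prediction)
correctness = run_test("correctness/uar", uar, threshold=0.6)

# Fairness: the recall gap between groups must stay below an
# illustrative threshold.
uar_by_group = {
    g: unweighted_average_recall(
        select(truth, group, g), select(prediction, group, g)
    )
    for g in sorted(set(group))
}
gap = max(uar_by_group.values()) - min(uar_by_group.values())
fairness = run_test("fairness/uar_gap", gap, threshold=0.1, higher_is_better=False)

for result in (correctness, fairness):
    print(result)
```

On this toy data the model passes the correctness test but fails the fairness test, which mirrors the abstract's point that two models with the same aggregate recall can behave very differently along other dimensions.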

Taxonomy

Topics: Speech Recognition and Synthesis