Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller

TL;DR
This paper presents a comprehensive testing framework for speech emotion recognition models, evaluating their correctness, fairness, and robustness across various metrics and datasets to identify biases and shortcut dependencies.
Contribution
The paper introduces a testing framework with automatically set thresholds for fairness tests and applies it to multiple models, revealing issues such as shortcut reliance and fairness disparities.
Findings
Models with high correlation may rely on shortcuts like text sentiment.
Significant differences in fairness were observed among models.
Testing framework effectively uncovers biases and robustness issues.
Abstract
Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated based on a few available datasets per task. Tasks could include arousal, valence, dominance, emotional categories, or tone of voice. Those models are mainly evaluated in terms of correlation or recall, and always show some errors in their predictions. The errors manifest themselves in model behaviour, which can be very different along different dimensions even if the same recall or correlation is achieved by the model. This paper introduces a testing framework to investigate behaviour of speech emotion recognition models, by requiring different metrics to reach a certain threshold in order to pass a test. The test metrics can be grouped in terms of correctness, fairness, and robustness. It also provides a method for automatically specifying test thresholds for…
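The pass/fail idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `MetricTest`, the helper `summarize`, the test names, and all metric values and thresholds are hypothetical and chosen only to show how metrics grouped under correctness, fairness, and robustness might each be compared against a threshold.

```python
from dataclasses import dataclass


@dataclass
class MetricTest:
    """One threshold-based test (hypothetical structure, not from the paper)."""
    name: str                      # illustrative test name
    group: str                     # "correctness", "fairness", or "robustness"
    value: float                   # metric value measured for a model (made up here)
    threshold: float               # illustrative threshold (made up here)
    higher_is_better: bool = True  # direction in which the metric should move

    def passed(self) -> bool:
        # A test passes when the metric reaches its threshold
        # in the required direction.
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


def summarize(tests):
    """Return the fraction of passed tests for each test group."""
    totals, passes = {}, {}
    for t in tests:
        totals[t.group] = totals.get(t.group, 0) + 1
        passes[t.group] = passes.get(t.group, 0) + int(t.passed())
    return {group: passes[group] / totals[group] for group in totals}


# Illustrative numbers only; they do not come from the paper.
tests = [
    MetricTest("recall", "correctness", 0.72, 0.70),
    MetricTest("recall_gap_between_groups", "fairness", 0.09, 0.05,
               higher_is_better=False),
    MetricTest("prediction_change_under_noise", "robustness", 0.03, 0.05,
               higher_is_better=False),
]
result = summarize(tests)
print(result)
```

Reporting a pass rate per group rather than a single score keeps apart the case the abstract points to, where two models reach the same recall or correlation but behave differently along other dimensions.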
Taxonomy
Topics: Speech Recognition and Synthesis
