Explainable Speech Emotion Recognition: Weighted Attribute Fairness to Model Demographic Contributions to Social Bias
Tomisin Ogunnubi, Yupei Li, Björn Schuller

TL;DR
This paper introduces a fairness modelling approach for Speech Emotion Recognition (SER) systems that explicitly captures demographic biases and quantifies the contribution of each individual attribute, revealing gender bias in HuBERT and WavLM.
Contribution
It proposes a novel fairness metric that captures joint demographic-model error relationships and applies it to SSL-based SER models, highlighting social biases.
Findings
The fairness metric captures more mutual information between demographics and biases.
Applied to HuBERT and WavLM, it reveals gender bias in both models.
The approach quantifies the absolute contribution of individual demographic attributes to bias.
Abstract
Speech Emotion Recognition (SER) systems have growing applications in sensitive domains such as mental health and education, where biased predictions can cause harm. Traditional fairness metrics, such as Equalised Odds and Demographic Parity, often overlook the joint dependency between demographic attributes and model predictions. We propose a fairness modelling approach for SER that explicitly captures allocative bias by learning the joint relationship between demographic attributes and model error. We validate our fairness metric on synthetic data, then apply it to evaluate HuBERT and WavLM models fine-tuned on the CREMA-D dataset. Our results indicate that the proposed fairness model captures more mutual information between protected attributes and biases and quantifies the absolute contribution of individual attributes to bias in SSL-based SER models. Additionally, our analysis…
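To make two quantities named in the abstract concrete, the following is a minimal sketch, not the paper's fairness model. It computes a Demographic Parity gap and the empirical mutual information between a protected attribute and a per-sample error indicator. The function names, the toy speaker groups, and the toy labels and predictions are all assumptions made up for illustration; none of them come from the paper or from CREMA-D.

```python
import math
from collections import Counter


def demographic_parity_gap(groups, predictions, positive):
    """Largest difference in P(prediction == positive | group) across groups."""
    totals = Counter(groups)
    hits = Counter(g for g, p in zip(groups, predictions) if p == positive)
    rates = [hits[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts to avoid extra divisions
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data (illustrative only, not drawn from any real dataset).
groups = ["f", "f", "f", "f", "m", "m", "m", "m"]
labels = ["anger", "happy", "anger", "happy", "anger", "happy", "anger", "happy"]
preds = ["anger", "anger", "anger", "happy", "anger", "happy", "anger", "happy"]

# 1 where the model's prediction is wrong, 0 where it is right.
errors = [int(p != y) for p, y in zip(preds, labels)]

parity_gap = demographic_parity_gap(groups, preds, positive="anger")
group_error_mi = mutual_information(groups, errors)
print(f"demographic parity gap: {parity_gap:.3f}")
print(f"I(group; error) in bits: {group_error_mi:.3f}")
```

A mutual information of zero would mean the error indicator is statistically independent of the group in this sample; larger values indicate that knowing the group tells you more about where the model errs. How the paper's own model learns the joint relationship and attributes bias to individual attributes is not specified in this summary.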
