Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach

Yi-Cheng Lin; Huang-Cheng Chou; Hung-yi Lee

arXiv:2505.14449 · eess.AS · June 2, 2025

Open Access

TL;DR

This paper proposes an Implicit Demography Inference (IDI) method that uses pseudo-labeling and unsupervised learning to reduce subgroup disparities in speech emotion recognition (SER) without relying on explicit demographic labels.

Contribution

It introduces the Implicit Demography Inference (IDI) module, which uses pseudo-labeling from a pre-trained model and k-means clustering to mitigate bias in SER, improving fairness metrics at a cost of less than 3.6% in SER performance.

Findings

01. Pseudo-labeling IDI improves fairness metrics by over 28%.

02. Unsupervised IDI improves fairness metrics by more than 4.6%.

03. SER accuracy drops by less than 2% with pseudo-labeling IDI, and SER performance drops by less than 3.6% with unsupervised IDI.

Abstract

While subgroup disparities and performance bias are increasingly studied in computational research, fairness in categorical Speech Emotion Recognition (SER) remains underexplored. Existing methods often rely on explicit demographic labels, which are difficult to obtain due to privacy concerns. To address this limitation, we introduce an Implicit Demography Inference (IDI) module that leverages pseudo-labeling from a pre-trained model and unsupervised learning using k-means clustering to mitigate bias in SER. Our experiments show that pseudo-labeling IDI reduces subgroup disparities, improving fairness metrics by over 28% with less than a 2% decrease in SER accuracy. Also, the unsupervised IDI yields more than a 4.6% improvement in fairness metrics with a drop of less than 3.6% in SER performance. Further analyses reveal that the unsupervised IDI consistently mitigates race and age…
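
The abstract names k-means clustering as the unsupervised route to implicit group labels but gives no implementation detail on this page. The following is a minimal sketch, not the authors' implementation. It assumes each utterance already has a fixed-length embedding (the 2-D toy vectors below are invented), clusters them with a plain k-means, and reports the accuracy gap between the inferred groups as one simple disparity measure. The helper names `kmeans_groups` and `accuracy_gap`, the farthest-point initialization, and the gap metric are illustrative assumptions and may differ from the paper's fairness metrics.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_groups(points, k, max_iters=100):
    """Assign each point to one of k clusters with Lloyd's k-means.

    Initialization is deterministic farthest-point seeding (an assumption
    made here for reproducibility, not something stated in the paper).
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    assign = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for each point.
        new_assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:  # no change, so the clustering has converged
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return assign


def accuracy_gap(groups, correct):
    """Return (max accuracy - min accuracy across groups, per-group accuracy)."""
    counts = {}
    for g, ok in zip(groups, correct):
        hits, total = counts.get(g, (0, 0))
        counts[g] = (hits + int(ok), total + 1)
    acc = {g: hits / total for g, (hits, total) in counts.items()}
    return max(acc.values()) - min(acc.values()), acc


# Hypothetical toy data: six utterance embeddings and whether a
# hypothetical SER model predicted each utterance correctly.
embeddings = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
correct = [True, True, True, True, False, False]

groups = kmeans_groups(embeddings, k=2)
gap, per_group = accuracy_gap(groups, correct)
print(groups, per_group, round(gap, 3))
```

In this toy setup the inferred cluster IDs stand in for demographic labels that are never observed. A bias-mitigation method could then use those IDs wherever explicit group labels would otherwise be required.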

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and Audio Processing

Methods: k-Means Clustering