Exploring Gender Disparities in Automatic Speech Recognition Technology

Hend ElGhazaly; Bahman Mirheidari; Nafise Sadat Moosavi; Heidi Christensen

arXiv:2502.18434·cs.CL·February 26, 2025

Open Access

TL;DR

This paper examines how gender representation and pitch variability in training data influence the fairness and accuracy of ASR systems, revealing complex interactions that affect performance across genders.

Contribution

It provides new insights into the impact of training data gender ratios and pitch variability on ASR fairness, emphasizing the importance of data curation.

Findings

01. Optimal fairness at specific gender distributions

02. Performance varies with pitch variability

03. Bias mitigation depends on training data composition
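To make the idea of a "gender distribution" in training data concrete, here is a minimal sketch of drawing a training subset at a chosen female-to-male ratio. It is not the authors' procedure: the function name `sample_by_gender_ratio`, the dictionary layout, and sampling by utterance count (rather than by speaker or by audio duration) are all assumptions made for illustration.

```python
import random


def sample_by_gender_ratio(utterances, female_ratio, total, seed=0):
    """Draw `total` utterances so that a `female_ratio` share is labeled "F".

    `utterances` is a list of dicts with a "gender" key holding "F" or "M"
    (a hypothetical layout, not a format defined by the paper).
    """
    if not 0.0 <= female_ratio <= 1.0:
        raise ValueError("female_ratio must be between 0 and 1")

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    female = [u for u in utterances if u["gender"] == "F"]
    male = [u for u in utterances if u["gender"] == "M"]

    n_female = round(total * female_ratio)
    n_male = total - n_female
    if n_female > len(female) or n_male > len(male):
        raise ValueError("not enough utterances for the requested ratio")

    # Sample without replacement within each group, then mix the order.
    subset = rng.sample(female, n_female) + rng.sample(male, n_male)
    rng.shuffle(subset)
    return subset


# Toy pool of 50 female and 50 male utterances (illustrative data only).
pool = [{"id": f"f{i}", "gender": "F"} for i in range(50)]
pool += [{"id": f"m{i}", "gender": "M"} for i in range(50)]

subset = sample_by_gender_ratio(pool, female_ratio=0.3, total=20)
n_female_drawn = sum(u["gender"] == "F" for u in subset)
```

Repeating the draw over a grid of ratios (for example 0.0, 0.1, up to 1.0) and fine-tuning one model per subset is one way to compare distributions other than a 50-50 split. Balancing by hours of audio or by number of speakers would be equally reasonable choices and may give different results.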

Abstract

This study investigates factors influencing Automatic Speech Recognition (ASR) systems' fairness and performance across genders, beyond the conventional examination of demographics. Using the LibriSpeech dataset and the Whisper small model, we analyze how performance varies across different gender representations in training data. Our findings suggest a complex interplay between the gender ratio in training data and ASR performance. Optimal fairness occurs at specific gender distributions rather than a simple 50-50 split. Furthermore, our findings suggest that factors like pitch variability can significantly affect ASR accuracy. This research contributes to a deeper understanding of biases in ASR systems, highlighting the importance of carefully curated training data in mitigating gender bias.
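The abstract does not state which metric is used to compare genders. A common choice for ASR is word error rate (WER) computed separately per group, with the difference between groups serving as a simple fairness gap. The sketch below assumes that setup; `edit_distance`, `wer_by_group`, and the toy transcripts are illustrative names and data, not taken from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Corpus-level WER per group from (group, reference, hypothesis) triples."""
    errors, words = {}, {}
    for group, reference, hypothesis in samples:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors[group] = errors.get(group, 0) + edit_distance(ref_words, hyp_words)
        words[group] = words.get(group, 0) + len(ref_words)
    # Groups with no reference words are skipped to avoid dividing by zero.
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Toy transcripts (made up for illustration, not results from the paper).
samples = [
    ("F", "the cat sat on the mat", "the cat sat on a mat"),
    ("F", "she reads books", "she reads books"),
    ("M", "he walked home", "he walk home"),
    ("M", "good morning everyone", "good morning"),
]

wer = wer_by_group(samples)
gap = wer["M"] - wer["F"]  # positive means higher error for male speakers
```

Pooling errors over all utterances in a group, as done here, weights long utterances more heavily than averaging per-utterance WER would. Text normalization is also deliberately minimal (lowercasing and whitespace splitting); a real evaluation would normally strip punctuation and normalize numerals before scoring.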

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing