Exploring Gender Disparities in Automatic Speech Recognition Technology
Hend ElGhazaly, Bahman Mirheidari, Nafise Sadat Moosavi, Heidi Christensen

TL;DR
This paper examines how gender representation and pitch variability in training data influence the fairness and accuracy of ASR systems, revealing complex interactions that affect performance across genders.
Contribution
It provides new insights into the impact of training data gender ratios and pitch variability on ASR fairness, emphasizing the importance of data curation.
Findings
Optimal fairness at specific gender distributions
Performance varies with pitch variability
Bias mitigation depends on training data composition
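To make the first finding concrete, one way to study fairness at different gender distributions is to build training subsets with a controlled female-to-male ratio. The sketch below is illustrative only: the function name `sample_by_gender_ratio`, the speaker record layout, and the choice to balance by speaker count (rather than by hours of audio) are assumptions, not details taken from the paper.

```python
import random

def sample_by_gender_ratio(speakers, female_ratio, total, seed=0):
    """Pick `total` speakers so that round(female_ratio * total) are female.

    Hypothetical helper: the paper may control the ratio differently,
    for example by hours of audio instead of by number of speakers.
    """
    rng = random.Random(seed)
    females = [s for s in speakers if s["gender"] == "F"]
    males = [s for s in speakers if s["gender"] == "M"]
    n_female = round(female_ratio * total)
    n_male = total - n_female
    if n_female > len(females) or n_male > len(males):
        raise ValueError("not enough speakers for the requested ratio")
    return rng.sample(females, n_female) + rng.sample(males, n_male)

# Toy speaker pool (20 female, 20 male); real metadata would come from the corpus.
speakers = [{"id": f"spk{i}", "gender": "F" if i % 2 == 0 else "M"} for i in range(40)]

# Build a 10-speaker subset with a 30% female share.
subset = sample_by_gender_ratio(speakers, female_ratio=0.3, total=10)
n_female = sum(s["gender"] == "F" for s in subset)
print(n_female, len(subset) - n_female)
```

Sweeping `female_ratio` over several values and training one model per subset would give the kind of ratio-versus-performance comparison the findings describe.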
Abstract
This study investigates factors influencing Automatic Speech Recognition (ASR) systems' fairness and performance across genders, beyond the conventional examination of demographics. Using the LibriSpeech dataset and the Whisper small model, we analyze how performance varies across different gender representations in training data. Our findings suggest a complex interplay between the gender ratio in training data and ASR performance. Optimal fairness occurs at specific gender distributions rather than a simple 50-50 split. Furthermore, our findings suggest that factors like pitch variability can significantly affect ASR accuracy. This research contributes to a deeper understanding of biases in ASR systems, highlighting the importance of carefully curated training data in mitigating gender bias.
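As a rough illustration of how performance across genders might be compared, the sketch below computes a corpus-level word error rate (WER) per gender group and the absolute gap between the two groups. The abstract does not state which fairness measure the authors use, so the absolute WER gap here is an assumption, and the utterance records and function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def wer(pairs):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words

def wer_by_gender(utterances):
    """Group (reference, hypothesis) pairs by gender label and score each group."""
    groups = {}
    for u in utterances:
        groups.setdefault(u["gender"], []).append((u["ref"], u["hyp"]))
    return {g: wer(p) for g, p in groups.items()}

# Toy transcripts; in practice `hyp` would be the ASR model's output.
utterances = [
    {"gender": "F", "ref": "the cat sat down", "hyp": "the cat sat down"},
    {"gender": "F", "ref": "she went home", "hyp": "she went home now"},
    {"gender": "M", "ref": "he reads a book", "hyp": "he read the book"},
    {"gender": "M", "ref": "it is late", "hyp": "it is late"},
]

scores = wer_by_gender(utterances)
gap = abs(scores["F"] - scores["M"])  # assumed fairness measure: absolute WER gap
print(scores, gap)
```

A gap near zero would indicate similar accuracy for both groups under this measure; other definitions (for example a relative gap) are equally plausible.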
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
