Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions
Chunxi Liu, Michael Picheny, Leda Sarı, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

TL;DR
This paper evaluates the fairness of speech recognition models using the Casual Conversations dataset, revealing biases related to gender and skin tone, and encourages community efforts to mitigate these biases.
Contribution
It releases human transcriptions of the Casual Conversations dataset for evaluating fairness in ASR and provides an initial bias analysis across multiple models.
Findings
Significant bias in word error rates across gender and skin tone.
Evaluation of multiple models shows consistent bias patterns.
Releasing transcripts to foster bias reduction research.
Abstract
It is well known that many machine learning systems demonstrate bias towards specific groups of individuals. This problem has been studied extensively in facial recognition, but much less so in Automatic Speech Recognition (ASR). This paper presents initial speech recognition results on "Casual Conversations", a publicly released 846-hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone. The entire corpus has been manually transcribed, allowing for detailed ASR evaluations across these metadata. Multiple ASR models are evaluated, including models trained on LibriSpeech, on 14,000 hours of transcribed social media videos, and on over 2 million hours of untranscribed social media videos. Significant differences in word error rate across gender and skin tone are observed at times for all models.…
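The abstract reports word error rate (WER) broken down by metadata such as gender and skin tone. As a rough illustration of that kind of breakdown, the Python sketch below computes a corpus-level WER per group: total word-level edit errors (substitutions, deletions, insertions) divided by total reference words. The function names, the `group_a`/`group_b` labels, and the toy utterances are hypothetical and are not taken from the paper or its released transcripts; the sketch also applies no text normalization (case, punctuation), which the paper's actual scoring may handle differently.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Hypothetical helper: samples are (group, reference, hypothesis) strings.

    Returns corpus-level WER per group (errors summed before dividing),
    skipping groups whose references contain no words.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_words, hyp_words = ref.split(), hyp.split()
        errors[group] += edit_distance(ref_words, hyp_words)
        words[group] += len(ref_words)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Toy, made-up utterances purely to exercise the code.
samples = [
    ("group_a", "the cat sat on the mat", "the cat sat on a mat"),
    ("group_a", "hello world", "hello world"),
    ("group_b", "good morning everyone", "good morning"),
    ("group_b", "see you soon", "see you soon"),
]
rates = wer_by_group(samples)
print(rates)
```

On this toy input, `group_a` has 1 error over 8 reference words (WER 0.125) and `group_b` has 1 error over 6 words (about 0.167). Summing errors per group before dividing, rather than averaging per-utterance WERs, keeps short utterances from dominating the group score.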
