Quantifying Bias in Automatic Speech Recognition
Siyuan Feng, Olya Kudina, Bence Mark Halpern, Odette Scharenborg

TL;DR
This paper systematically quantifies the bias of a state-of-the-art Dutch ASR system against gender, age, regional accents, and non-native accents, using word error rates and a phoneme-level error analysis, and suggests mitigation strategies.
Contribution
It introduces a comprehensive bias analysis framework for ASR systems, focusing on phoneme-level errors and articulation differences, which is novel in the context of Dutch ASR.
Findings
Recognition performance is compared across gender, age, regional accents, and non-native accents using word error rates.
Phoneme-level errors reveal articulation-related biases.
Bias mitigation strategies are proposed based on findings.
Abstract
Automatic speech recognition (ASR) systems promise to deliver objective interpretation of human speech. Practice and recent evidence suggest that state-of-the-art (SotA) ASRs struggle with the large variation in speech due to, e.g., gender, age, speech impairment, race, and accents. Many factors can cause the bias of an ASR system. Our overarching goal is to uncover bias in ASR systems to work towards proactive bias mitigation in ASR. This paper is a first step towards this goal and systematically quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents. Word error rates are compared, and an in-depth phoneme-level error analysis is conducted to understand where bias is occurring. We primarily focus on bias due to articulation differences in the dataset. Based on our findings, we suggest bias mitigation strategies for ASR development.
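The word error rate comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or toolkit: the function names `edit_distance` and `wer_by_group`, the group labels, and the toy Dutch sentences are all assumptions made up for illustration. WER is taken here in its standard form, the word-level edit distance between reference and hypothesis divided by the number of reference words, pooled per speaker group.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(utterances):
    """Pool errors and reference words per group, then divide.

    utterances: iterable of (group, reference, hypothesis) strings.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in utterances:
        ref_tokens = ref.split()
        errors[group] += edit_distance(ref_tokens, hyp.split())
        words[group] += len(ref_tokens)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical toy data, invented for this example only.
sample = [
    ("native", "de kat zit op de mat", "de kat zit op de mat"),
    ("native", "ik ga naar huis", "ik ga naar huis"),
    ("non_native", "de kat zit op de mat", "de kat zit op mat"),
    ("non_native", "ik ga naar huis", "ik ga na huis"),
]

rates = wer_by_group(sample)
# Absolute WER gap between the two groups, one simple way to express bias.
gap = rates["non_native"] - rates["native"]
print(rates, gap)
```

The same `edit_distance` function works on any token sequence, so passing phoneme lists instead of word lists would give a phoneme error rate, which is one plausible starting point for a phoneme-level analysis of the kind the abstract mentions.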
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing
