TL;DR
This paper identifies three pitfalls in current ASR auditing practices, particularly as they affect people with speech and language disorders such as aphasia, and proposes a comprehensive framework to improve the accuracy and fairness of ASR system evaluations.
Contribution
The paper introduces a holistic auditing framework that accounts for performance variability, demographic disparities, and error types when assessing ASR systems, and demonstrates it through a case study of six popular ASR systems on speakers with aphasia.
Findings
ASR systems perform worse for speakers with aphasia than for control speakers.
Standard auditing methods mask variability and disparities in ASR performance.
The holistic framework reveals finer-grained performance differences and distinct error types.
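The summary does not specify how the framework categorizes errors. As a hedged illustration only, the conventional starting point is to split word error rate (WER) into substitutions (S), deletions (D), and insertions (I) taken from a minimum-edit alignment, with WER = (S + D + I) / N for a reference of N words. The sketch below assumes whitespace tokenization; `ErrorCounts` and `align_errors` are hypothetical names, not the paper's code.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Word-level error counts from aligning one reference with one hypothesis."""
    substitutions: int
    deletions: int
    insertions: int
    ref_len: int  # N: number of words in the reference

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N
        if self.ref_len == 0:
            raise ValueError("WER is undefined for an empty reference")
        return (self.substitutions + self.deletions + self.insertions) / self.ref_len


def align_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Count S, D, I via Levenshtein alignment over whitespace-split words."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)

    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(
                dp[i - 1][j - 1] + mismatch,  # match or substitution
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
            )

    # Backtrace one optimal path, preferring match/substitution on ties
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            s += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(s, d, ins, n)


counts = align_errors("the cat sat on the mat", "the cat sat on mat")
print(counts, f"WER={counts.wer:.3f}")
```

When several alignments tie on total cost, the split among S, D, and I depends on the tie-breaking order, so audits that compare error types across systems or speaker groups should fix one convention.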
Abstract
Automatic Speech Recognition (ASR) has transformed daily tasks from video transcription to workplace hiring. ASR systems' growing use warrants robust and standardized auditing approaches to ensure automated transcriptions of high and equitable quality. This is especially critical for people with speech and language disorders (such as aphasia) who may disproportionately depend on ASR systems to navigate everyday life. In this work, we identify three pitfalls in existing standard ASR auditing procedures, and demonstrate how addressing them impacts audit results via a case study of six popular ASR systems' performance for aphasia speakers. First, audits often adhere to a single method of text standardization during data pre-processing, which (a) masks variability in ASR performance from applying different standardization methods, and (b) may not be consistent with how users - especially…
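To make the first pitfall concrete, here is a minimal sketch, assuming plain word error rate and four hypothetical normalizers, that scores the same reference/hypothesis pair under each normalizer and reports the spread. The normalizer names, the filler-word list, and the example sentences are illustrative assumptions, not taken from the paper.

```python
import string
from typing import Callable


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over word tokens, using a single rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j - 1] + int(r != h),  # match or substitution
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("WER is undefined for an empty reference")
    return word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical filler-word list; a real audit would choose this deliberately.
FILLERS = {"um", "uh", "er", "ah"}
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def lower_no_punct(text: str) -> str:
    # Lowercase, then delete ASCII punctuation characters
    return text.lower().translate(_PUNCT_TABLE)


# Hypothetical normalizers, each applied to both reference and hypothesis
NORMALIZERS: dict[str, Callable[[str], str]] = {
    "none": lambda t: t,
    "lowercase": str.lower,
    "lower_no_punct": lower_no_punct,
    "lower_no_punct_no_fillers": lambda t: " ".join(
        w for w in lower_no_punct(t).split() if w not in FILLERS
    ),
}


def wer_by_normalizer(reference: str, hypothesis: str) -> dict[str, float]:
    """Score the same pair once per normalizer."""
    return {name: wer(norm(reference), norm(hypothesis)) for name, norm in NORMALIZERS.items()}


ref = "Um, I went to the... the store."
hyp = "I went to the the store"
scores = wer_by_normalizer(ref, hyp)
for name, score in scores.items():
    print(f"{name:28s} WER={score:.3f}")
print(f"spread: {max(scores.values()) - min(scores.values()):.3f}")
```

On this example pair, WER falls from about 0.43 with no normalization to 0.0 once punctuation and fillers are removed, so an audit that reports only one normalizer would hide a spread of that size.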
