How to Evaluate Automatic Speech Recognition: Comparing Different Performance and Bias Measures

Tanvina Patel; Wiebke Hutiri; Aaron Yi Ding; Odette Scharenborg

arXiv:2507.05885 · cs.CL · July 9, 2025

TL;DR

This paper compares various performance and bias measures for evaluating Dutch ASR systems, highlighting the limitations of standard error rates and proposing comprehensive reporting recommendations for diverse speaker groups.

Contribution

It systematically evaluates multiple performance and bias metrics for ASR, offering guidelines to improve assessment of system fairness and robustness.

Findings

01

Averaged error rates are insufficient alone for bias assessment.

02

Additional measures provide a more complete evaluation of ASR bias.

03

Recommendations improve reporting practices for diverse speaker groups.

Abstract

There is growing evidence that automatic speech recognition (ASR) systems are biased against different speakers and speaker groups, e.g., due to gender, age, or accent. Research on bias in ASR has so far primarily focused on detecting and quantifying bias and on developing mitigation approaches. Despite this progress, how to measure the performance and bias of a system remains an open question. In this study, we compare different performance and bias measures, both from the literature and newly proposed, to evaluate state-of-the-art end-to-end ASR systems for Dutch. Our experiments use several bias mitigation strategies to address bias against different speaker groups. The findings reveal that averaged error rates, a standard in ASR research, are not sufficient on their own and should be supplemented by other measures. The paper ends with recommendations for reporting ASR performance and bias to better…
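To make the point about averaged error rates concrete, here is a minimal Python sketch (not the paper's own measures) of how a single pooled word error rate (WER) can mask differences between speaker groups. It assumes whitespace-tokenized transcripts and WER computed as total word-level Levenshtein errors divided by total reference words. The function names (`edit_distance`, `wer`, `group_wers`, `max_group_gap`), the "adult"/"child" groups and the Dutch example sentences are hypothetical illustrations, and the worst-minus-best gap is just one simple bias indicator chosen for this example.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            )
        prev = cur
    return prev[-1]


def wer(pairs):
    """Pooled WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words if words else 0.0


def group_wers(utterances):
    """Per-group WER from (group, reference, hypothesis) triples."""
    by_group = defaultdict(list)
    for group, ref, hyp in utterances:
        by_group[group].append((ref, hyp))
    return {group: wer(pairs) for group, pairs in by_group.items()}


def max_group_gap(wers_by_group):
    """Hypothetical bias indicator: worst-group WER minus best-group WER."""
    values = list(wers_by_group.values())
    return max(values) - min(values)


# Hypothetical toy data: (speaker group, reference, ASR hypothesis).
data = [
    ("adult", "de kat zit op de mat", "de kat zit op de mat"),
    ("adult", "het regent vandaag", "het regent vandaag"),
    ("child", "ik wil een koekje", "ik wil koekje"),
    ("child", "mag ik buiten spelen", "mag ik buiten spelen"),
]

overall = wer([(ref, hyp) for _, ref, hyp in data])
per_group = group_wers(data)
print(f"overall WER: {overall:.3f}")
for group, value in sorted(per_group.items()):
    print(f"{group} WER: {value:.3f}")
print(f"max group gap: {max_group_gap(per_group):.3f}")
```

On this toy data the pooled WER is about 5.9%, while the child group's WER is 12.5% and the adult group's is 0%. Reporting per-group rates alongside a gap measure exposes a disparity that the single average hides; the paper compares several such measures, which this sketch does not reproduce.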

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and Audio Processing