Style-agnostic evaluation of ASR using multiple reference transcripts
Quinten McNamara, Miguel Ángel del Río Fernández, Nishchal Bhandari, Martin Ratajczak, Danny Chen, Corey Miller, Migüel Jetté

TL;DR
This paper introduces a style-agnostic evaluation method for ASR systems that uses multiple reference transcripts to reduce style bias. It finds that traditional WER reports likely overestimate errors, and that the method enables better comparison of models trained on data with different transcription styles.
Contribution
The authors propose a multireference evaluation approach that accounts for stylistic variation across reference transcripts, yielding a more accurate assessment of ASR performance.
Findings
Existing WER reports likely overestimate the number of contentful errors made by state-of-the-art ASR systems.
Multireference evaluation is useful for comparing ASR models whose training data and target tasks differ in style.
Style-agnostic evaluation provides a more accurate measure of ASR content accuracy.
Abstract
Word error rate (WER) as a metric has a variety of limitations that have plagued the field of speech recognition. Evaluation datasets suffer from varying style, formality, and inherent ambiguity of the transcription task. In this work, we attempt to mitigate some of these differences by performing style-agnostic evaluation of ASR systems using multiple references transcribed under opposing style parameters. As a result, we find that existing WER reports are likely significantly over-estimating the number of contentful errors made by state-of-the-art ASR systems. In addition, we have found our multireference method to be a useful mechanism for comparing the quality of ASR models that differ in the stylistic makeup of their training data and target task.
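The abstract does not specify how the multiple references are combined during scoring. As a rough illustration only, the sketch below assumes the simplest possible scheme: score each utterance against every available reference and keep the one with the fewest word errors. The function names `word_edit_distance` and `multireference_wer`, the lowercase-and-split normalization, and the example transcripts are all hypothetical; this is not presented as the authors' method.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of substitutions,
    deletions, and insertions that turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref word missing from hyp
                curr[j - 1] + 1,     # insertion: extra word in hyp
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def multireference_wer(references: List[List[str]], hypotheses: List[str]) -> float:
    """Corpus WER in which each utterance is scored against whichever of its
    alternative references gives the fewest word errors (hypothetical scheme)."""
    if len(references) != len(hypotheses):
        raise ValueError("need one list of references per hypothesis")
    total_errors = 0
    total_ref_words = 0
    for refs, hyp in zip(references, hypotheses):
        # Simplified normalization (assumption): lowercase + whitespace split.
        hyp_words = hyp.lower().split()
        scored = [
            (word_edit_distance(r.lower().split(), hyp_words), len(r.split()))
            for r in refs
        ]
        # Fewest errors wins; ties go to the shorter reference, which keeps
        # the corpus WER on the conservative (higher) side.
        best_errors, best_len = min(scored)
        total_errors += best_errors
        total_ref_words += best_len
    if total_ref_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_ref_words


if __name__ == "__main__":
    # Hypothetical verbatim vs. non-verbatim references for one utterance.
    refs = [["um I I think so", "I think so"]]
    hyps = ["I think so"]
    print(multireference_wer(refs, hyps))           # 0.0
    print(multireference_wer([refs[0][:1]], hyps))  # 0.4 (verbatim reference only)
```

Scored against the verbatim reference alone, the fluent hypothesis is charged two deletions (WER 0.4); once a non-verbatim reference is also available, the same output scores 0.0. That gap is the kind of style-driven inflation the abstract describes. A per-utterance minimum is a coarse choice, though: it cannot let different stretches of one utterance match different references, which a finer-grained alignment of the references could.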
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
