Style-agnostic evaluation of ASR using multiple reference transcripts

Quinten McNamara; Miguel Ángel del Río Fernández; Nishchal Bhandari; Martin Ratajczak; Danny Chen; Corey Miller; Migüel Jetté

arXiv:2412.07937 · cs.CL · December 12, 2024

TL;DR

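This paper introduces a style-agnostic evaluation method for ASR systems. Each hypothesis is scored against multiple reference transcripts produced under different style conventions, which reduces the bias introduced by any single transcription style. The results suggest that conventional single-reference WER likely overestimates the number of contentful errors, and the method enables fairer comparison of models trained on data with different transcription styles.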

Contribution

The authors propose a multireference evaluation approach that accounts for stylistic variation between transcripts, so that ASR performance is assessed on content rather than on adherence to one transcription style.
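To make the contribution concrete, the block below states standard WER and one simple way multiple references could enter it. The multireference form is an assumption for illustration (a per-utterance minimum over references, ties broken toward the longer reference), not necessarily the authors' exact procedure, which may combine references differently. Here S, D and I are the substitutions, deletions and insertions in the minimum-edit alignment of the hypothesis to the reference, N is the total number of reference words, h_u is the hypothesis for utterance u, r_{u,k} is its k-th reference, d is word-level edit distance (S + D + I), and |r| is a reference's length in words.

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

$$
k^*_u = \arg\min_{k} \, d(h_u, r_{u,k}), \qquad
\mathrm{WER}_{\text{multi}} = \frac{\sum_u d\big(h_u, r_{u,k^*_u}\big)}{\sum_u \lvert r_{u,k^*_u} \rvert}
$$

Because each hypothesis is charged only against the reference it matches best, a purely stylistic choice (such as keeping or dropping a filler word) is not counted as an error whenever some reference makes the same choice.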

Findings

1. Existing WER reports likely overestimate the number of contentful errors made by state-of-the-art ASR systems.

2. Multireference evaluation helps compare ASR models whose training data and target tasks differ in transcription style.

3. Style-agnostic evaluation gives a more accurate measure of the content accuracy of ASR output.

Abstract

Word error rate (WER) as a metric has a variety of limitations that have plagued the field of speech recognition. Evaluation datasets suffer from varying style, formality, and inherent ambiguity of the transcription task. In this work, we attempt to mitigate some of these differences by performing style-agnostic evaluation of ASR systems using multiple references transcribed under opposing style parameters. As a result, we find that existing WER reports are likely significantly over-estimating the number of contentful errors made by state-of-the-art ASR systems. In addition, we have found our multireference method to be a useful mechanism for comparing the quality of ASR models that differ in the stylistic makeup of their training data and target task.
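As an illustration of why a single stylistic reference can inflate WER, here is a minimal sketch that scores a hypothesis against a verbatim and a non-verbatim reference. It assumes the simplest multireference scheme (per utterance, use whichever reference yields the fewest word edits); this is a simplification, not necessarily the paper's method. The function names `edit_distance`, `wer` and `multi_ref_wer` are hypothetical, and the example transcripts are invented.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(refs: list[str], hyps: list[str]) -> float:
    """Standard single-reference corpus WER: total edits / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def multi_ref_wer(refs_per_utt: list[list[str]], hyps: list[str]) -> float:
    """Assumed multireference WER: per utterance, use the reference with the fewest
    edits (ties go to the longer reference) and count its length in the denominator."""
    errors = 0
    words = 0
    for refs, hyp in zip(refs_per_utt, hyps):
        h = hyp.split()
        tokenized = [r.split() for r in refs]
        best = min(tokenized, key=lambda r: (edit_distance(r, h), -len(r)))
        errors += edit_distance(best, h)
        words += len(best)
    return errors / words


if __name__ == "__main__":
    verbatim = "so um i i think it's fine"  # keeps the filler and the repetition
    clean = "so i think it's fine"          # non-verbatim style
    print(wer([verbatim], [clean]))                   # style mismatch only, still 2/7
    print(multi_ref_wer([[verbatim, clean]], [clean]))  # style absorbed
    print(multi_ref_wer([[verbatim, clean]], ["so i think it's mine"]))  # content error kept
```

Scored against the verbatim reference alone, a hypothesis that simply omits "um" and the repeated "i" is charged two errors out of seven words even though its content is correct; with both references available, that stylistic difference costs nothing, while the genuine substitution of "mine" for "fine" is still counted (1 error over 5 words). Using the chosen reference's length as the denominator is a design choice of this sketch; other normalizations are possible.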

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques