Automatic Speech Recognition System-Independent Word Error Rate Estimation

Chanho Park; Mingjie Chen; Thomas Hain

arXiv:2404.16743·cs.CL·April 29, 2024

TL;DR

This paper proposes a method for estimating word error rate (WER) that does not depend on a specific ASR system. The estimator is trained on generated hypotheses that simulate ASR output, which makes it more robust across domains.

Contribution

It proposes a system-independent WER estimation approach that trains on simulated ASR outputs, outperforming baselines on out-of-domain data.

Findings

01 Achieves state-of-the-art performance on out-of-domain datasets.

02 Outperforms baseline estimators in RMSE and Pearson correlation.

03 Performance improves when training WER matches evaluation WER.
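The two evaluation metrics named in the findings can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names `rmse` and `pearson` and the sample numbers are assumptions made up for this example.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between estimated and true WER values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-utterance values, for illustration only.
estimated_wer = [0.10, 0.25, 0.40, 0.05]
true_wer = [0.12, 0.20, 0.45, 0.05]
print(rmse(estimated_wer, true_wer), pearson(estimated_wer, true_wer))
```

A lower RMSE means the estimates are closer to the true WER in absolute terms, while a higher Pearson correlation means the estimator ranks utterances consistently with their true WER.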

Abstract

Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER…
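As a rough illustration of the ideas in the abstract, the sketch below computes WER with the standard word-level edit distance and then simulates an ASR hypothesis by substituting words from a table of alternatives. It is a sketch under stated assumptions, not the authors' method: the names `word_error_rate` and `simulate_hypothesis`, the hand-written `alternatives` table, and the substitution-only strategy are all hypothetical. The paper selects phonetically similar or linguistically more likely words, which this toy table only stands in for.

```python
import random


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def simulate_hypothesis(reference, alternatives, target_wer, seed=0):
    """Substitute words so the result approaches target_wer.

    Assumption: only substitutions are simulated, and `alternatives`
    is a hypothetical hand-written table of replacement words.
    """
    rng = random.Random(seed)
    words = reference.split()
    candidates = [i for i, w in enumerate(words) if w in alternatives]
    n_sub = min(len(candidates), round(target_wer * len(words)))
    for i in rng.sample(candidates, n_sub):
        words[i] = rng.choice(alternatives[words[i]])
    return " ".join(words)


reference = "the cat sat on the mat"
alternatives = {"cat": ["cap"], "sat": ["set"], "mat": ["map"]}  # toy table
hypothesis = simulate_hypothesis(reference, alternatives, target_wer=0.33)
print(hypothesis, word_error_rate(reference, hypothesis))
```

Pairs of (simulated hypothesis, computed WER) like these could serve as training targets for an estimator without running any particular ASR system, which is the sense in which the approach is system-independent.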

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training