Word Error Rate Estimation Without ASR Output: e-WER2

Ahmed Ali; Steve Renals

arXiv:2008.03403 · eess.AS · August 11, 2020

TL;DR

This paper introduces e-WER2, a novel multistream end-to-end approach for estimating word error rate (WER) in speech recognition without requiring transcriptions or access to the ASR system, enabling efficient performance evaluation.

Contribution

The paper presents a new no-box WER estimation method using joint acoustic-lexical features and a multistream architecture, extending WER estimation to systems without ASR access.

Findings

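01 The no-box system reaches a Pearson correlation of 0.56 with the reference per-sentence WER.

02 Its per-sentence WER estimates have a root mean square error (RMSE) of 0.24 across 1,400 sentences.

03 e-WER2 estimates WER without manual transcriptions and, in the no-box setting, without access to the ASR system.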
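
The two figures above are standard agreement metrics between estimated and reference per-sentence WER. A minimal sketch of how they could be computed, assuming two equal-length lists of WER values; the function names `pearson` and `rmse` and the sample numbers are illustrative and are not taken from the paper or its repository:

```python
import math


def pearson(estimated, reference):
    """Pearson correlation between two equal-length sequences of WER values."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_e * var_r)


def rmse(estimated, reference):
    """Root mean square error between estimated and reference WER values."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical per-sentence values, for illustration only.
estimated_wer = [0.10, 0.35, 0.20, 0.60]
reference_wer = [0.05, 0.40, 0.30, 0.50]
print(round(pearson(estimated_wer, reference_wer), 3))
print(round(rmse(estimated_wer, reference_wer), 3))
```

A high correlation with a large RMSE would mean the estimator ranks sentences well but is poorly calibrated, which is why the two numbers are reported together.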

Abstract

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive. In this paper, we continue our effort in estimating WER using acoustic, lexical and phonotactic features. Our novel approach to estimate the WER uses a multistream end-to-end architecture. We report results for systems using internal speech decoder features (glass-box), systems without speech decoder features (black-box), and for systems without having access to the ASR system (no-box). The no-box system learns joint acoustic-lexical representation from phoneme recognition results along with MFCC acoustic features to estimate WER. Considering WER per sentence, our no-box system achieves 0.56 Pearson correlation with the reference evaluation and 0.24 root mean square error (RMSE) across 1,400…
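
For context, the reference WER that e-WER2 tries to predict is conventionally the word-level edit distance between the manual transcript and the ASR hypothesis, divided by the number of reference words. The sketch below shows that conventional computation only; it is not the paper's estimator, and the function name `word_error_rate` is a hypothetical choice for illustration:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted against the reference length, WER computed this way can exceed 1.0 for very poor hypotheses.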

Code & Models

Repositories

qcri/e-wer (official)