Predicting word error rate for reverberant speech

Hannes Gamper; Dimitra Emmanouilidou; Sebastian Braun; Ivan J. Tashev

arXiv:1911.00566 · eess.AS · February 17, 2020

TL;DR

This paper introduces methods to predict the word error rate (WER) of automatic speech recognition (ASR) from acoustic parameters and from reverberant speech samples. A convolutional neural network (CNN) estimates WER blindly, without access to the acoustic impulse response, and outperforms prediction based on clarity (C50).

Contribution

It proposes predicting WER directly from acoustic parameters (via polynomial, sigmoidal, or neural network fits) and blindly from reverberant speech (via a CNN), giving a practical way to assess how robust an ASR system is to reverberation.
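The quantity being predicted is the standard word error rate: the number of word substitutions S, deletions D, and insertions I needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference words N, i.e. WER = (S + D + I) / N. The minimal Python sketch below computes it with a word-level Levenshtein distance. The function name `word_error_rate` and the plain whitespace tokenization are illustrative assumptions, not the scoring pipeline used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, N = number of reference words.

    Computed as the word-level Levenshtein distance between the two
    whitespace-tokenized transcripts, normalized by the reference length.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the first i-1 reference words and the
    # first j hypothesis words (rolling single-row dynamic-programming table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # 0 cost if the words match
            deletion = prev[j] + 1                 # reference word dropped
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words gives WER = 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.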

Findings

01 C50 and C80 correlate strongly with WER
02 Polynomial, sigmoidal, and neural network fits to acoustic parameters can predict WER accurately
03 The blind CNN model outperforms C50-based WER prediction
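Clarity C_t is the ratio, in dB, of impulse-response energy arriving within t ms of the direct sound to the energy arriving afterwards: C_t = 10 log10( ∫_0^t h²(τ) dτ / ∫_t^∞ h²(τ) dτ ), with t = 50 ms for C50 and t = 80 ms for C80. The minimal pure-Python sketch below computes it from a sampled acoustic impulse response (AIR). It assumes the direct sound arrives at the sample of largest magnitude; the name `clarity_db` and the idealized exponential-decay test AIR are hypothetical and stand in for measured AIRs.

```python
import math
from typing import Sequence


def clarity_db(air: Sequence[float], fs: int, t_ms: float = 50.0) -> float:
    """Clarity C_t in dB: energy in the first t_ms after the direct sound vs. the rest.

    t_ms=50 gives C50, t_ms=80 gives C80. Assumption: the direct sound is
    taken to arrive at the sample with the largest absolute amplitude.
    """
    if len(air) == 0 or fs <= 0 or t_ms <= 0:
        raise ValueError("need a non-empty AIR, fs > 0 and t_ms > 0")
    onset = max(range(len(air)), key=lambda n: abs(air[n]))
    split = onset + int(round(t_ms * 1e-3 * fs))  # first sample of the late part
    early = sum(x * x for x in air[onset:split])
    late = sum(x * x for x in air[split:])
    if early == 0.0:
        raise ValueError("AIR contains no energy")
    if late == 0.0:
        return math.inf  # no energy after the split: clarity is unbounded
    return 10.0 * math.log10(early / late)


fs = 16000
t60 = 0.5
# Idealized AIR: amplitude decays so that energy drops by 60 dB after t60 seconds.
air = [10.0 ** (-3.0 * n / (fs * t60)) for n in range(2 * fs)]
print(round(clarity_db(air, fs, 50.0), 2))  # C50, about 4.74
print(round(clarity_db(air, fs, 80.0), 2))  # C80, about 9.1
```

For this idealized decay the result matches the closed form 10 log10(10^(6t/T60) - 1), e.g. 10 log10(10^0.6 - 1) ≈ 4.74 dB for C50 at T60 = 0.5 s, which is a convenient sanity check before applying the function to measured AIRs.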

Abstract

Reverberation negatively impacts the performance of automatic speech recognition (ASR). Prior work on quantifying the effect of reverberation has shown that clarity (C50), a parameter that can be estimated from the acoustic impulse response, is correlated with ASR performance. In this paper we propose predicting ASR performance in terms of the word error rate (WER) directly from acoustic parameters via a polynomial, sigmoidal, or neural network fit, as well as blindly from reverberant speech samples using a convolutional neural network (CNN). We carry out experiments on two state-of-the-art ASR models and a large set of acoustic impulse responses (AIRs). The results confirm C50 and C80 to be highly correlated with WER, allowing WER to be predicted with the proposed fitting approaches. The proposed non-intrusive CNN model outperforms C50-based WER prediction, indicating that WER can be…
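The parameter-based approach maps an acoustic parameter such as C50 to WER with a fitted function. The sketch below shows only a polynomial variant, as ordinary least squares solved in pure Python via the normal equations. The polynomial degree, the helper names `polyfit` and `polyval`, and the synthetic (C50, WER) pairs are hypothetical; they are not the paper's fits or data, and the sigmoidal and neural network fits are not reproduced.

```python
from typing import List, Sequence


def polyfit(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree], lowest order first."""
    m = degree + 1
    if len(x) != len(y) or len(x) < m:
        raise ValueError("need len(x) == len(y) >= degree + 1")
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    a = [[sum(xk ** (i + j) for xk in x) for j in range(m)] for i in range(m)]
    b = [sum(yk * xk ** i for xk, yk in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution on the upper-triangular system.
    c = [0.0] * m
    for i in reversed(range(m)):
        c[i] = (b[i] - sum(a[i][k] * c[k] for k in range(i + 1, m))) / a[i][i]
    return c


def polyval(c: Sequence[float], x: float) -> float:
    """Evaluate the polynomial with lowest-order-first coefficients c at x."""
    return sum(ck * x ** k for k, ck in enumerate(c))


# Hypothetical (C50 in dB, WER) pairs generated from a known quadratic, used
# only to check that the fit recovers it; these are not measured results.
c50 = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0]
wer = [0.3 - 0.01 * c + 0.0002 * c * c for c in c50]
coeffs = polyfit(c50, wer, degree=2)
print(coeffs)  # approximately [0.3, -0.01, 0.0002]
print(polyval(coeffs, 12.0))  # predicted WER at C50 = 12 dB
```

The normal equations are adequate for low degrees over a C50 range of a few tens of dB; for higher degrees or production use, `numpy.polyfit`, which solves the least-squares problem more stably, would be the usual choice.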
