Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers

Prashant Serai; Peidong Wang; Eric Fosler-Lussier

arXiv:2408.11258·cs.AI·August 22, 2024

Open Access

TL;DR

This paper enhances speech recognition error prediction models for modern neural network-based systems by introducing sampling methods and sequence-to-sequence models to better simulate and understand recognition errors, aiding NLP robustness.

Contribution

It extends previous phonetic confusion models with sampling techniques and sequence-to-sequence approaches to improve error prediction for neural network speech recognizers.

Findings

01. Sampling improves predictive accuracy significantly.

02. Sequence-to-sequence models perform comparably to confusion matrices.

03. Error prediction generalizes across different ASR systems.

Abstract

Modeling the errors of a speech recognizer makes it possible to simulate errorful recognized speech from plain text. This has proven useful for tasks such as discriminative language modeling and improving the robustness of NLP systems when limited or even no audio data is available at training time. Previous work typically replicated the behavior of GMM-HMM based systems, but more modern posterior-based neural network acoustic models behave differently and require adjustments to the error prediction model. In this work, we extend a prior phonetic-confusion-based model for predicting speech recognition errors in two ways. First, we introduce a sampling-based paradigm that better simulates the behavior of a posterior-based acoustic model. Second, we investigate replacing the confusion matrix with a sequence-to-sequence model in order to introduce context dependency into the…
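The contrast between always taking the most likely confusion and sampling from the confusion distribution can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so everything here is an assumption made for illustration: the toy `CONFUSIONS` table, its probabilities, and the helper names `sample_phone`, `argmax_phone`, and `simulate` are hypothetical and are not taken from the paper. The sketch also stops at the phone level and does not map the sampled phones back to words.

```python
import random

# Hypothetical toy confusion table: P(recognized phone | intended phone).
# The probabilities are invented for illustration, not taken from the paper.
CONFUSIONS = {
    "p": {"p": 0.80, "b": 0.15, "t": 0.05},
    "b": {"b": 0.85, "p": 0.15},
    "t": {"t": 0.90, "d": 0.10},
    "ae": {"ae": 0.70, "eh": 0.30},
}


def argmax_phone(phone):
    """Deterministic baseline: always pick the most likely recognized phone."""
    dist = CONFUSIONS.get(phone, {phone: 1.0})
    return max(dist, key=dist.get)


def sample_phone(phone, rng):
    """Sampling variant: draw one recognized phone from the distribution."""
    dist = CONFUSIONS.get(phone, {phone: 1.0})
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def simulate(phones, rng, n_samples=5):
    """Generate several simulated recognized phone sequences for one input."""
    return [[sample_phone(p, rng) for p in phones] for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
reference = ["p", "ae", "t"]
baseline = [argmax_phone(p) for p in reference]
samples = simulate(reference, rng)

print("argmax :", " ".join(baseline))
for s in samples:
    print("sampled:", " ".join(s))
```

With a diagonal-dominant table like this one, the argmax baseline simply reproduces the input and never predicts an error, whereas repeated sampling produces occasional substitutions in proportion to their assumed probabilities.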

Taxonomy

Topics: Speech Recognition and Synthesis