Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers
Prashant Serai, Peidong Wang, Eric Fosler-Lussier

TL;DR
This paper enhances speech recognition error prediction models for modern neural network-based systems by introducing sampling methods and sequence-to-sequence models to better simulate and understand recognition errors, aiding NLP robustness.
Contribution
It extends previous phonetic confusion models with sampling techniques and sequence-to-sequence approaches to improve error prediction for neural network speech recognizers.
Findings
Sampling improves predictive accuracy significantly.
Sequence-to-sequence models perform comparably to confusion matrices.
Error prediction generalizes across different ASR systems.
Abstract
Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks such as discriminative language modeling and improving the robustness of NLP systems when limited or even no audio data is available at train time. Previous work typically considered replicating the behavior of GMM-HMM based systems, but the behavior of more modern posterior-based neural network acoustic models is not the same and requires adjustments to the error prediction model. In this work, we extend a prior phonetic confusion based model for predicting speech recognition errors in two ways. First, we introduce a sampling-based paradigm that better simulates the behavior of a posterior-based acoustic model. Second, we investigate replacing the confusion matrix with a sequence-to-sequence model in order to introduce context dependency into the…
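To make the difference between a deterministic and a sampling-based use of a phonetic confusion model concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the paper's implementation: the table `CONFUSION`, its probabilities, and the helper names `most_likely` and `sample` are all hypothetical and invented for this example.

```python
import random

# Hypothetical toy confusion table: P(recognized phone | reference phone).
# The values are invented for illustration and are not taken from the paper.
CONFUSION = {
    "p": {"p": 0.7, "b": 0.2, "t": 0.1},
    "b": {"b": 0.8, "p": 0.2},
    "ae": {"ae": 0.6, "eh": 0.4},
    "t": {"t": 0.9, "d": 0.1},
}


def most_likely(phones):
    """Deterministic prediction: take the highest-probability entry per phone."""
    return [max(CONFUSION[p], key=CONFUSION[p].get) for p in phones]


def sample(phones, rng):
    """Sampling-based prediction: draw each output phone from its confusion row."""
    out = []
    for p in phones:
        candidates = list(CONFUSION[p])
        weights = [CONFUSION[p][c] for c in candidates]
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return out


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
reference = ["p", "ae", "t"]
top = most_likely(reference)
samples = [sample(reference, rng) for _ in range(5)]

print("reference:", reference)
print("most likely:", top)
for s in samples:
    print("sampled:", s)
```

In this toy setup the deterministic path always returns the reference phones, because each phone is its own most likely outcome, so it never produces an error. The sampled path can yield a different errorful sequence on each draw, which is the kind of variability a simulated recognizer output needs.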
Taxonomy
Topics: Speech Recognition and Synthesis
