Sequence-level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models
Amber Afshan, Kshitiz Kumar, Jian Wu

TL;DR
This paper introduces a sequence-level confidence classifier for ASR that correlates well with accuracy and remains stable across model updates, enabling improved data selection for acoustic model adaptation and semi-supervised learning.
Contribution
A novel sequence-level confidence classifier that is interpretable, stable, and highly correlated with ASR accuracy, facilitating better data selection for model adaptation.
Findings
Sequence-level confidence scores achieve high correlation with ASR accuracy.
Using confidence scores for data selection yields larger word error rate reductions.
The method is effective in both supervised and semi-supervised adaptation scenarios.
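One way to check the first finding on your own data is to compare per-utterance confidence scores against per-utterance word accuracy. The sketch below is only an illustration, not the authors' evaluation protocol: it assumes accuracy is defined as one minus the word-level edit distance divided by the reference length, and it uses Pearson correlation as the agreement measure. The function names `word_accuracy` and `pearson` are hypothetical.

```python
import math


def word_accuracy(ref: str, hyp: str) -> float:
    """Assumed definition: 1 - (word edit distance / reference length), floored at 0."""
    r, h = ref.split(), hyp.split()
    if not r:
        return 1.0 if not h else 0.0
    # Word-level Levenshtein distance, keeping one row at a time.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (rw != hw)))  # substitution or match
        prev = cur
    return max(0.0, 1.0 - prev[-1] / len(r))


def pearson(xs, ys) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)
```

Given a list of (reference, hypothesis, confidence) triples, calling `pearson` on the confidences and the `word_accuracy` values gives a single number summarizing how closely the scores track accuracy.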
Abstract
Scores from traditional confidence classifiers (CCs) in automatic speech recognition (ASR) systems lack universal interpretation and vary with updates to the underlying confidence or acoustic models (AMs). In this work, we build interpretable confidence scores with an objective to closely align with ASR accuracy. We propose a new sequence-level CC with a richer context, providing CC scores that are highly correlated with ASR accuracy and stable across CC updates, hence expanding CC applications. Recently, AM customization has gained traction with the widespread use of unified models. Conventional adaptation strategies that customize AM expect well-matched data for the target domain with gold-standard transcriptions. We propose a cost-effective method of using CC scores to select an optimal adaptation data set, where we maximize ASR gains from minimal data. We study data in various…
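The idea of selecting adaptation data by confidence score can be sketched as a simple filter under a data budget. This is a minimal illustration under stated assumptions, not the selection rule from the paper: the confidence band, the duration budget, the greedy ordering, and the names `Utterance` and `select_by_confidence` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    confidence: float  # sequence-level CC score, assumed to lie in [0, 1]
    duration_s: float  # audio length in seconds


def select_by_confidence(utts, low, high, budget_s, prefer_high=True):
    """Greedily pick utterances with low <= confidence <= high within a duration budget.

    prefer_high=True visits the most confident utterances first (one plausible
    choice when automatic transcripts are used as labels); prefer_high=False
    visits the least confident first. Both orderings are assumptions.
    """
    in_band = [u for u in utts if low <= u.confidence <= high]
    in_band.sort(key=lambda u: u.confidence, reverse=prefer_high)
    selected, total = [], 0.0
    for u in in_band:
        if total + u.duration_s > budget_s:
            continue  # skip utterances that would exceed the budget
        selected.append(u)
        total += u.duration_s
    return selected
```

For example, calling `select_by_confidence(utts, low=0.8, high=1.0, budget_s=3600)` would keep up to one hour of audio whose scores fall in the chosen band.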
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Attention Model · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
