Utterance-level neural confidence measure for end-to-end children speech recognition
Wei Liu, Tan Lee

TL;DR
This paper investigates an utterance-level neural confidence measure (NCM) for end-to-end recognition of children's speech. It compares the efficacy of predictor features and examines evaluation metrics, with the aim of improving confidence estimation for speech that current ASR systems handle poorly.
Contribution
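It formulates an utterance-level neural confidence measure for E2E ASR as accept/reject binary classification, compares predictor features derived from different internal and external modules of the system, and evaluates them on children's speech, reporting which features and metrics are effective.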
Findings
Acoustic features are more important than linguistic features for confidence estimation.
N-best score features outperform single-best features.
EER and AUC metrics are inadequate for mismatched ASR evaluation.
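For readers unfamiliar with the two metrics named in the last finding, the following is a minimal sketch of how EER and AUC are commonly computed for an accept/reject confidence score. It is not taken from the paper: the function names, the label convention (1 = the recognition result should be accepted) and the toy scores are assumptions made for illustration.

```python
import numpy as np

def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 if the utterance should be accepted, 0 otherwise (assumed convention).
    scores: confidence scores, higher means more confident.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    # Compare every accept-class score with every reject-class score.
    diff = pos[:, None] - neg[None, :]
    # Ties count as half a correctly ordered pair.
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def equal_error_rate(labels, scores):
    """Error rate at the threshold where false accepts and false rejects are closest."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Candidate thresholds: every observed score, plus one that rejects everything.
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        accept = scores >= t
        far = accept[~labels].mean()    # false acceptance rate
        frr = (~accept)[labels].mean()  # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

# Toy data (hypothetical): three utterances to accept, three to reject.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(auc_score(toy_labels, toy_scores), equal_error_rate(toy_labels, toy_scores))
```

Both quantities are built from per-class rates, so they depend on how well the scores rank the two classes rather than on how many utterances fall in each class.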
Abstract
Confidence measure is a performance index of particular importance for automatic speech recognition (ASR) systems deployed in real-world scenarios. In the present study, utterance-level neural confidence measure (NCM) in end-to-end automatic speech recognition (E2E ASR) is investigated. The E2E system adopts the joint CTC-attention Transformer architecture. The prediction of NCM is formulated as a task of binary classification, i.e., accept/reject the input utterance, based on a set of predictor features acquired during the ASR decoding process. The investigation is focused on evaluating and comparing the efficacies of predictor features that are derived from different internal and external modules of the E2E system. Experiments are carried out on children speech, for which state-of-the-art ASR systems show less than satisfactory performance and robust confidence measure is particularly…
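The accept/reject formulation described in the abstract can be illustrated with a small sketch. This is not the paper's model: a plain logistic regression stands in for the neural predictor, and the three features (top hypothesis score, top-1/top-2 margin, entropy over the N-best list) as well as all names and toy values are hypothetical choices, used only to show the shape of the task.

```python
import numpy as np

def nbest_features(log_scores):
    """Hypothetical utterance-level features from N-best hypothesis log-scores."""
    s = np.sort(np.asarray(log_scores, dtype=float))[::-1]  # best hypothesis first
    p = np.exp(s - s.max())
    p /= p.sum()                                # normalise scores over the N-best list
    entropy = -np.sum(p * np.log(p + 1e-12))    # spread of probability mass
    margin = s[0] - s[1] if len(s) > 1 else 0.0 # gap between top two hypotheses
    return np.array([s[0], margin, entropy])

def train_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression by batch gradient descent (stand-in for a neural predictor)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xn = (X - mu) / sigma                       # standardise each feature
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))
        grad = p - y                            # gradient of cross-entropy w.r.t. logits
        w -= lr * (Xn.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, mu, sigma

def confidence(features, model):
    """Probability that the utterance should be accepted."""
    w, b, mu, sigma = model
    z = ((np.asarray(features, dtype=float) - mu) / sigma) @ w + b
    return float(1.0 / (1.0 + np.exp(-z)))

# Toy N-best log-scores (made up): 1 = accept, 0 = reject.
toy_nbest = [[-1.0, -6.0, -7.0], [-1.5, -5.5, -8.0],
             [-3.0, -3.1, -3.2], [-4.0, -4.05, -4.2]]
toy_labels = [1, 1, 0, 0]
toy_X = np.stack([nbest_features(s) for s in toy_nbest])
toy_model = train_logistic(toy_X, toy_labels)
print([round(confidence(x, toy_model), 3) for x in toy_X])
```

Thresholding the returned probability gives the accept/reject decision; how the training labels are defined (for example, from the recognition error of each utterance) is a design choice this sketch leaves open.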
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Dense Connections · Label Smoothing · Dropout
