TL;DR
This paper extends lattice recurrent neural networks to encode sub-word information for confidence estimation in black box speech recognition systems, where typically only word sequences are available, and reports improved confidence scores.
Contribution
It introduces a lattice RNN model that uses sub-word information to improve confidence scoring in black box ASR systems, which typically expose only word sequences and therefore offer limited means for quality control.
Findings
Significant improvement in confidence estimation accuracy
Effective use of sub-word information in lattice RNNs
Validated on IARPA OpenKWS 2016 dataset
Abstract
Recently, there has been growth in providers of speech transcription services enabling others to leverage technology they would not normally be able to use. As a result, speech-enabled solutions have become commonplace. Their success critically relies on the quality, accuracy, and reliability of the underlying speech transcription systems. Those black box systems, however, offer limited means for quality control as only word sequences are typically available. This paper examines this limited resource scenario for confidence estimation, a measure commonly used to assess transcription reliability. In particular, it explores what other sources of word and sub-word level information available in the transcription process could be used to improve confidence scores. To encode all such information this paper extends lattice recurrent neural networks to handle sub-words. Experimental results…
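To make the idea of combining word-level and sub-word-level information concrete, here is a minimal sketch. It is not the paper's lattice RNN: it assumes, purely for illustration, that per-sub-word feature vectors are mean-pooled into a fixed-size vector, concatenated with word-level features, and passed through a logistic scorer. The names `pool_subword_features` and `word_confidence`, the feature values, and the weights are all hypothetical.

```python
import math
from typing import List, Sequence


def pool_subword_features(subword_feats: Sequence[Sequence[float]]) -> List[float]:
    """Mean-pool per-sub-word feature vectors into one fixed-size vector.

    Assumption: simple averaging stands in for whatever sub-word encoder
    a real system would use.
    """
    if not subword_feats:
        raise ValueError("need at least one sub-word feature vector")
    dim = len(subword_feats[0])
    count = len(subword_feats)
    return [sum(vec[i] for vec in subword_feats) / count for i in range(dim)]


def word_confidence(
    word_feats: Sequence[float],
    subword_feats: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Score one word: concatenate word and pooled sub-word features,
    then apply a logistic function to a weighted sum."""
    features = list(word_feats) + pool_subword_features(subword_feats)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature size")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: one word-level feature and two sub-word units
# with two features each.
word_feats = [0.9]
subword_feats = [[0.2, 1.0], [0.4, 0.0]]
pooled = pool_subword_features(subword_feats)
score = word_confidence(word_feats, subword_feats, [1.0, 0.5, -0.5], 0.0)
```

In this toy setup the score is always a value strictly between 0 and 1, which is the form a confidence estimate usually takes; a trained model would learn the weights rather than have them set by hand.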
