TL;DR
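This paper presents a real-time, character-level incremental speech recognition system built on recurrent neural networks (RNNs) trained with connectionist temporal classification (CTC). It processes speech with low latency, can dictate out-of-vocabulary (OOV) words, and reaches 8.90% word error rate (WER) on the Wall Street Journal (WSJ) Nov'92 set.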
Contribution
The paper introduces a tree-based online beam search with depth-pruning for incremental speech recognition with CTC-trained RNNs. The decoder responds with low latency while the speaker is still talking and can recognize OOV words.
Findings
Achieved 8.90% WER on WSJ Nov'92 set
Developed a low-latency online beam search algorithm
System can recognize out-of-vocabulary words based on pronunciation
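For context on the 8.90% figure above: WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's scoring tool; the function name `wer` and the whitespace tokenization are assumptions made for illustration.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length.

    Assumption: words are separated by whitespace, with no text normalization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count against the hypothesis, WER can exceed 100% when the hypothesis is much longer than the reference.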
Abstract
In real-time speech recognition applications, the latency is an important issue. We have developed a character-level incremental speech recognition (ISR) system that responds quickly even during the speech, where the hypotheses are gradually improved while the speaking proceeds. The algorithm employs a speech-to-character unidirectional recurrent neural network (RNN), which is end-to-end trained with connectionist temporal classification (CTC), and an RNN-based character-level language model (LM). The output values of the CTC-trained RNN are character-level probabilities, which are processed by beam search decoding. The RNN LM augments the decoding by providing long-term dependency information. We propose tree-based online beam search with additional depth-pruning, which enables the system to process infinitely long input speech with low latency. This system not only responds quickly on…
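The decoding idea in the abstract can be sketched as a frame-by-frame CTC prefix beam search whose uncommitted hypotheses are kept short. This is a simplified illustration, not the authors' algorithm: it omits the RNN LM, and its depth-pruning rule (commit the oldest character of the best hypothesis once its uncommitted tail exceeds `max_depth`, then drop hypotheses that disagree) is one assumed reading of the term. The class name `OnlineCTCDecoder` and all parameter names are hypothetical.

```python
from collections import defaultdict

BLANK = 0  # assumption: index 0 of the alphabet is the CTC blank symbol


class OnlineCTCDecoder:
    """Incremental CTC prefix beam search with a simple, assumed depth-pruning rule."""

    def __init__(self, alphabet, beam_width=8, max_depth=4):
        if max_depth < 1:
            raise ValueError("max_depth must be at least 1")
        self.alphabet = alphabet
        self.beam_width = beam_width
        self.max_depth = max_depth
        self.committed = []  # character indices that can no longer change
        # Uncommitted tail -> (prob. of paths ending in blank, ending in non-blank)
        self.beams = {(): (1.0, 0.0)}

    def _best(self):
        return max(self.beams, key=lambda t: sum(self.beams[t]))

    def partial(self):
        """Current best transcript: committed characters plus the best tail."""
        return "".join(self.alphabet[c] for c in self.committed + list(self._best()))

    def step(self, probs):
        """Consume one frame of character probabilities; return the partial result."""
        nxt = defaultdict(lambda: [0.0, 0.0])
        for tail, (pb, pnb) in self.beams.items():
            for c, p in enumerate(probs):
                if c == BLANK:
                    nxt[tail][0] += (pb + pnb) * p
                elif tail and tail[-1] == c:
                    nxt[tail + (c,)][1] += pb * p  # a new repeat needs a blank between
                    nxt[tail][1] += pnb * p        # otherwise the repeat collapses
                else:
                    nxt[tail + (c,)][1] += (pb + pnb) * p
        ranked = sorted(nxt.items(), key=lambda kv: sum(kv[1]), reverse=True)
        ranked = ranked[: self.beam_width]
        # Renormalize so probabilities do not underflow on very long inputs.
        total = sum(sum(v) for _, v in ranked) or 1.0
        self.beams = {t: (pb / total, pnb / total) for t, (pb, pnb) in ranked}
        self._depth_prune()
        return self.partial()

    def _depth_prune(self):
        best = self._best()
        while len(best) > self.max_depth:
            head = best[0]
            self.committed.append(head)
            # Keep hypotheses that agree on the committed character and still
            # retain at least one character of context after it is stripped.
            self.beams = {t[1:]: v for t, v in self.beams.items()
                          if len(t) >= 2 and t[0] == head}
            best = best[1:]


if __name__ == "__main__":
    alphabet = "_ab"  # "_" stands for the blank
    frames = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    decoder = OnlineCTCDecoder(alphabet, beam_width=8, max_depth=1)
    for frame in frames:
        print(decoder.step(frame))  # partial transcript after each frame
```

Because only tails of bounded depth are kept for the best hypothesis, the per-frame cost stays roughly constant regardless of how long the input runs, which is the property the abstract attributes to depth-pruning. A smaller `max_depth` commits characters sooner at the cost of fewer chances to revise them.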
