First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs
Awni Y. Hannun, Andrew L. Maas, Daniel Jurafsky, Andrew Y. Ng

TL;DR
This paper introduces a neural network-based method for first-pass large vocabulary continuous speech recognition that bypasses traditional HMM frameworks, achieving competitive accuracy with a novel decoding algorithm.
Contribution
It demonstrates that a simple bi-directional recurrent neural network can effectively perform speech recognition without HMMs, using a new prefix-search decoding method.
Findings
Bi-directional RNNs improve recognition accuracy.
The proposed decoding algorithm enables effective first-pass recognition.
Competitive word error rates achieved on Wall Street Journal corpus.
Abstract
We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but building such systems is a complex, domain-specific task. Recent work demonstrated the feasibility of discarding the HMM sequence modeling framework by directly predicting transcript text from audio. This paper extends this approach in two ways. First, we demonstrate that a straightforward recurrent neural network architecture can achieve a high level of accuracy. Second, we propose and evaluate a modified prefix-search decoding algorithm. This approach to decoding enables first-pass speech recognition with a language model, completely unaided by the cumbersome infrastructure of HMM-based systems. Experiments on the Wall Street Journal corpus…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
