Deep Recurrent Neural Networks for Acoustic Modelling
William Chan, Ian Lane

TL;DR
This paper introduces a deep RNN architecture, TC-DNN-BLSTM-DNN, that combines Time Convolution (TC), a DNN, a BLSTM, and a final DNN for acoustic modelling in speech recognition. It reaches 3.47% WER on WSJ eval92, more than 8% relative improvement over baseline DNN models.
Contribution
The paper proposes a new TC-DNN-BLSTM-DNN model that stacks time-convolutional, feed-forward, and bidirectional recurrent layers for acoustic modelling in ASR.
Findings
Achieved 3.47% WER on WSJ eval92
More than 8% relative WER reduction over baseline DNNs
Demonstrated effectiveness of combining DNN, TC, and BLSTM layers
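The relative figure above follows the usual definition of relative WER reduction, (baseline - new) / baseline. The summary does not state the baseline WER, so the sketch below only derives the lower bound that "more than 8% relative" implies for it; the function names are illustrative and not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction: the fraction of the baseline error removed."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline_floor(new_wer, relative_reduction):
    """Baseline WER at which `new_wer` is exactly `relative_reduction` better.

    A reported reduction of "more than" this value means the real baseline
    lies above the returned number.
    """
    return new_wer / (1.0 - relative_reduction)


# 3.47% WER with more than 8% relative improvement (both figures from the summary).
floor = implied_baseline_floor(3.47, 0.08)
print(round(floor, 2))  # 3.77
```

Under these figures the baseline DNN WER must have been above roughly 3.77%. This is a bound derived from the two reported numbers, not a value reported on this page.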
Abstract
We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR). We term our contribution a TC-DNN-BLSTM-DNN model: it combines a Deep Neural Network (DNN) with Time Convolution (TC), followed by a Bidirectional Long Short-Term Memory (BLSTM) and a final DNN. The first DNN acts as a feature processor for the model, the BLSTM then generates a context from the sequential acoustic signal, and the final DNN takes the context and models the posterior probabilities of the acoustic states. We achieve a 3.47% WER on the Wall Street Journal (WSJ) eval92 task, more than 8% relative improvement over the baseline DNN models.
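The layer ordering described in the abstract can be illustrated with a small forward pass in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: all layer sizes, the ReLU activations, the single layer per stage, the omission of biases, and the choice to emit one posterior vector per frame are hypothetical, and names such as `init_params` and `forward` are invented for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(x):
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def time_conv(frames, W, width):
    """TC: each output sees `width` consecutive frames (no padding)."""
    out = []
    for t in range(len(frames) - width + 1):
        window = [v for frame in frames[t:t + width] for v in frame]
        out.append(relu(matvec(W, window)))
    return out


def lstm(seq, W, hidden):
    """Single LSTM layer; W maps [input, h] to the i, f, o, g gates."""
    h, c, out = [0.0] * hidden, [0.0] * hidden, []
    for x in seq:
        z = matvec(W, x + h)
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        out.append(h)
    return out


def blstm(seq, W_fwd, W_bwd, hidden):
    """BLSTM: concatenate forward and time-reversed backward LSTM outputs."""
    fwd = lstm(seq, W_fwd, hidden)
    bwd = lstm(seq[::-1], W_bwd, hidden)[::-1]
    return [a + b for a, b in zip(fwd, bwd)]


def init_params(feat_dim=8, width=3, conv_dim=16, dnn_dim=16,
                hidden=12, n_states=5, seed=0):
    """Random weights; every size here is a hypothetical toy value."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "width": width, "hidden": hidden,
        "tc": mat(conv_dim, width * feat_dim),
        "dnn1": mat(dnn_dim, conv_dim),
        "fwd": mat(4 * hidden, dnn_dim + hidden),
        "bwd": mat(4 * hidden, dnn_dim + hidden),
        "dnn2": mat(n_states, 2 * hidden),
    }


def forward(frames, p):
    """TC -> DNN -> BLSTM -> DNN, ending in state posteriors per frame."""
    x = time_conv(frames, p["tc"], p["width"])
    x = [relu(matvec(p["dnn1"], v)) for v in x]      # feature processor
    x = blstm(x, p["fwd"], p["bwd"], p["hidden"])    # sequence context
    return [softmax(matvec(p["dnn2"], v)) for v in x]


rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
posteriors = forward(frames, init_params())
print(len(posteriors), len(posteriors[0]))  # 8 5
```

With 10 input frames and a convolution width of 3, the unpadded time convolution yields 8 time steps, each ending in a probability distribution over 5 toy acoustic states. A practical model would use trained weights, deeper DNN stages, and a framework with automatic differentiation.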
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Convolution
