Deep Recurrent Neural Networks for Acoustic Modelling

William Chan; Ian Lane

arXiv:1504.01482·cs.LG·April 8, 2015·31 cites

TL;DR

This paper introduces a deep RNN architecture for acoustic modelling in speech recognition that chains a DNN with time convolution (TC), a BLSTM, and a final DNN. It reaches 3.47% WER on WSJ eval92, more than 8% relative improvement over the baseline DNN models.

Contribution

The paper proposes the TC-DNN-BLSTM-DNN model, which places a DNN with time convolution in front of a BLSTM and a final DNN for acoustic modelling in ASR.

Findings

01

Achieved 3.47% WER on WSJ eval92

02

More than 8% relative WER reduction over baseline DNNs

03

Demonstrated effectiveness of combining DNN, TC, and BLSTM layers
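
The two reported numbers can be related with a little arithmetic. The sketch below is illustrative only: the baseline WER is not stated on this page, so the code does not assume one. It only computes the lower bound on the baseline that "more than 8% relative" implies for a 3.47% WER. The function names are hypothetical.

```python
def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction: the fraction of the baseline error that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


def min_baseline_for(new_wer, relative_gain):
    """Smallest baseline WER for which new_wer is a `relative_gain` reduction.

    Solves (b - new_wer) / b = relative_gain for b.
    """
    return new_wer / (1.0 - relative_gain)


# Numbers taken from the findings above: 3.47% WER, more than 8% relative.
implied_baseline = min_baseline_for(3.47, 0.08)
print(f"baseline WER must exceed about {implied_baseline:.2f}%")
```

Under these numbers the baseline DNN's WER must lie above roughly 3.77%, which is a consequence of the stated figures rather than a value reported here.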

Abstract

We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR). We term our contribution a TC-DNN-BLSTM-DNN model: it combines a Deep Neural Network (DNN) with Time Convolution (TC), followed by a Bidirectional Long Short-Term Memory (BLSTM) and a final DNN. The first DNN acts as a feature processor for the model, the BLSTM then generates a context from the acoustic signal sequence, and the final DNN takes that context and models the posterior probabilities of the acoustic states. We achieve a WER of 3.47% on the Wall Street Journal (WSJ) eval92 task, more than 8% relative improvement over the baseline DNN models.
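
The data flow described in the abstract can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the window width, the ReLU activations, the random weights, and the choice to apply the time convolution before the first DNN are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(*shape):
    return rng.normal(scale=0.1, size=shape)

def dense(x, w, b):
    """Fully connected layer with ReLU (activation choice is an assumption)."""
    return np.maximum(x @ w + b, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def time_windows(x, width):
    """Zero-pad in time and gather `width` frames around each step: (T, D) -> (T, width * D)."""
    half = width // 2
    padded = np.pad(x, ((half, half), (0, 0)))
    return np.stack([padded[t:t + width].ravel() for t in range(x.shape[0])])

def lstm(x, w, u, b):
    """Unidirectional LSTM over time: (T, D) -> (T, H)."""
    hidden = u.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in x:
        i, f, o, g = np.split(x_t @ w + h @ u + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def blstm(x, fwd, bwd):
    """Run one LSTM forward and one backward in time, then concatenate their outputs."""
    forward_out = lstm(x, *fwd)
    backward_out = lstm(x[::-1], *bwd)[::-1]
    return np.concatenate([forward_out, backward_out], axis=1)

def lstm_params(in_dim, hidden):
    return init(in_dim, 4 * hidden), init(hidden, 4 * hidden), np.zeros(4 * hidden)

# Hypothetical sizes chosen only to keep the example small.
T, D, WIDTH, F, H, STATES = 20, 40, 5, 32, 16, 10
params = {
    "tc": (init(WIDTH * D, F), np.zeros(F)),   # time convolution: shared weights per window
    "dnn1": (init(F, F), np.zeros(F)),         # first DNN: feature processor
    "fwd": lstm_params(F, H),
    "bwd": lstm_params(F, H),
    "dnn2": (init(2 * H, F), np.zeros(F)),     # final DNN on the BLSTM context
    "out": (init(F, STATES), np.zeros(STATES)),
}

def forward(frames, p):
    tc = dense(time_windows(frames, WIDTH), *p["tc"])
    feats = dense(tc, *p["dnn1"])
    context = blstm(feats, p["fwd"], p["bwd"])
    hidden = dense(context, *p["dnn2"])
    w_out, b_out = p["out"]
    return softmax(hidden @ w_out + b_out)     # per-frame posteriors over acoustic states

frames = rng.normal(size=(T, D))               # stand-in for acoustic feature frames
posteriors = forward(frames, params)
```

Each row of `posteriors` is a distribution over the (hypothetical) acoustic states for one frame, which is the quantity the final DNN is described as modelling.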

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Convolution