A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

Albert Zeyer; Patrick Doetsch; Paul Voigtlaender; Ralf Schlüter; Hermann Ney

arXiv:1606.06871·cs.NE·August 6, 2019

TL;DR

This paper thoroughly investigates deep bidirectional LSTM RNNs for acoustic modeling in speech recognition, analyzing their architecture and training techniques and the performance gains they achieve over feed-forward neural network (FFNN) baselines.

Contribution

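It provides a comprehensive analysis of deep bidirectional LSTM RNNs, covering training methods (optimization, batching, truncated backpropagation, gradient clipping), regularization, and layer-wise pretraining. Its best model reduces the word error rate (WER) by over 14% relative to the best FFNN baseline on the Quaero task.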
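Layer-wise pretraining grows a deep network gradually instead of training all layers from scratch at once. The sketch below illustrates one common form of this idea, under stated assumptions: hidden layers are added one at a time on top of the already-trained stack, and training continues after each addition. The names `init_layer`, `layerwise_pretrain`, and the `train_fn` callback are hypothetical, layers are toy weight lists rather than real LSTM layers, and handling of the output layer is omitted; this is not the paper's exact procedure.

```python
import random
from typing import Callable, Dict, List

# Hypothetical layer representation: a dict holding a flat weight list.
Layer = Dict[str, List[float]]


def init_layer(num_weights: int, rng: random.Random) -> Layer:
    """Randomly initialize one hidden layer (toy stand-in for an LSTM layer)."""
    return {"w": [rng.uniform(-0.1, 0.1) for _ in range(num_weights)]}


def layerwise_pretrain(
    target_depth: int,
    num_weights: int,
    train_fn: Callable[[List[Layer]], None],
    seed: int = 0,
) -> List[Layer]:
    """Grow a stack of hidden layers one at a time, training after each step.

    Previously trained layers are kept as they are (not re-initialized);
    only the newly added top layer starts from random weights.
    """
    if target_depth < 1:
        raise ValueError("target_depth must be >= 1")
    rng = random.Random(seed)
    hidden = [init_layer(num_weights, rng)]
    train_fn(hidden)  # train the shallow 1-layer network first
    while len(hidden) < target_depth:
        hidden = hidden + [init_layer(num_weights, rng)]  # stack a new layer on top
        train_fn(hidden)  # continue training the deeper network
    return hidden
```

In a real system, `train_fn` would run, for example, one epoch of training on the current stack; the callback form simply makes the growth schedule (1, 2, ..., `target_depth` layers) explicit.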

Findings

1. The best model, an 8-layer bidirectional LSTM, achieved a relative WER reduction of over 14% compared to the best FFNN baseline on the Quaero task.
2. Layer-wise pretraining improves the performance of deep LSTMs.
3. Training time varies with model complexity and impacts recognition accuracy.
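A relative WER reduction is measured against the baseline's own error rate, not in absolute percentage points. The small helpers below make the arithmetic explicit; the function names are hypothetical and the WER values in the example are illustrative placeholders, not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0.0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def max_wer_for_reduction(baseline_wer: float, reduction_pct: float) -> float:
    """Largest WER that still reaches `reduction_pct` percent relative reduction."""
    return baseline_wer * (1.0 - reduction_pct / 100.0)


# Hypothetical WERs (in %), chosen only to illustrate the formula.
print(relative_wer_reduction(20.0, 17.0))  # 15.0
```

Read the other way round, "over 14% relative" means the new system's WER is below 0.86 times the baseline WER, whatever the baseline's absolute value.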

Abstract

We present a comprehensive study of deep bidirectional long short-term memory (LSTM) recurrent neural network (RNN) based acoustic models for automatic speech recognition (ASR). We study the effect of size and depth and train models of up to 8 layers. We investigate the training aspect and study different variants of optimization methods, batching, truncated backpropagation, different regularization techniques such as dropout and L2 regularization, and different gradient clipping variants. The major part of the experimental analysis was performed on the Quaero corpus. Additional experiments were also performed on the Switchboard corpus. Our best LSTM model has a relative improvement in word error rate of over 14% compared to our best feed-forward neural network (FFNN) baseline on the Quaero task. On this task, we get our best result with an 8-layer bidirectional LSTM and we show…
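Truncated backpropagation for long utterances is often realized by cutting each sequence into fixed-length chunks that are processed and backpropagated independently. The sketch below computes such chunk boundaries, assuming chunks of at most `chunk_size` frames that start every `step` frames (a `step` smaller than `chunk_size` gives overlapping context). The function name and parameters are hypothetical, and this is one common scheme rather than the paper's confirmed configuration.

```python
from typing import List, Tuple


def chunk_boundaries(num_frames: int, chunk_size: int, step: int) -> List[Tuple[int, int]]:
    """Split a sequence of `num_frames` frames into half-open [start, end) chunks.

    Each chunk holds at most `chunk_size` frames; a new chunk starts every
    `step` frames, and the last chunk is cut off at the end of the sequence.
    """
    if chunk_size < 1 or step < 1:
        raise ValueError("chunk_size and step must be >= 1")
    chunks: List[Tuple[int, int]] = []
    start = 0
    while start < num_frames:
        end = min(start + chunk_size, num_frames)
        chunks.append((start, end))
        if end == num_frames:  # the sequence is fully covered
            break
        start += step
    return chunks


# Example: 10 frames, chunks of 4 frames starting every 2 frames.
print(chunk_boundaries(10, 4, 2))  # [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Because gradients never flow across chunk boundaries, chunk length bounds both memory use and how far back in time errors are propagated; chunks from different utterances can then be batched together.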

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Gradient Clipping · Dropout · Long Short-Term Memory
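Gradient clipping appears among the listed methods, and the abstract mentions comparing several clipping variants without naming them here. As an assumption, the sketch below shows two widely used variants on a flat gradient vector: element-wise value clipping and rescaling by the vector's L2 norm. The function names are hypothetical and the code is illustrative, not the paper's implementation.

```python
import math
from typing import List


def clip_by_value(grads: List[float], limit: float) -> List[float]:
    """Element-wise clipping: clamp every component to [-limit, limit]."""
    if limit <= 0.0:
        raise ValueError("limit must be positive")
    return [max(-limit, min(limit, g)) for g in grads]


def clip_by_norm(grads: List[float], max_norm: float) -> List[float]:
    """Norm clipping: if the L2 norm exceeds max_norm, rescale the whole vector.

    Unlike element-wise clipping, this preserves the gradient's direction.
    """
    if max_norm <= 0.0:
        raise ValueError("max_norm must be positive")
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


print(clip_by_value([3.0, -4.0, 0.5], 1.0))  # [1.0, -1.0, 0.5]
```

The design difference matters for LSTMs: element-wise clipping can change the update direction when only a few components are large, whereas norm clipping shrinks the step uniformly.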