A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition
Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney

TL;DR
This paper thoroughly investigates deep bidirectional LSTM RNNs for acoustic modeling in speech recognition, analyzing their architecture, training techniques, and performance improvements over traditional models.
Contribution
It provides a comprehensive analysis of deep bidirectional LSTM RNNs, including training methods, regularization, and layer-wise pretraining, achieving significant WER reductions.
Findings
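The best model, an 8-layer bidirectional LSTM, achieved a relative WER reduction of over 14% compared to the best feed-forward neural network (FFNN) baseline on the Quaero task.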
Layer-wise pretraining improves deep LSTM performance.
Training time varies with model complexity and impacts recognition accuracy.
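For readers unfamiliar with the metric, a relative WER reduction is the absolute WER difference divided by the baseline WER. The following is a minimal sketch of that arithmetic; the function name and the WER values are hypothetical and chosen only for illustration, they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, model_wer):
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent (e.g. 20.0 for 20%).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - model_wer) / baseline_wer


# Hypothetical values, NOT taken from the paper:
# a baseline at 20.0% WER and a model at 17.0% WER.
reduction = relative_wer_reduction(20.0, 17.0)
print(f"relative WER reduction: {reduction:.1f}%")
```

Under this definition, a reduction of "over 14%" means the LSTM's WER is less than 86% of the FFNN baseline's WER, not that it is 14 percentage points lower.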
Abstract
We present a comprehensive study of deep bidirectional long short-term memory (LSTM) recurrent neural network (RNN) based acoustic models for automatic speech recognition (ASR). We study the effect of size and depth and train models of up to 8 layers. We investigate the training aspect and study different variants of optimization methods, batching, truncated backpropagation, different regularization techniques such as dropout and L2 regularization, and different gradient clipping variants. The major part of the experimental analysis was performed on the Quaero corpus. Additional experiments were also performed on the Switchboard corpus. Our best LSTM model has a relative improvement in word error rate of over 14% compared to our best feed-forward neural network (FFNN) baseline on the Quaero task. On this task, we get our best result with an 8 layer bidirectional LSTM and we show…
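The abstract mentions "different gradient clipping variants" without naming them here. As a rough illustration, two commonly used variants are element-wise clipping and clipping by global norm. The sketch below assumes these two; it is not necessarily the set the authors compared, and the function names are hypothetical.

```python
import math


def clip_by_value(grads, limit):
    """Clamp each gradient component independently to [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def clip_by_norm(grads, max_norm):
    """Rescale the whole gradient vector if its L2 norm exceeds max_norm.

    Unlike clip_by_value, this preserves the gradient's direction.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# Illustrative gradient vector with L2 norm 5.
grads = [3.0, -4.0]
print(clip_by_value(grads, 1.0))
print(clip_by_norm(grads, 1.0))
```

The practical difference is that element-wise clipping can change the direction of the update, while norm clipping only shortens it.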
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Gradient Clipping · Dropout · Long Short-Term Memory
