
TL;DR
This paper introduces layer trajectory LSTM (ltLSTM), an architecture that makes deep LSTMs easier to train by using a layer-LSTM to summarize the outputs of all time-LSTM layers, reducing word error rate (WER) in speech recognition by up to 9.0% relative.
Contribution
The paper proposes ltLSTM, an architecture that improves the training of deep LSTMs by capturing the trajectory of outputs across layers. Because the time-LSTM and the layer-LSTM can run in parallel threads, accuracy improves without increasing computation time.
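To make the data flow concrete, here is a minimal pure-Python sketch of an ltLSTM forward pass. It assumes details this summary does not give: the layer-LSTM shares one set of weights across depth, its state is reset at every frame, its input at depth l is the time-LSTM output h_t^l, and senone posteriors come from a linear layer plus softmax on the top layer-LSTM output. All function names, dimensions, and the random initialization are hypothetical, and the paper's exact equations may differ.

```python
# Hypothetical sketch of an ltLSTM forward pass; not the authors' implementation.
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_lstm(in_dim, hid_dim, rng):
    """Random weights for one LSTM cell: 4 gates, each over [x; h], plus a bias."""
    cols = in_dim + hid_dim
    return {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(4 * hid_dim)],
        "b": [0.0] * (4 * hid_dim),
        "H": hid_dim,
    }


def lstm_step(p, x, h, c):
    """One standard LSTM step; returns the new (h, c)."""
    H = p["H"]
    xh = x + h  # concatenate input and previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + b for row, b in zip(p["W"], p["b"])]
    i = [sigmoid(v) for v in z[:H]]           # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]  # output gate
    g = [math.tanh(v) for v in z[3 * H:]]     # candidate cell value
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def lt_lstm_forward(frames, n_layers=3, hid=4, n_senones=5, seed=0):
    """Return per-frame senone posteriors for a list of feature frames."""
    rng = random.Random(seed)
    in_dim = len(frames[0])
    time_cells = [make_lstm(in_dim if l == 0 else hid, hid, rng) for l in range(n_layers)]
    layer_cell = make_lstm(hid, hid, rng)  # assumption: one cell shared across depth
    W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hid)] for _ in range(n_senones)]

    # Time-LSTM states, carried across frames for each layer.
    th = [[0.0] * hid for _ in range(n_layers)]
    tc = [[0.0] * hid for _ in range(n_layers)]
    posteriors = []
    for x in frames:
        # 1) Standard multi-layer time-LSTM step at this frame.
        inp = x
        for l in range(n_layers):
            th[l], tc[l] = lstm_step(time_cells[l], inp, th[l], tc[l])
            inp = th[l]
        # 2) Layer-LSTM scans h_t^1..h_t^L bottom to top (state reset per frame).
        gh, gc = [0.0] * hid, [0.0] * hid
        for l in range(n_layers):
            gh, gc = lstm_step(layer_cell, th[l], gh, gc)
        # 3) The top layer-LSTM output summarizes the trajectory for classification.
        logits = [sum(w * v for w, v in zip(row, gh)) for row in W_out]
        posteriors.append(softmax(logits))
    return posteriors


# Usage: 6 frames of 3-dim features -> 6 posterior vectors over 5 senones.
post = lt_lstm_forward([[0.1 * t, -0.2, 0.3] for t in range(6)])
print(len(post), len(post[0]))
```

The time-LSTM recurrence in step 1 is exactly that of a standard stacked LSTM; only step 2 is new, and it reads the time-LSTM outputs without feeding anything back into them.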
Findings
ltLSTM outperforms the standard multi-layer LSTM.
Achieves up to 9.0% relative WER reduction.
Maintains the same computation time as a standard LSTM.
Abstract
It is popular to stack LSTM layers to get better modeling power, especially when a large amount of training data is available. However, an LSTM-RNN with too many vanilla LSTM layers is very hard to train, and the vanishing gradient problem still exists if the network goes too deep. This issue can be partially solved by adding skip connections between layers, as in the residual LSTM. In this paper, we propose a layer trajectory LSTM (ltLSTM), which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM. This layer-LSTM scans the outputs from the time-LSTMs and uses the summarized layer trajectory information for final senone classification. The forward propagation of the time-LSTM and the layer-LSTM can be handled in two separate threads in parallel, so the network computation time is the same as that of the standard time-LSTM. With a layer-LSTM running through layers, a…
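The two-thread claim rests on a dependency property: at frame t the layer-LSTM needs only h_t^1..h_t^L, and the time-LSTM never reads layer-LSTM state, so one thread can advance the time-LSTM while another consumes finished frames. The following hypothetical sketch uses Python's `threading` and `queue` modules to check that this pipelined schedule yields the same outputs as running both recurrences in one thread. It assumes a standard LSTM cell with the bias omitted and a layer-LSTM whose state is reset per frame; all names and sizes are invented. Because of CPython's global interpreter lock, it illustrates the scheduling, not an actual speedup.

```python
# Hypothetical two-thread schedule for ltLSTM; not the authors' implementation.
import math
import queue
import random
import threading


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_lstm(in_dim, hid, rng):
    """Random bias-free LSTM weights: 4 gates over [x; h]."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid)] for _ in range(4 * hid)]
    return W, hid


def lstm_step(p, x, h, c):
    """One LSTM step; gate order is input, forget, output, candidate."""
    W, H = p
    xh = x + h
    z = [sum(w * v for w, v in zip(row, xh)) for row in W]
    i = [sigmoid(v) for v in z[:H]]
    f = [sigmoid(v) for v in z[H:2 * H]]
    o = [sigmoid(v) for v in z[2 * H:3 * H]]
    g = [math.tanh(v) for v in z[3 * H:]]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    return [ok * math.tanh(ck) for ok, ck in zip(o, c)], c


def build(in_dim, n_layers, hid, seed=0):
    rng = random.Random(seed)
    time_cells = [make_lstm(in_dim if l == 0 else hid, hid, rng) for l in range(n_layers)]
    return time_cells, make_lstm(hid, hid, rng)


def time_lstm_worker(frames, time_cells, hid, out_q):
    """Thread 1: time-LSTM recurrence; it never waits on the layer-LSTM."""
    hs = [[0.0] * hid for _ in time_cells]
    cs = [[0.0] * hid for _ in time_cells]
    for x in frames:
        inp = x
        for l, cell in enumerate(time_cells):
            hs[l], cs[l] = lstm_step(cell, inp, hs[l], cs[l])
            inp = hs[l]
        out_q.put(list(hs))  # hand off h_t^1..h_t^L for frame t
    out_q.put(None)  # end-of-utterance marker


def layer_lstm_worker(layer_cell, hid, in_q, results):
    """Thread 2: layer-LSTM over depth for each frame as soon as it arrives."""
    while True:
        layer_outs = in_q.get()
        if layer_outs is None:
            break
        gh, gc = [0.0] * hid, [0.0] * hid
        for h in layer_outs:
            gh, gc = lstm_step(layer_cell, h, gh, gc)
        results.append(gh)


def run_pipelined(frames, n_layers=3, hid=4):
    time_cells, layer_cell = build(len(frames[0]), n_layers, hid)
    q, results = queue.Queue(), []
    t1 = threading.Thread(target=time_lstm_worker, args=(frames, time_cells, hid, q))
    t2 = threading.Thread(target=layer_lstm_worker, args=(layer_cell, hid, q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


def run_sequential(frames, n_layers=3, hid=4):
    """Reference: both workers called one after another in a single thread."""
    time_cells, layer_cell = build(len(frames[0]), n_layers, hid)
    q, results = queue.Queue(), []
    time_lstm_worker(frames, time_cells, hid, q)
    layer_lstm_worker(layer_cell, hid, q, results)
    return results


frames = [[0.05 * t, 0.2, -0.1] for t in range(8)]
print(run_pipelined(frames) == run_sequential(frames))
```

The queue is the only channel between the workers and it flows one way, from time-LSTM to layer-LSTM; that one-way dependency is what allows the layer-LSTM's work to overlap with the time-LSTM's.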
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
