Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
Jaeyoung Kim, Mostafa El-Khamy, and Jungwon Lee

TL;DR
This paper introduces residual LSTM, a deep recurrent neural network architecture with spatial shortcut paths, improving training efficiency and recognition accuracy in distant speech recognition tasks.
Contribution
The paper proposes residual LSTM, which separates spatial and temporal shortcut paths, reuses existing LSTM components, and reduces parameters, advancing deep recurrent network design.
Findings
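Residual LSTM achieves a lower word error rate (WER) than plain LSTM and highway LSTM.
Residual LSTM reduces network parameters by more than 10% by reusing the LSTM output projection matrix and output gate instead of adding separate gate networks.
On the AMI SDM corpus, residual LSTM outperforms both baselines in speech recognition.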
Abstract
In this paper, a novel architecture for a deep recurrent neural network, residual LSTM, is introduced. A plain LSTM has an internal memory cell that can learn long-term dependencies of sequential data. It also provides a temporal shortcut path to avoid vanishing or exploding gradients in the temporal domain. The residual LSTM provides an additional spatial shortcut path from lower layers for efficient training of deep networks with multiple LSTM layers. Compared with the previous work, highway LSTM, residual LSTM separates the spatial shortcut path from the temporal one by using output layers, which can help to avoid a conflict between spatial and temporal-domain gradient flows. Furthermore, residual LSTM reuses the output projection matrix and the output gate of LSTM to control the spatial information flow instead of additional gate networks, which effectively reduces more than 10% of network…
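The idea in the abstract can be illustrated with a single time step of one layer. The following is a minimal NumPy sketch, not the authors' implementation: the function and parameter names (`init_params`, `residual_lstm_step`, `W_proj`, `W_short`) are hypothetical, peephole connections are omitted, and the output gate is assumed to have the projection dimension so that it can gate the sum of the projected output and the shortcut. The cell state carries the temporal shortcut, while the layer input is added at the output side (the spatial shortcut) and is scaled by the existing output gate rather than by an extra gate network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, cell_dim, proj_dim, seed=0):
    """Random parameters for one layer (hypothetical layout, for illustration)."""
    rng = np.random.default_rng(seed)
    concat_dim = input_dim + proj_dim
    # Rows: input gate, forget gate, candidate (cell_dim each), then output gate.
    # Assumption: the output gate has proj_dim units so it can gate the output sum.
    rows = 3 * cell_dim + proj_dim
    if input_dim == proj_dim:
        w_short = np.eye(proj_dim)  # identity shortcut when dimensions match
    else:
        w_short = rng.standard_normal((proj_dim, input_dim)) * 0.1
    return {
        "W": rng.standard_normal((rows, concat_dim)) * 0.1,
        "b": np.zeros(rows),
        "W_proj": rng.standard_normal((proj_dim, cell_dim)) * 0.1,
        "W_short": w_short,
    }

def residual_lstm_step(x, h_prev, c_prev, p):
    """One time step: returns the new layer output h and cell state c."""
    n = c_prev.shape[0]
    z = p["W"] @ np.concatenate([x, h_prev]) + p["b"]
    i = sigmoid(z[:n])            # input gate
    f = sigmoid(z[n:2 * n])       # forget gate
    g = np.tanh(z[2 * n:3 * n])   # candidate cell update
    o = sigmoid(z[3 * n:])        # output gate (proj_dim units, assumed)
    c = f * c_prev + i * g        # temporal shortcut lives in the cell state
    m = p["W_proj"] @ np.tanh(c)  # output projection
    h = o * (m + p["W_short"] @ x)  # spatial shortcut, gated by the output gate
    return h, c
```

To stack layers under these assumptions, the `h` returned by one layer would be passed as `x` to the next layer at the same time step, while each layer keeps its own `h_prev` and `c_prev` across time steps. Because the shortcut is added after the projection and never enters `c`, the spatial path stays separate from the temporal one.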
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
