Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition

Jaeyoung Kim, Mostafa El-Khamy, and Jungwon Lee

arXiv:1701.03360·cs.LG·June 7, 2017·34 cites

TL;DR

This paper introduces residual LSTM, a deep recurrent neural network architecture with spatial shortcut paths, improving training efficiency and recognition accuracy in distant speech recognition tasks.

Contribution

The paper proposes residual LSTM, which separates spatial and temporal shortcut paths, reuses existing LSTM components, and reduces parameters, advancing deep recurrent network design.

Findings

1. Residual LSTM achieves lower WER than plain and highway LSTM.
2. Residual LSTM reduces network parameters by over 10%.
3. On the AMI SDM corpus, residual LSTM outperforms baselines in speech recognition.

Abstract

In this paper, a novel architecture for a deep recurrent neural network, residual LSTM, is introduced. A plain LSTM has an internal memory cell that can learn long-term dependencies of sequential data. It also provides a temporal shortcut path to avoid vanishing or exploding gradients in the temporal domain. The residual LSTM provides an additional spatial shortcut path from lower layers for efficient training of deep networks with multiple LSTM layers. Compared with the previous work, highway LSTM, residual LSTM separates the spatial shortcut path from the temporal one by using output layers, which can help to avoid a conflict between spatial- and temporal-domain gradient flows. Furthermore, residual LSTM reuses the output projection matrix and the output gate of LSTM to control the spatial information flow instead of additional gate networks, which effectively reduces more than 10% of network…
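
The abstract describes the mechanism only in words, and the text above is cut off before any equations, so the following is a minimal pure-Python sketch of one plausible reading rather than the authors' implementation. It assumes that the cell state `c` carries the temporal shortcut as in a plain LSTM, and that the spatial shortcut adds the layer input `x` to the projected cell output before the existing output gate scales the sum. The function and parameter names (`init_params`, `residual_lstm_step`, `run_stack`, `Wp`, and so on), the equal input and output width `n`, and the omission of peephole connections are all assumptions made for illustration.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def vtanh(v):
    return [math.tanh(x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vs):
    return [sum(xs) for xs in zip(*vs)]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def init_params(n, n_cell, seed=0):
    """Hypothetical layout: layer width n (input == output), cell size n_cell."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    # Assumption: the output gate "o" has width n because it is applied
    # after the projection; the other gates act on the cell (n_cell).
    for gate, rows in (("i", n_cell), ("f", n_cell), ("g", n_cell), ("o", n)):
        p["Wx" + gate] = mat(rows, n)   # input-to-gate weights
        p["Wh" + gate] = mat(rows, n)   # recurrent (previous output) weights
        p["b" + gate] = [0.0] * rows
    p["Wp"] = mat(n, n_cell)            # output projection: cell -> layer output
    return p


def residual_lstm_step(p, x, h_prev, c_prev):
    """One time step of one layer; returns (h, c)."""

    def pre(gate):
        return add(matvec(p["Wx" + gate], x),
                   matvec(p["Wh" + gate], h_prev),
                   p["b" + gate])

    i = sigmoid(pre("i"))
    f = sigmoid(pre("f"))
    o = sigmoid(pre("o"))
    g = vtanh(pre("g"))
    # Temporal shortcut: the cell state is carried forward through time.
    c = add(mul(f, c_prev), mul(i, g))
    # Projected cell output, reusing the ordinary projection matrix.
    m = matvec(p["Wp"], vtanh(c))
    # Spatial shortcut: add the lower-layer input, then apply the output gate.
    h = mul(o, add(m, x))
    return h, c


def run_stack(layers, inputs):
    """Run a stack of layers over a sequence of input vectors."""
    n = len(inputs[0])
    states = [([0.0] * n, [0.0] * len(p["bi"])) for p in layers]
    outputs = []
    for x in inputs:
        for k, p in enumerate(layers):
            h, c = residual_lstm_step(p, x, *states[k])
            states[k] = (h, c)
            x = h                        # output of layer k feeds layer k + 1
        outputs.append(x)
    return outputs


if __name__ == "__main__":
    stack = [init_params(3, 5, seed=s) for s in range(4)]
    print(run_stack(stack, [[0.1, 0.2, 0.3]] * 2))
```

In this sketch the shortcut introduces no weights beyond those of an LSTM layer with an output projection, since it reuses `Wp` and the output gate. That is the property the abstract credits for the parameter saving; the size of the saving itself is not reproduced here.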

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory