Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition

Xiangang Li; Xihong Wu

arXiv:1610.03165·cs.CL·October 12, 2016·5 cites

TL;DR

This paper introduces a convolutional recurrent neural network (CRNN) architecture that combines CNNs with LSTM RNNs and is reported to outperform FFNNs and standalone LSTM RNNs on large vocabulary speech recognition tasks.

Contribution

The paper proposes a new CRNN architecture that integrates CNNs and LSTM RNNs for improved speech recognition accuracy, demonstrating its effectiveness through extensive experiments.

Findings

01 LSTM CRNNs outperform traditional FFNNs and standalone LSTM RNNs.

02 The proposed architecture exceeds state-of-the-art speech recognition performance.

03 Experimental results validate the effectiveness of combining CNNs with LSTM RNNs.

Abstract

Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they can learn a dynamically changing contextual window over the entire sequence history. Convolutional neural networks (CNNs), on the other hand, have brought significant improvements to deep feed-forward neural networks (FFNNs), as they are better able to reduce spectral variation in the input signal. In this paper, a network architecture called the convolutional recurrent neural network (CRNN) is proposed, combining the CNN and the LSTM RNN. In the proposed CRNNs, each speech frame, without adjacent context frames, is organized as a number of local feature patches along the frequency axis, and an LSTM network is then run over each feature patch along the time axis. We train and compare FFNNs, LSTM RNNs and the proposed…
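The patch-then-recur idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the patch size, stride, hidden size, random weights, and the choice to share one set of LSTM weights across all patches are assumptions made here for illustration, and the helper names `frequency_patches` and `lstm_over_time` are hypothetical.

```python
import numpy as np

def frequency_patches(frames, patch_size, stride):
    """Split frames of shape (T, F) into overlapping patches along the
    frequency axis, giving an array of shape (T, P, patch_size)."""
    num_bins = frames.shape[1]
    starts = range(0, num_bins - patch_size + 1, stride)
    return np.stack([frames[:, s:s + patch_size] for s in starts], axis=1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_over_time(inputs, W, U, b):
    """Run a standard LSTM along the time axis, independently per patch.

    inputs: (T, P, D); W: (D, 4H); U: (H, 4H); b: (4H,).
    Weights are shared across patches (an assumption of this sketch).
    Returns hidden states of shape (T, P, H).
    """
    T, P, _ = inputs.shape
    H = U.shape[0]
    h = np.zeros((P, H))  # hidden state, one row per patch
    c = np.zeros((P, H))  # cell state, one row per patch
    outputs = []
    for t in range(T):
        gates = inputs[t] @ W + h @ U + b
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=0)

# Hypothetical sizes: 20 frames of 40 filterbank bins, no context frames.
rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 40))
patches = frequency_patches(frames, patch_size=8, stride=4)

hidden = 16
W = rng.standard_normal((8, 4 * hidden)) * 0.1
U = rng.standard_normal((hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)
features = lstm_over_time(patches, W, U, b)
print(patches.shape, features.shape)
```

Each patch position keeps its own hidden and cell state across time, so the recurrence runs along the time axis while the patching handles locality along the frequency axis. How the per-patch outputs are pooled or fed to later layers is not stated in the truncated abstract and is left out here.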

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory