High Order Recurrent Neural Networks for Acoustic Modelling

Chao Zhang; Philip Woodland

arXiv:1802.08314 · cs.CL · February 26, 2018 · 1 citation

TL;DR

This paper introduces high order RNNs (HORNNs), which add connections from multiple previous time steps to combat vanishing gradients in acoustic modelling, and which give WERs similar to a (projected) LSTM while using only 20% to 50% of the recurrent layer parameters and computation.

Contribution

The paper proposes HORNN architectures that reduce vanishing gradients and match LSTM performance with fewer parameters in speech recognition tasks.

Findings

01

HORNNs with rectified linear unit and sigmoid activation functions reduce word error rates (WER) by 4.2% and 6.3%, respectively, over the corresponding RNNs.

02

HORNNs achieve similar WERs to a (projected) LSTM while using only 20% to 50% of the recurrent layer parameters and computation (that is, 50% to 80% fewer).

03

Experimental results on British English MGB3 data demonstrate effectiveness of HORNNs.

Abstract

Vanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the memory cells mean an LSTM layer has four times as many parameters as an RNN with the same hidden vector size. This paper addresses the vanishing gradient problem using a high order RNN (HORNN) which has additional connections from multiple previous time steps. Speech recognition experiments using British English multi-genre broadcast (MGB3) data showed that the proposed HORNN architectures for rectified linear unit and sigmoid activation functions reduced word error rates (WER) by 4.2% and 6.3% over the corresponding RNNs, and gave similar WERs to a (projected) LSTM while using only 20% to 50% of the recurrent layer parameters and computation.
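
To make the idea of "additional connections from multiple previous time steps" concrete, the following is a minimal sketch of a generic high order recurrence, h_t = f(W x_t + sum over lags k of U_k h_{t-k} + b), together with the parameter-count comparison from the abstract. It is an illustration under assumptions, not the authors' implementation: the function names, the `U_by_lag` dictionary, and the choice of lags are all hypothetical, and the paper's exact HORNN formulations may differ (for example in which lags are used or whether projections are applied). The LSTM count assumes a plain LSTM with no peephole connections and no projection layer.

```python
def relu(v):
    """Element-wise rectified linear unit on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def hornn_forward(xs, W, U_by_lag, b, activation=relu):
    """Run a generic high order recurrence over a sequence.

    xs        : list of input vectors, one per time step
    W         : input-to-hidden matrix
    U_by_lag  : dict mapping a lag k >= 1 to the hidden-to-hidden matrix
                applied to h_{t-k} (hypothetical layout, for illustration)
    b         : bias vector
    Hidden states before the start of the sequence are treated as zero.
    """
    hs = []
    for t, x in enumerate(xs):
        pre = [wx + bi for wx, bi in zip(matvec(W, x), b)]
        for lag, U in U_by_lag.items():
            if t - lag >= 0:
                rec = matvec(U, hs[t - lag])
                pre = [p + r for p, r in zip(pre, rec)]
        hs.append(activation(pre))
    return hs


def rnn_layer_params(input_dim, hidden_dim):
    """Input weights + recurrent weights + bias of a standard RNN layer."""
    return hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim


def lstm_layer_params(input_dim, hidden_dim):
    """Assumed plain LSTM: cell input plus three gates, each RNN-sized."""
    return 4 * rnn_layer_params(input_dim, hidden_dim)


# Tiny example: one hidden unit with connections from lags 1 and 2.
states = hornn_forward(
    xs=[[1.0], [0.0], [0.0]],
    W=[[1.0]],
    U_by_lag={1: [[0.5]], 2: [[0.25]]},
    b=[0.0],
)
print(states)
print(lstm_layer_params(40, 500) / rnn_layer_params(40, 500))
```

In the tiny example the lag-2 connection carries the first hidden state directly to the third step, which is the kind of shortcut path that a first-order RNN lacks. Each extra lag matrix in this sketch adds `hidden_dim * hidden_dim` parameters, so how a real HORNN stays below the LSTM budget depends on design choices that this listing does not spell out.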

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory