Highway Long Short-Term Memory RNNs for Distant Speech Recognition

Yu Zhang; Guoguo Chen; Dong Yu; Kaisheng Yao; Sanjeev Khudanpur; James Glass

arXiv:1510.08983 · cs.NE · December 6, 2018 · 27 cites

Open Access

TL;DR

This paper introduces highway connections in deep LSTM RNNs and latency-controlled bidirectional LSTMs to improve distant speech recognition, achieving state-of-the-art results on the AMI dataset.

Contribution

The paper proposes highway connections for deeper LSTMs and latency-controlled bidirectional LSTMs, enhancing information flow and performance in distant speech recognition tasks.

Findings

01. Achieved 43.9%/47.7% WER on the AMI (SDM) dev/eval sets, outperforming previous models.

02. Highway connections make deeper LSTMs trainable and yield larger gains from sequence training.

03. Latency-controlled BLSTMs exploit the whole history while keeping latency under control.
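One way to picture the latency control is chunked processing. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the function name `latency_controlled_chunks` and the parameters `chunk_size` and `lookahead` are hypothetical. It assumes the forward direction carries its state across chunks (so it sees the whole history), while the backward direction only runs over the current chunk plus a few future frames, which bounds the delay.

```python
def latency_controlled_chunks(num_frames, chunk_size, lookahead):
    """Split an utterance into (start, end, right) frame windows.

    Hypothetical helper: frames [start, end) are the outputs emitted for a
    chunk; frames [end, right) are the future context that the backward
    direction may read before emitting them. The delay is therefore bounded
    by chunk_size + lookahead frames instead of the utterance length.
    """
    if chunk_size <= 0 or lookahead < 0:
        raise ValueError("chunk_size must be positive and lookahead non-negative")
    chunks = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        right = min(end + lookahead, num_frames)  # clip at the utterance end
        chunks.append((start, end, right))
    return chunks


# Example: a 10-frame utterance, 4-frame chunks, 2 frames of lookahead.
windows = latency_controlled_chunks(10, 4, 2)
```

In this picture, a larger `lookahead` gives the backward direction more future context at the cost of more delay.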

Abstract

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers. These direct links, called highway connections, enable unimpeded information flow across different layers and thus alleviate the gradient vanishing problem when building deeper LSTMs. We further introduce the latency-controlled bidirectional LSTMs (BLSTMs) which can exploit the whole history while keeping the latency under control. Efficient algorithms are proposed to train these novel networks using both frame and sequence discriminative criteria. Experiments on the AMI distant speech recognition (DSR) task indicate that we can train deeper LSTMs and achieve better improvement from sequence training with highway LSTMs (HLSTMs). Our novel model obtains $43.9/47.7\%$ WER on AMI (SDM) dev and eval sets, outperforming…
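The gated direct connection between memory cells can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions, not the authors' implementation: the abstract does not give the gate's parameterization, so the form of `depth_gate` (a sigmoid over the layer input and both cell states, with elementwise weights `w_cd` and `w_ld`) and all names here are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def depth_gate(x, c_prev, c_lower, W_xd, w_cd, w_ld, b_d):
    """Assumed gate controlling how much of the lower layer's cell passes up.

    x       : input to this layer at time t
    c_prev  : this layer's cell state at time t-1
    c_lower : the layer below's cell state at time t
    """
    return sigmoid(W_xd @ x + w_cd * c_prev + w_ld * c_lower + b_d)


def highway_cell_update(c_lower, c_prev, i_gate, f_gate, candidate, d_gate):
    """Standard LSTM cell update plus a gated term from the lower layer's cell.

    With d_gate == 0 this reduces to the usual LSTM update
    f_gate * c_prev + i_gate * candidate.
    """
    return d_gate * c_lower + f_gate * c_prev + i_gate * candidate
```

When the gate is fully open and the other gates are closed, the lower layer's cell state is copied upward unchanged. That direct path is what lets information and gradients pass between layers.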

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing