Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays

arXiv:1507.06947 · cs.CL · July 27, 2015

TL;DR

This paper introduces frame stacking, a reduced frame rate, and context-dependent phone modeling to improve LSTM RNN acoustic models, yielding faster and more accurate large-vocabulary speech recognition. It also reports initial results for LSTM RNN models that output words directly.

Contribution

The paper presents methods for improving LSTM RNN acoustic models for speech recognition (frame stacking, a reduced frame rate, and context-dependent phone modeling) and reports initial results for models that output words directly.

Findings

1. Frame stacking and a reduced frame rate improve accuracy and decoding speed.
2. Context-dependent phone modeling further enhances performance.
3. Initial results show potential for direct word output from LSTM RNNs.
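The first finding rests on two input transformations. Frame stacking concatenates several consecutive acoustic feature frames into one input vector, and a reduced frame rate means the network consumes (and emits) only every k-th stacked vector. The pure-Python sketch below illustrates both; the stack size of 8, stride of 3, and 40-dimensional features are illustrative assumptions rather than values given in this summary, and `stack_and_subsample` is a hypothetical helper name.

```python
from typing import List

Frame = List[float]


def stack_and_subsample(frames: List[Frame], stack: int = 8, stride: int = 3) -> List[Frame]:
    """Hypothetical helper: stack `stack` consecutive frames, keep every `stride`-th window.

    The defaults are illustrative assumptions, not values reported in the summary above.
    """
    if stack < 1 or stride < 1:
        raise ValueError("stack and stride must be positive")
    out: List[Frame] = []
    # Window starts advance by `stride`, which lowers the output frame rate.
    # Trailing frames that cannot fill a whole window are dropped (no padding).
    for start in range(0, len(frames) - stack + 1, stride):
        stacked: Frame = []
        for frame in frames[start:start + stack]:
            stacked.extend(frame)  # concatenate feature vectors end to end
        out.append(stacked)
    return out


# Usage: 100 frames of 40-dim features (1 s of audio at an assumed 10 ms hop).
frames = [[float(t)] * 40 for t in range(100)]
stacked = stack_and_subsample(frames)
print(len(stacked), len(stacked[0]))  # 31 320
```

Assuming a 10 ms input hop, a stride of 3 gives one network step every 30 ms, so the recurrent layers and the decoder process roughly a third as many frames. The intuition is that stacking compensates for the skipped frames by giving each step a wider window of acoustic context. This sketch drops trailing frames that do not fill a whole window; a real front end might pad instead.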

Abstract

We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed-forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence-trained context-dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence-trained phone models initialized with connectionist temporal classification (CTC). In this paper, we present techniques that further improve the performance of LSTM RNN acoustic models for large-vocabulary speech recognition. We show that frame stacking and a reduced frame rate lead to more accurate models and faster decoding. CD phone modeling leads to further improvements. We also present initial results for LSTM RNN models outputting words directly.
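The abstract refers to phone models initialized with connectionist temporal classification (CTC). A CTC network emits, at every frame, either a label or a special blank symbol, and a frame-level path is turned into an output sequence by merging repeated labels and then removing blanks. The sketch below shows best-path (greedy) decoding in pure Python, assuming label index 0 is the blank; `ctc_greedy_decode` and the toy posteriors are hypothetical illustrations, not the decoder used in the paper. Large-vocabulary systems typically decode with a beam search over a lexicon and language model instead.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank


def ctc_greedy_decode(posteriors: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Hypothetical helper for best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # 1. Most probable label at each frame (the "best path").
    best_path = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    # 2. Collapse runs of the same label, then 3. remove blanks.
    return [label for label, _run in groupby(best_path) if label != blank]


def one_hot_ish(k: int, n: int = 3) -> List[float]:
    """Toy posterior row that favours label k (illustrative data only)."""
    return [0.9 if i == k else 0.05 for i in range(n)]


# Frame-level path 1 1 _ 1 2 2 _ (with _ = blank) collapses to [1, 1, 2]:
# the blank between the two 1s keeps them as separate labels.
posteriors = [one_hot_ish(k) for k in [1, 1, 0, 1, 2, 2, 0]]
print(ctc_greedy_decode(posteriors))  # [1, 1, 2]
```

The order of the two collapse steps matters: merging repeats before removing blanks is what lets a blank separate two genuine occurrences of the same label.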

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory