Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

Shankar Kumar; Michael Nirschl; Daniel Holtmann-Rice; Hank Liao; Ananda Theertha Suresh; Felix Yu

arXiv:1711.05448·stat.ML·November 16, 2017·1 cites

Open Access

TL;DR

This paper evaluates lattice rescoring algorithms that use LSTM language models to improve speech recognition accuracy, reporting an 8% relative reduction in word error rate over an N-gram LM on a YouTube speech recognition task.

Contribution

It introduces new variants of lattice rescoring algorithms and evaluates their effectiveness with LSTM LMs in speech recognition.

Findings

01

LSTM LMs outperform N-gram LMs in speech recognition.

02

Lattice rescoring with LSTM LMs reduces WER by 8% relative to the WER obtained with an N-gram LM.

03

New variants of lattice rescoring are evaluated alongside existing algorithms on the YouTube task.

Abstract

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally more expensive than N-gram LMs for decoding, and thus, challenging to integrate into speech recognizers. Recent research has proposed the use of lattice-rescoring algorithms using RNNLMs and LSTMLMs as an efficient strategy to integrate these models into a speech recognition system. In this paper, we evaluate existing lattice rescoring algorithms along with new variants on a YouTube speech recognition task. Lattice rescoring using LSTMLMs reduces the word error rate (WER) for this task by 8% relative to the WER obtained using an N-gram LM.
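
To make the idea of lattice rescoring concrete, here is a minimal Python sketch under stated assumptions. It is not one of the algorithms evaluated in the paper: it simply enumerates every path of a tiny word lattice and adds a second-pass LM score to each first-pass arc score, which is only feasible for toy lattices. The lattice, its scores, the `toy_lm_logprob` stand-in for an LSTM LM, and the WER figures passed to `relative_reduction` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Toy lattice: node -> list of (next_node, word, first_pass_log_score).
# Every word and score here is invented for illustration.
Lattice = Dict[int, List[Tuple[int, str, float]]]

LATTICE: Lattice = {
    0: [(1, "recognize", -1.2), (2, "wreck", -0.5)],
    1: [(5, "speech", -1.0)],
    2: [(3, "a", -0.4)],
    3: [(4, "nice", -0.5)],
    4: [(5, "beach", -0.6)],
}

# Hypothetical second-pass LM probabilities keyed on (previous word, word).
_LM_TABLE = {("<s>", "recognize"): 0.2, ("recognize", "speech"): 0.5}


def toy_lm_logprob(history: List[str], word: str) -> float:
    """Stand-in for an LSTM LM. It receives the full word history, as a
    recurrent model would, although this toy only reads the last word."""
    prev = history[-1] if history else "<s>"
    return math.log(_LM_TABLE.get((prev, word), 0.01))


def rescore(
    lattice: Lattice,
    start: int,
    final: int,
    lm_logprob: Callable[[List[str], str], float],
    lm_weight: float = 1.0,
) -> Tuple[float, List[str]]:
    """Exhaustively expand all paths and return (best_score, best_words).

    Each path score is the sum of first-pass arc scores plus the weighted
    LM log-probability of each word given the words before it on that path.
    """
    best: Tuple[float, List[str]] = (-math.inf, [])
    stack: List[Tuple[int, List[str], float]] = [(start, [], 0.0)]
    while stack:
        node, history, score = stack.pop()
        if node == final:
            if score > best[0]:
                best = (score, history)
            continue
        for nxt, word, arc_score in lattice.get(node, []):
            total = score + arc_score + lm_weight * lm_logprob(history, word)
            stack.append((nxt, history + [word], total))
    return best


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: the fraction of the baseline WER removed."""
    return (baseline_wer - new_wer) / baseline_wer


# With the LM switched off, the first-pass scores alone pick the longer path.
first_pass_best = rescore(LATTICE, 0, 5, toy_lm_logprob, lm_weight=0.0)[1]
# With the LM switched on, the rescored best path changes.
rescored_best = rescore(LATTICE, 0, 5, toy_lm_logprob)[1]
# Hypothetical WERs: going from 12.5 to 11.5 is a 0.08 (8%) relative reduction.
example_reduction = relative_reduction(12.5, 11.5)
```

In this toy setup the first pass prefers "wreck a nice beach" and the rescored lattice prefers "recognize speech". Because an LSTM LM conditions on the whole history, exhaustive expansion grows exponentially with lattice size, which is why practical rescoring algorithms limit or merge the histories they expand. The helper also shows that an 8% relative reduction is a fraction of the baseline WER, not 8 absolute points.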

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Domain Adaptation and Few-Shot Learning