Towards better decoding and language model integration in sequence to sequence models

Jan Chorowski; Navdeep Jaitly

arXiv:1612.02695 · cs.NE · December 9, 2016 · 58 cites

TL;DR

This paper analyzes attention-based sequence-to-sequence speech recognition, identifies key shortcomings, and proposes practical solutions that improve transcription accuracy with and without language models.

Contribution

The paper introduces remedies for overconfident predictions and incomplete transcriptions in attention-based seq2seq speech recognition, achieving competitive speaker-independent WER on the Wall Street Journal (WSJ) dataset.

Findings

01. Achieved 10.6% WER without a separate language model

02. Reduced WER to 6.7% with a trigram language model

03. Identified and addressed two shortcomings: overconfident predictions and incomplete transcriptions when a language model is used

Abstract

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion. In this contribution, we analyse an attention-based seq2seq speech recognition system that directly transcribes recordings into characters. We observe two shortcomings: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used. We propose practical solutions to both problems, achieving competitive speaker-independent word error rates on the Wall Street Journal dataset: without separate language models we reach 10.6% WER, while together with a trigram language model, we reach 6.7% WER.
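
For readers unfamiliar with the metric quoted above, word error rate (WER) is conventionally the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and its reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A truncated hypothesis is penalized through deletions.
    print(word_error_rate("the cat sat on the mat", "the cat sat"))
```

Under this definition, an incomplete transcription of the kind the abstract describes shows up as deletion errors: every reference word missing from the end of the hypothesis adds one to the numerator.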

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence