Improving the Performance of Online Neural Transducer Models

Tara N. Sainath; Chung-Cheng Chiu; Rohit Prabhavalkar; Anjuli Kannan; Yonghui Wu; Patrick Nguyen; Zhifeng Chen

arXiv:1712.01807 · cs.CL · December 6, 2017 · 6 cites

Open Access

TL;DR

This paper improves online neural transducer (NT) models for streaming speech recognition by widening the attention window (mainly backward in time), initializing from a LAS-trained model, and using stronger language models, so that NT matches the non-streaming LAS model on a Voice Search task.

Contribution

It introduces methods to improve neural transducer performance, including attention window expansion, LAS-based initialization, and external language model integration.

Findings

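01

With these improvements, the neural transducer matches the performance of LAS on a Voice Search task.

02

Widening the attention window, mainly backward in time, gives the online model more context while keeping it streaming.

03

Stronger language modeling (wordpiece models and an external LM applied during beam search) improves recognition accuracy.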
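
The page does not say how the external LM is combined with the transducer during beam search. A minimal sketch of one common approach, log-linear interpolation of the two log-probabilities, is given here as an assumption rather than the authors' exact formulation. The names `fused_score`, `rerank`, and `lm_weight`, and the toy probabilities, are hypothetical.

```python
import math


def fused_score(nt_log_prob, lm_log_prob, lm_weight=0.3):
    """Combine the transducer score with an external LM score.

    Assumption: simple log-linear interpolation; lm_weight is a
    hypothetical tuning parameter, not a value from the paper.
    """
    return nt_log_prob + lm_weight * lm_log_prob


def rerank(hypotheses, lm_weight=0.3):
    """Sort beam hypotheses (text, nt_log_prob, lm_log_prob), best first."""
    return sorted(
        hypotheses,
        key=lambda h: fused_score(h[1], h[2], lm_weight),
        reverse=True,
    )


# Toy beam: the transducer slightly prefers "call mum",
# while the external LM strongly prefers "call mom".
beam = [
    ("call mom", math.log(0.40), math.log(0.30)),
    ("call mum", math.log(0.45), math.log(0.05)),
]

best_with_lm = rerank(beam, lm_weight=0.3)[0][0]
best_without_lm = rerank(beam, lm_weight=0.0)[0][0]
print(best_with_lm, "|", best_without_lm)
```

With `lm_weight=0.0` the ranking falls back to the transducer score alone, which makes the effect of the LM term easy to isolate in this toy example.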

Abstract

Having a sequence-to-sequence model which can operate in an online fashion is important for streaming applications such as Voice Search. The neural transducer (NT) is a streaming sequence-to-sequence model, but it has shown a significant degradation in performance compared to non-streaming models such as Listen, Attend and Spell (LAS). In this paper, we present various improvements to NT. Specifically, we look at increasing the window over which NT computes attention, mainly by looking backwards in time so the model still remains online. In addition, we explore initializing an NT model from a LAS-trained model so that it is guided with a better alignment. Finally, we explore including stronger language models, such as using wordpiece models and applying an external LM during the beam search. On a Voice Search task, we find that with these improvements we can get NT to match the performance of LAS.
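
The backward-looking attention window described in the abstract can be illustrated with a small sketch. It assumes the encoder output is split into fixed-size chunks and that the window is extended by a whole number of past chunks. The function name `attention_window` and the parameters `chunk_size` and `look_back` are hypothetical and not taken from the paper.

```python
def attention_window(chunk_index, chunk_size, look_back, num_frames):
    """Return the [start, end) encoder-frame range attended to for one chunk.

    Assumptions (not from the paper): fixed-size chunks, and a look-back
    measured in whole chunks. No future frames are ever included.
    """
    chunk_start = chunk_index * chunk_size
    # Extend the window backwards only, so the model stays online.
    start = max(0, chunk_start - look_back * chunk_size)
    # The window ends with the current chunk (or the end of the utterance).
    end = min(num_frames, chunk_start + chunk_size)
    return start, end


# Windows for a 35-frame utterance, 10-frame chunks, 2 chunks of look-back.
windows = [
    attention_window(k, chunk_size=10, look_back=2, num_frames=35)
    for k in range(4)
]
print(windows)
```

Setting `look_back=0` in this sketch restricts attention to the current chunk only, and larger values widen the context without requiring any future audio.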

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: WordPiece