Future Word Contexts in Neural Network Language Models

Xie Chen; Xunying Liu; Anton Ragni; Yu Wang; Mark Gales

arXiv:1708.05592·cs.CL·August 21, 2017

TL;DR

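This paper introduces succeeding word RNNLMs (su-RNNLMs), a neural network language model that uses a finite window of future words in addition to the word history. On speech recognition tasks, su-RNNLMs outperform unidirectional RNNLMs (uni-RNNLMs) and nearly match bidirectional RNNLMs (bi-RNNLMs), while being more efficient to train than bi-RNNLMs.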

Contribution

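The paper proposes su-RNNLMs, a network structure that models a finite number of succeeding words with a feedforward unit instead of a recurrent unit over the complete future context. This makes training much more efficient than for bi-RNNLMs and addresses the difficulty of using future context within a lattice rescoring framework.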

Findings

01. su-RNNLMs outperform uni-RNNLMs in speech recognition tasks.

02. su-RNNLMs nearly match bi-RNNLMs in N-best rescoring.

03. Lattice rescoring with su-RNNLMs improves overall recognition accuracy.
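For readers unfamiliar with the N-best rescoring mentioned in the findings, the general idea can be sketched as follows: each candidate transcription from the recogniser is re-scored by adding a weighted language model log-probability to its acoustic log-score, and the best total wins. This is a minimal illustration of the generic technique, not the paper's setup. The function names, the toy unigram table, the scores, and the `lm_weight` parameter are all assumptions made for the example.

```python
# Hypothetical toy language model: a unigram log-probability table.
# A real system would call an RNNLM here; these numbers are made up.
TOY_LOGPROBS = {
    "recognise": -2.0, "speech": -2.0,
    "wreck": -5.0, "a": -1.0, "nice": -3.0, "beach": -4.0,
}

def toy_lm_logprob(words, unknown=-10.0):
    """Sum per-word log-probabilities, with a floor for unseen words."""
    return sum(TOY_LOGPROBS.get(w, unknown) for w in words)

def rescore_nbest(hypotheses, lm_logprob, lm_weight=1.0):
    """Pick the hypothesis with the best combined score.

    hypotheses: list of (words, acoustic_logscore) pairs.
    lm_logprob: function mapping a word list to a log-probability.
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))

nbest = [
    (["recognise", "speech"], -10.0),
    (["wreck", "a", "nice", "beach"], -9.5),
]
best_with_lm = rescore_nbest(nbest, toy_lm_logprob)
best_acoustic_only = rescore_nbest(nbest, toy_lm_logprob, lm_weight=0.0)
```

Because every hypothesis in an N-best list is a complete sentence, a model that needs future words can score it directly. Lattices share partial paths between hypotheses, which is why unbounded future context is harder to use there.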

Abstract

Recently, bidirectional recurrent network language models (bi-RNNLMs) have been shown to outperform standard, unidirectional, recurrent neural network language models (uni-RNNLMs) on a range of speech recognition tasks. This indicates that future word context information beyond the word history can be useful. However, bi-RNNLMs pose a number of challenges as they make use of the complete previous and future word context information. This impacts both training efficiency and their use within a lattice rescoring framework. In this paper these issues are addressed by proposing a novel neural network structure, succeeding word RNNLMs (su-RNNLMs). Instead of using a recurrent unit to capture the complete future word contexts, a feedforward unit is used to model a finite number of succeeding, future, words. This model can be trained much more efficiently than bi-RNNLMs and can also be used…
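The structure described in the abstract can be illustrated with a small sketch: a recurrent unit summarises the word history, a feedforward unit reads a fixed number `k` of succeeding words, and both feed the output layer. This is a simplified illustration under stated assumptions, not the paper's exact equations. The class name `SuRNNLMSketch`, the tanh activations, the way the two parts are combined, the zero padding past the end of the sentence, and the random untrained weights are all hypothetical choices made for this example.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def vec_tanh(a):
    return [math.tanh(v) for v in a]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class SuRNNLMSketch:
    """Toy model: recurrent history plus a feedforward window of k future words."""

    def __init__(self, vocab_size, dim, k, seed=0):
        rng = random.Random(seed)
        self.k, self.dim = k, dim
        self.emb = rand_matrix(vocab_size, dim, rng)      # word embeddings
        self.W_in = rand_matrix(dim, dim, rng)            # input -> history
        self.W_rec = rand_matrix(dim, dim, rng)           # history -> history
        # One weight matrix per future offset (assumed design).
        self.W_fut = [rand_matrix(dim, dim, rng) for _ in range(k)]
        self.W_out_h = rand_matrix(vocab_size, dim, rng)  # history -> output
        self.W_out_f = rand_matrix(vocab_size, dim, rng)  # future -> output
        self.pad = [0.0] * dim                            # past end of sentence

    def word_probs(self, words):
        """For each position t, return a distribution over the word at t,
        given the words before t and at most k words after t."""
        h = [0.0] * self.dim
        out = []
        for t in range(len(words)):
            # Feedforward unit over the next k words (no recurrence).
            f_pre = [0.0] * self.dim
            for j in range(1, self.k + 1):
                e = self.emb[words[t + j]] if t + j < len(words) else self.pad
                f_pre = vec_add(f_pre, matvec(self.W_fut[j - 1], e))
            f = vec_tanh(f_pre)
            logits = vec_add(matvec(self.W_out_h, h), matvec(self.W_out_f, f))
            out.append(softmax(logits))
            # Recurrent unit: fold word t into the history for position t + 1.
            x = self.emb[words[t]]
            h = vec_tanh(vec_add(matvec(self.W_in, x), matvec(self.W_rec, h)))
        return out

model = SuRNNLMSketch(vocab_size=5, dim=4, k=2)
probs = model.word_probs([1, 3, 2, 4])  # word ids are arbitrary
```

In this sketch the prediction at position `t` depends only on words up to position `t + k`, so a change further ahead in the sentence leaves it untouched. That bounded look-ahead is the property that distinguishes a finite succeeding-word window from a recurrent unit over the complete future context.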
