N-gram Language Modeling using Recurrent Neural Network Estimation
Ciprian Chelba, Mohammad Norouzi, Samy Bengio

TL;DR
This paper uses LSTM networks to estimate n-gram language models. It shows that, unlike Katz or Kneser-Ney back-off estimators, their performance keeps improving as the n-gram order grows, and it argues that the approach has practical advantages in some applications.
Contribution
The paper introduces LSTM n-gram smoothing and shows that it is effective for long contexts, with practical benefits over traditional back-off models, especially on large-scale data.
Findings
LSTM n-gram models outperform traditional back-off smoothing methods (Katz, Kneser-Ney) for long contexts
Performance improves with increasing n-gram order, up to n = 13
LSTM n-gram smoothing is effective at large scale, e.g., on the One Billion Words benchmark
Abstract
We investigate the effective memory depth of RNN models by using them for n-gram language model (LM) smoothing. Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the n-gram state when compared with feed-forward and vanilla RNN models. When preserving the sentence independence assumption, the LSTM n-gram matches the LSTM LM performance for n = 9 and slightly outperforms it for n = 13. When allowing dependencies across sentence boundaries, the LSTM 13-gram almost matches the perplexity of the unlimited history LSTM LM. LSTM n-gram smoothing also has the desirable property of improving with increasing n-gram order, unlike the Katz or Kneser-Ney back-off estimators. Using multinomial distributions as targets in training instead of the usual one-hot…
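The difference between the two evaluation settings in the abstract (sentence independence versus dependencies across sentence boundaries) comes down to which previous words are visible as n-gram context. The following is a minimal sketch of that idea, not the authors' code: the function name `ngram_examples`, the padding tokens `<S>` and `</S>`, and the way sentences are joined in the cross-sentence case are all assumptions made for illustration.

```python
from typing import Iterator, List, Sequence, Tuple

BOS = "<S>"   # hypothetical sentence-start padding token (assumption)
EOS = "</S>"  # hypothetical sentence-end token (assumption)


def ngram_examples(
    sentences: Sequence[Sequence[str]],
    n: int,
    cross_sentence: bool = False,
) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (context, target) pairs; the context is the previous n-1 tokens."""
    if n < 1:
        raise ValueError("n must be >= 1")
    width = n - 1
    if cross_sentence:
        # One continuous stream: a context may reach into earlier sentences.
        stream: List[str] = [BOS] * width
        for sent in sentences:
            stream.extend(sent)
            stream.append(EOS)
        streams = [stream]
    else:
        # Sentence independence: every sentence starts from fresh padding.
        streams = [[BOS] * width + list(sent) + [EOS] for sent in sentences]
    for stream in streams:
        for i in range(width, len(stream)):
            yield tuple(stream[i - width:i]), stream[i]


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["it", "slept"]]
    for context, target in ngram_examples(corpus, n=3):
        print(context, "->", target)
```

In an LSTM n-gram model of the kind the abstract describes, each such context would be fed to the recurrent encoder starting from a reset state, so the prediction depends only on those n-1 words. An unlimited-history LSTM LM, by contrast, carries its state forward across the whole sequence.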
Taxonomy
Topics: Topic Modeling · Stochastic Gradient Optimization Techniques · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Dropout · Long Short-Term Memory
