Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

TL;DR
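The paper introduces the pointer sentinel mixture model, which improves language modeling by combining a pointer that copies words from the recent context with a standard softmax classifier. It reports state-of-the-art results on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM.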
Contribution
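It proposes the pointer sentinel mixture architecture, which lets a neural sequence model either reproduce a word from the recent context or produce one from a standard softmax classifier. This helps with rare or unseen words and reduces the parameter count relative to a standard softmax LSTM.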
Findings
Achieves 70.9 perplexity on Penn Treebank.
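Uses far fewer parameters than a standard softmax LSTM.
Introduces the freely available WikiText corpus to evaluate how well language models exploit longer contexts and handle more realistic vocabularies and larger corpora.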
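For readers unfamiliar with the metric: perplexity is the exponential of the average negative log-likelihood a model assigns to each token, so lower is better. The sketch below is only an illustration; the function name and the toy probabilities are assumptions, not values from the paper.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to the correct tokens."""
    # Average negative log-likelihood per token.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)

# Toy input (hypothetical): a model that gives every correct token probability 1/70.9.
toy_probs = [1.0 / 70.9] * 5
toy_ppl = perplexity(toy_probs)
```

Read this way, a perplexity of 70.9 means the model is, on average, about as uncertain as a uniform choice among roughly 71 words.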
Abstract
Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state of the art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora we also introduce the freely available WikiText corpus.
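The mixture described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the pointer scores over the context words and a single sentinel score are normalized together in one softmax, and that the sentinel's share acts as the gate on the vocabulary softmax. All function and parameter names are hypothetical, and the scores are passed in directly here rather than computed from LSTM states.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pointer_sentinel_mixture(vocab_logits, context_ids, pointer_scores, sentinel_score):
    """Mix a vocabulary softmax with a pointer over recent context words.

    vocab_logits:   one logit per vocabulary word (standard softmax classifier)
    context_ids:    vocabulary id of each word in the recent context
    pointer_scores: one attention score per context position
    sentinel_score: score of the sentinel slot (assumed to be a scalar here)
    """
    p_vocab = softmax(vocab_logits)
    # Joint softmax over the context positions plus the sentinel.
    attention = softmax(list(pointer_scores) + [sentinel_score])
    gate = attention[-1]  # mass given to the sentinel, i.e. to the vocabulary softmax
    mixed = [gate * p for p in p_vocab]
    # Each context position adds its attention mass to the word found there,
    # so a word that occurs several times accumulates probability.
    for word_id, weight in zip(context_ids, attention[:-1]):
        mixed[word_id] += weight
    return mixed, gate

# Toy example (hypothetical numbers): word 2 appears twice in the context.
mixed, gate = pointer_sentinel_mixture(
    vocab_logits=[0.0, 0.0, 0.0, 0.0],
    context_ids=[2, 1, 2],
    pointer_scores=[2.0, 0.0, 2.0],
    sentinel_score=0.0,
)
```

Because the pointer and the sentinel share one normalization, the mixed distribution still sums to one, and a word that the vocabulary softmax rates poorly can receive high probability if it appeared in the recent context.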
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Pointer Network · Variational Dropout · Zoneout · Pointer Sentinel-LSTM · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Softmax
