Exploring the Limits of Language Modeling

Rafal Jozefowicz; Oriol Vinyals; Mike Schuster; Noam Shazeer; Yonghui Wu

arXiv:1602.02410·cs.CL·February 15, 2016·916 cites

TL;DR

This paper studies recurrent neural network architectures for large-scale language modeling on the One Billion Word Benchmark, reducing single-model perplexity from 51.3 to 30.0 and ensemble perplexity from 41.0 to 23.7.

Contribution

The paper extends RNN language models to cope with large corpora and vocabularies and with the complex, long-term structure of language. Its best single model lowers perplexity while using 20 times fewer parameters, and the trained models are released for the community.

Findings

01. Best single model reduces perplexity from 51.3 to 30.0

02. Ensemble of models reduces perplexity from 41.0 to 23.7

03. Best single model uses 20 times fewer parameters than the previous state of the art

Abstract

In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
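The perplexity figures quoted in the abstract can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the standard definition of perplexity as the exponential of the mean negative log-probability per token. The function names `perplexity` and `relative_reduction` and the toy probabilities are illustrative assumptions, not part of the paper or its released models.

```python
import math


def perplexity(token_probs):
    """Return exp of the mean negative log-probability per token.

    token_probs: probabilities a model assigned to each observed token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def relative_reduction(before, after):
    """Fraction by which a perplexity score dropped."""
    return (before - after) / before


# Toy case: a model that gives every token probability 1/30 has a
# perplexity of about 30, i.e. it is as uncertain as a uniform choice
# among 30 words.
toy_ppl = perplexity([1 / 30] * 5)

# Relative improvements implied by the numbers in the abstract.
single_model_gain = relative_reduction(51.3, 30.0)
ensemble_gain = relative_reduction(41.0, 23.7)

print(f"toy perplexity: {toy_ppl:.2f}")
print(f"single model reduction: {single_model_gain:.1%}")
print(f"ensemble reduction: {ensemble_gain:.1%}")
```

Read this way, both the single-model and the ensemble results correspond to a perplexity reduction of a little over 40 percent relative to the earlier figures.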

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems