An Analysis of Neural Language Modeling at Multiple Scales

Stephen Merity; Nitish Shirish Keskar; Richard Socher

arXiv:1803.08240 · cs.CL · March 23, 2018 · 143 cites

TL;DR

This paper shows that existing LSTM and QRNN language models, extended to larger vocabularies and character-level granularity and properly tuned, achieve state-of-the-art results on Penn Treebank, enwik8, and WikiText-103 while training on a single GPU in 12 hours to 2 days.

Contribution

It shows that existing word-level LSTM and QRNN models, extended to larger vocabularies and character-level tasks, can reach state-of-the-art results without the novel, complex, and specialized architectures that many leading approaches introduce.

Findings

01

When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively.

02

Training takes from 12 hours (WikiText-103) to 2 days (enwik8) on a single modern GPU.

03

Existing word-level models extend effectively to both larger vocabularies and character-level granularity.

Abstract

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis