On the State of the Art of Evaluation in Neural Language Models
Gábor Melis, Chris Dyer, Phil Blunsom

TL;DR
This paper reevaluates neural language models using standardized hyperparameter tuning and finds that well-regularized LSTMs outperform newer architectures, establishing new state-of-the-art results on Penn Treebank and Wikitext-2.
Contribution
It demonstrates that proper regularization and hyperparameter tuning can make traditional LSTMs outperform recent models in language modeling.
Findings
LSTMs outperform recent models when properly regularized.
Established new state-of-the-art results on Penn Treebank and Wikitext-2.
Provided strong baselines on the Hutter Prize dataset.
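These findings are stated in the metrics the benchmarks conventionally use: word-level perplexity for Penn Treebank and Wikitext-2, and bits per character for the Hutter Prize data. Lower is better for both. As a minimal sketch, assuming the model has already produced a per-token (or per-character) negative log-likelihood in nats, both metrics reduce to simple averages. The function names here are hypothetical, and this is not the paper's evaluation code.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Word-level perplexity: exp of the mean per-token negative log-likelihood (nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def bits_per_character(char_nlls: Sequence[float]) -> float:
    """Bits per character: mean per-character negative log-likelihood, converted from nats to bits."""
    if not char_nlls:
        raise ValueError("need at least one character")
    return sum(char_nlls) / len(char_nlls) / math.log(2)


# A model that is uniform over a 10,000-word vocabulary assigns each token
# an NLL of ln(10000), so its perplexity is 10,000 (up to float rounding).
uniform_nll = math.log(10_000)
print(perplexity([uniform_nll] * 5))
```

The uniform-model check is a useful sanity bound: any trained model should score well below the vocabulary size.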
Abstract
Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.
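The abstract's central methodological point is that every architecture is tuned by the same automatic black-box procedure. Such a procedure only observes a hyperparameter configuration and the validation score it produces. The tuner used in the paper is not reproduced here. The following is a minimal sketch that swaps in plain random search. It assumes a hypothetical search space (`learning_rate`, `input_dropout`, `weight_decay`) and a hypothetical toy objective that stands in for a full training run.

```python
import math
import random
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, float]

# Hypothetical search space: names and ranges are illustrative, not the paper's.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),
    "input_dropout": (0.0, 0.9),
    "weight_decay": (1e-7, 1e-3),
}
# Scale-type parameters are sampled log-uniformly; the rest uniformly.
LOG_SCALE = {"learning_rate", "weight_decay"}


def sample_config(rng: random.Random) -> Config:
    """Draw one configuration from SEARCH_SPACE."""
    config: Config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if name in LOG_SCALE:
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.uniform(low, high)
    return config


def random_search(
    objective: Callable[[Config], float], trials: int, seed: int = 0
) -> Tuple[Optional[Config], float]:
    """Minimize a black-box objective: it is only ever called, never inspected."""
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_loss = math.inf
    for _ in range(trials):
        config = sample_config(rng)
        loss = objective(config)  # e.g. validation perplexity after training
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def toy_validation_loss(config: Config) -> float:
    """Hypothetical stand-in for 'train a model with config, return validation loss'."""
    return (
        100.0
        + (math.log10(config["learning_rate"]) + 3.0) ** 2
        + (config["input_dropout"] - 0.5) ** 2
    )


best, loss = random_search(toy_validation_loss, trials=50, seed=3)
print(best, loss)
```

The tuner sees only the returned score, so the same loop can be pointed at any architecture with the same budget. That uniformity is what turns a cross-model comparison into a controlled one. Random search is used here only for brevity. Model-based tuners, such as Gaussian-process bandits, are designed to be more sample-efficient.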
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
