Circling Back to Recurrent Models of Language
Gábor Melis

TL;DR
This paper revisits recurrent models of language and shows that they can remain competitive with modern architectures when given a slightly better recurrent cell, architecture, objective, and optimization. It sets a new state of the art for language modelling on small datasets and, with dynamic evaluation, on Enwik8.
Contribution
It demonstrates that purely recurrent language models can be made competitive with modern models through improvements to the recurrent cell, architecture, objective, and optimization.
Findings
Achieved a new state of the art for language modelling on small datasets
Set new records on Enwik8 with dynamic evaluation
Recurrent models remain viable with proper improvements
Abstract
Just because some purely recurrent models suffer from being hard to optimize and inefficient on today's hardware, they are not necessarily bad models of language. We demonstrate this by the extent to which these models can still be improved by a combination of a slightly better recurrent cell, architecture, objective, as well as optimization. In the process, we establish a new state of the art for language modelling on small datasets and on Enwik8 with dynamic evaluation.
Taxonomy
Topics: Topic Modeling · Parallel Computing and Optimization Techniques · Natural Language Processing Techniques
