Circling Back to Recurrent Models of Language

Gábor Melis

arXiv:2211.01848 · cs.CL · April 19, 2023 · 1 citation

Open Access

TL;DR

This paper revisits recurrent language models and shows that improvements to the recurrent cell, architecture, objective, and optimization make them competitive with modern architectures, setting a new state of the art on small datasets and, with dynamic evaluation, on Enwik8.

Contribution

The paper demonstrates that purely recurrent models, despite being hard to optimize and inefficient on today's hardware, can be made competitive with modern models by refining their design and training methods.

Findings

01. Achieved a new state of the art for language modelling on small datasets

02. Set a new state of the art on Enwik8 with dynamic evaluation

03. Recurrent models remain viable when the cell, architecture, objective, and optimization are improved

Abstract

Just because some purely recurrent models suffer from being hard to optimize and inefficient on today's hardware, they are not necessarily bad models of language. We demonstrate this by the extent to which these models can still be improved by a combination of a slightly better recurrent cell, architecture, objective, as well as optimization. In the process, we establish a new state of the art for language modelling on small datasets and on Enwik8 with dynamic evaluation.
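The abstract mentions dynamic evaluation without defining it. As the term is commonly used, the model keeps adapting its parameters on the test stream: each segment is first scored with the current parameters, and only afterwards used for a gradient step. The following is a minimal sketch of that idea, not the paper's setup. The toy unigram softmax model, the plain SGD update, the learning rate, and all names (`evaluate`, `segment_nll_and_grad`, `trained_logits`) are illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def segment_nll_and_grad(logits, segment):
    """Summed negative log-likelihood of `segment` and its gradient w.r.t. the logits."""
    logp = log_softmax(logits)
    nll = -sum(logp[s] for s in segment)
    # For a unigram softmax model: d NLL / d logit_k = n * p_k - count_k
    grad = [len(segment) * math.exp(lp) for lp in logp]
    for s in segment:
        grad[s] -= 1.0
    return nll, grad


def evaluate(logits, stream, segment_len, lr=0.0):
    """Score `stream` segment by segment; lr > 0 enables dynamic evaluation."""
    logits = list(logits)  # work on a copy so the caller's parameters stay fixed
    total = 0.0
    for i in range(0, len(stream), segment_len):
        segment = stream[i:i + segment_len]
        nll, grad = segment_nll_and_grad(logits, segment)
        total += nll  # score first, with parameters that have not seen this segment
        if lr > 0.0:
            # then adapt on the segment just scored (assumed: plain SGD)
            logits = [w - lr * g / len(segment) for w, g in zip(logits, grad)]
    return total / len(stream)  # average loss in nats per symbol


vocab_size = 4
trained_logits = [0.0] * vocab_size          # stand-in for a trained model (uniform)
test_stream = [0, 0, 1, 0, 0, 0, 2, 0] * 50  # test data with a skewed distribution

static_loss = evaluate(trained_logits, test_stream, segment_len=8)
dynamic_loss = evaluate(trained_logits, test_stream, segment_len=8, lr=0.5)
print(f"static: {static_loss:.3f} nats/symbol, dynamic: {dynamic_loss:.3f} nats/symbol")
```

Because every segment is scored before the model is updated on it, the reported loss is still a valid measure of predictive performance on unseen symbols. In this toy case the dynamically evaluated loss is lower because the test distribution differs from what the fixed parameters encode.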

Taxonomy

Topics: Topic Modeling · Parallel Computing and Optimization Techniques · Natural Language Processing Techniques