The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

Aleksandar Stanić; Dylan Ashley; Oleg Serikov; Louis Kirsch; Francesco Faccio; Jürgen Schmidhuber; Thomas Hofmann; Imanol Schlag

arXiv:2309.11197 · cs.LG · September 21, 2023 · 2 cites

TL;DR

The paper introduces a compute-based experimental protocol and benchmark for language modelling that enables fair comparisons at different scales of compute, together with baseline models whose scaling laws are analyzed.

Contribution

It presents a novel compute-based comparison protocol, a pre-processed high-quality dataset of books, and baseline models with a scaling-law analysis, all aimed at reproducible language modelling research.

Findings

01

The GPT-2 baseline achieves lower perplexity than the LSTM baseline at every compute level evaluated

02

The LSTM baseline shows a more favorable scaling law, owing to its higher throughput

03

Extrapolating both scaling laws places their intersection at approximately 50,000 accelerator hours
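
The crossover above is obtained by extrapolating fitted scaling laws rather than by training at that scale. A minimal sketch of this kind of calculation follows, assuming each model's test perplexity follows a pure power law ppl = a·C^b in compute C (accelerator hours) and fitting it by least squares in log-log space. The helpers `fit_power_law` and `intersection` are hypothetical, and the numbers in the example are synthetic, not the paper's fitted values.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], ppl: Sequence[float]) -> Tuple[float, float]:
    """Fit ppl = a * compute**b by ordinary least squares on (log C, log ppl); return (a, b)."""
    if len(compute) != len(ppl) or len(compute) < 2:
        raise ValueError("need at least two matching (compute, ppl) points")
    xs = [math.log(c) for c in compute]
    ys = [math.log(p) for p in ppl]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("compute values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space (negative if ppl falls with compute)
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back from log space
    return a, b


def intersection(a1: float, b1: float, a2: float, b2: float) -> float:
    """Compute level C at which a1 * C**b1 == a2 * C**b2."""
    if b1 == b2:
        raise ValueError("equal exponents: the two laws never cross (or coincide)")
    # log a1 + b1 log C = log a2 + b2 log C  =>  log C = (log a2 - log a1) / (b1 - b2)
    return math.exp((math.log(a2) - math.log(a1)) / (b1 - b2))


# Synthetic example (invented numbers): model 1 has lower perplexity over the
# measured range, model 2 starts worse but improves faster with compute.
hours = [6, 12, 24, 48, 96]
ppl_1 = [100.0 * h ** -0.20 for h in hours]
ppl_2 = [150.0 * h ** -0.25 for h in hours]
a1, b1 = fit_power_law(hours, ppl_1)
a2, b2 = fit_power_law(hours, ppl_2)
crossover = intersection(a1, b1, a2, b2)
print(f"extrapolated crossover at about {crossover:.0f} accelerator hours")  # about 3325
```

Fitting in log space turns a pure power law into a straight line, so ordinary least squares is enough; a law with an irreducible-loss offset would instead need a nonlinear fit.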

Abstract

The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the model's throughput and the chosen compute class. Notably, this approach avoids constraints on critical hyperparameters which affect total parameters or floating-point operations. For evaluation, we pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. On it, we compare methods based on their empirical scaling trends which are estimated through experiments at various levels of compute. This…
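
The abstract ties the number of training tokens to a model's throughput and the chosen compute class. A minimal sketch of that bookkeeping follows, assuming throughput is measured in training tokens per second on the reference accelerator and the compute class is given in accelerator hours. `token_budget` is a hypothetical helper, not the Languini codebase's API, and the throughput figures are made up.

```python
def token_budget(tokens_per_second: float, accelerator_hours: float) -> int:
    """Training tokens a model may consume in a given compute class.

    Assumes throughput is measured in training tokens per second on the
    reference accelerator, so the budget is throughput * hours * 3600 s/h.
    """
    if tokens_per_second <= 0 or accelerator_hours <= 0:
        raise ValueError("throughput and compute must both be positive")
    return int(tokens_per_second * accelerator_hours * 3600)


# Hypothetical throughputs for two models trained in the same 6-hour class.
slow_model = token_budget(100_000, 6)  # 2_160_000_000 tokens
fast_model = token_budget(250_000, 6)  # 5_400_000_000 tokens
print(slow_model, fast_model)
```

Because the budget depends only on measured throughput, a model that is cheaper per token simply sees more data within the same compute class; no parameter count or FLOP target is fixed in advance.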

Code & Models

Repositories

fabianwinter93/JAX/tree/main/QLSTM (JAX)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · Sigmoid Activation · GPT-2 · Tanh Activation · Residual Connection · Adam · Linear Layer