The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag

TL;DR
The paper introduces a compute-based experimental protocol and benchmarks for language models, enabling fair comparisons across different scales of compute and providing baseline models with analyzed scaling laws.
Contribution
It presents a novel compute-based comparison protocol, a high-quality dataset, and baseline models with scaling law analysis for reproducible language modelling research.
Findings
The GPT-2 baseline outperforms the LSTM in perplexity across compute levels
The LSTM model shows a more favorable scaling law due to throughput
The two scaling laws intersect at approximately 50,000 accelerator hours
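As a rough illustration of what an intersection between two scaling trends means, the sketch below assumes each trend is a simple power law of the form a * c^(-b), where c is compute in accelerator hours. That functional form, the function name, and the coefficients in the usage line are all hypothetical and are not taken from the paper's fits.

```python
import math


def scaling_law_intersection(a1: float, b1: float, a2: float, b2: float) -> float:
    """Return the compute c at which a1 * c**(-b1) equals a2 * c**(-b2).

    Assumes both trends are pure power laws (an assumption made for this
    sketch, not a claim about the paper's fitted curves).
    """
    if math.isclose(b1, b2):
        raise ValueError("Parallel trends (equal exponents) never intersect.")
    # a1 * c**(-b1) = a2 * c**(-b2)  =>  c**(b1 - b2) = a1 / a2
    return (a1 / a2) ** (1.0 / (b1 - b2))


# Hypothetical coefficients chosen only so the arithmetic is easy to check.
crossover = scaling_law_intersection(a1=8.0, b1=0.5, a2=4.0, b2=0.25)
```

With these illustrative numbers both curves take the same value at the returned compute; a model with the larger exponent is worse below that point and better above it.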
Abstract
The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the model's throughput and the chosen compute class. Notably, this approach avoids constraints on critical hyperparameters which affect total parameters or floating-point operations. For evaluation, we pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. On it, we compare methods based on their empirical scaling trends which are estimated through experiments at various levels of compute. This…
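The compute-based protocol described in the abstract can be sketched as follows: the token budget for a run is the model's measured throughput multiplied by the duration of the chosen compute class. The function name, the tokens-per-second unit, and the example numbers are assumptions made for illustration; they are not values or identifiers from the Languini codebase.

```python
def token_budget(tokens_per_second: float, accelerator_hours: float) -> int:
    """Number of training tokens a model may consume in a compute class.

    tokens_per_second: measured training throughput of the model
                       (hypothetical unit chosen for this sketch).
    accelerator_hours: size of the compute class, in accelerator hours.
    """
    seconds = accelerator_hours * 3600
    return int(tokens_per_second * seconds)


# Two hypothetical models in the same 6-hour compute class:
# the faster one gets to train on proportionally more tokens.
slow_model_tokens = token_budget(tokens_per_second=1000.0, accelerator_hours=6.0)
fast_model_tokens = token_budget(tokens_per_second=4000.0, accelerator_hours=6.0)
```

Under this reading, a model is free to change its parameter count or floating-point operations per token; any resulting change in throughput simply changes how many tokens it sees within the fixed compute budget.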
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · Sigmoid Activation · GPT-2 · Tanh Activation · Residual Connection · Adam · Linear Layer
