Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Jack W. Rae; Sebastian Borgeaud; Trevor Cai; Katie Millican; Jordan Hoffmann; Francis Song; John Aslanides; Sarah Henderson; Roman Ring; Susannah Young; Eliza Rutherford; Tom Hennigan; Jacob Menick; Albin Cassirer; Richard Powell; George van den Driessche; Lisa Anne Hendricks; Maribeth Rauh; Po-Sen Huang; Amelia Glaese; Johannes Welbl; Sumanth Dathathri; Saffron Huang; Jonathan Uesato; John Mellor; Irina Higgins; Antonia Creswell; Nat McAleese; Amy Wu; Erich Elsen; Siddhant Jayakumar; Elena Buchatskaya; David Budden; Esme Sutherland; Karen Simonyan; Michela Paganini; Laurent Sifre; Lena Martens; Xiang Lorraine Li; Adhiguna Kuncoro; Aida Nematzadeh; Elena Gribovskaya; Domenic Donato; Angeliki Lazaridou; Arthur Mensch; Jean-Baptiste Lespiau; Maria Tsimpoukelli; Nikolai Grigorev; Doug Fritz; Thibault Sottiaux; Mantas Pajarskas; Toby Pohlen; Zhitao Gong; Daniel Toyama; Cyprien de Masson d'Autume; Yujia Li; Tayfun Terzi; Vladimir Mikulik; Igor Babuschkin; Aidan Clark; Diego de Las Casas; Aurelia Guy; Chris Jones; James Bradbury; Matthew Johnson; Blake Hechtman; Laura Weidinger; Iason Gabriel; William Isaac; Ed Lockhart; Simon Osindero; Laura Rimell; Chris Dyer; Oriol Vinyals; Kareem Ayoub; Jeff Stanway; Lorrayne Bennett; Demis Hassabis; Koray Kavukcuoglu; Geoffrey Irving

arXiv:2112.11446 · cs.CL · January 24, 2022 · 243 cites

TL;DR

This paper analyzes the performance of Transformer-based language models, including Gopher with 280 billion parameters, across diverse tasks, highlighting how scale impacts capabilities, biases, and safety considerations.

Contribution

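It provides a comprehensive analysis of Transformer-based language models across scales, from tens of millions of parameters up to the 280-billion-parameter Gopher, with insights into performance, bias, and safety implications drawn from evaluation on 152 tasks and an analysis of the training dataset.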

Findings

01. Gains from scale are largest in reading comprehension, fact-checking, and the identification of toxic language.

02. Logical and mathematical reasoning benefit less from increased scale.

03. The analysis of the training dataset and model behaviour covers how model scale interacts with bias and toxicity.

Abstract

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI…

Videos

DeepMind’s New AI Thinks It Is A Genius! 🤖 · youtube

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques