Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal

TL;DR
Pythia is a comprehensive suite of 16 large language models trained on identical data sequences, enabling detailed analysis of training dynamics, scaling effects, and biases with publicly available checkpoints and tools.
Contribution
Introduces Pythia, a controlled experimental framework with 16 LLMs for studying development, scaling, and biases, with publicly accessible resources for research.
Findings
Insights into memorization patterns in LLMs
Effects of term frequency on few-shot learning performance
Methods to reduce gender bias in language models
Abstract
How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found…
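To make the "154 checkpoints for each model" figure concrete, here is a minimal sketch of how the checkpoints could be enumerated and loaded. It assumes a checkpoint schedule that the abstract does not state: step 0, ten log-spaced steps (1, 2, 4, ..., 512), and every 1000th step from 1000 to 143000, each published as a Hugging Face branch named `step<N>`. The helper names `pythia_revisions` and `load_checkpoint` are hypothetical, and `load_checkpoint` needs the `transformers` package and network access.

```python
def pythia_revisions():
    """Return the branch names assumed to hold the Pythia checkpoints.

    Assumption (not stated in the abstract): checkpoints exist at step 0,
    at powers of two up to 512, and at every 1000 steps up to 143000.
    """
    log_spaced = [0] + [2 ** i for i in range(10)]   # 0, 1, 2, 4, ..., 512
    linear = list(range(1000, 143001, 1000))         # 1000, 2000, ..., 143000
    return [f"step{s}" for s in log_spaced + linear]


def load_checkpoint(model_name, revision):
    """Load one intermediate checkpoint from the Hugging Face Hub.

    The import is local so that listing revisions works without transformers.
    """
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(model_name, revision=revision)


revisions = pythia_revisions()
print(len(revisions))  # 154 under the assumed schedule
```

Under these assumptions, a call such as `load_checkpoint("EleutherAI/pythia-160m", "step3000")` would fetch the weights saved after 3000 training steps, which allows the same model to be compared at different points in training.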
Code & Models
- 🤗 EleutherAI/pythia-14m-deduped · model · 20k dl · ♡ 29
- 🤗 EleutherAI/pythia-14m · model · 95k dl · ♡ 1
- 🤗 StentorLabs/Stentor2-12M · model · 124 dl · ♡ 2
- 🤗 ataeff/pythia-1b · model · ♡ 1
- 🤗 EleutherAI/neox-ckpt-pythia-12b-deduped · model · ♡ 3
- 🤗 EleutherAI/neox-ckpt-pythia-410m-deduped · model
- 🤗 EleutherAI/neox-ckpt-pythia-1.4b-deduped · model
- 🤗 EleutherAI/pythia-160m · model · 2.4M dl · ♡ 39
- 🤗 EleutherAI/pythia-160m-deduped · model · 115k dl · ♡ 3
- 🤗 EleutherAI/pythia-1.4b · model · 141k dl · ♡ 26
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Pythia
