Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Anson Ho, Marius Hobbhahn

TL;DR
This paper analyzes the growth trends of machine learning model sizes over time, highlighting a significant parameter gap in language models between 20B and 70B parameters and exploring possible reasons for this scarcity.
Contribution
The paper provides a curated dataset and analysis of model size trends, identifies the parameter gap in language models, and proposes hypotheses to explain it.
Findings
Language model sizes increased by seven orders of magnitude from 1950 to 2018.
Between 2018 and 2022, language model sizes grew by an additional five orders of magnitude.
Since 2020, language models in the 20-70B parameter range have been notably scarce; the authors call this scarcity the parameter gap.
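As a rough illustration of how much the trend accelerated, the endpoint figures above can be turned into average growth rates. This is a back-of-the-envelope sketch that assumes growth was spread evenly across each period; the paper fits trends to its dataset, so the rates it reports may differ. The function names and the inclusive 20B-70B bounds are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Tuple


def average_growth(orders_of_magnitude: float, years: float) -> Tuple[float, float]:
    """Return (orders of magnitude per year, doubling time in months).

    Assumes growth is spread evenly over the period (endpoint average only).
    """
    oom_per_year = orders_of_magnitude / years
    # One doubling is log10(2) orders of magnitude.
    doubling_months = 12 * math.log10(2) / oom_per_year
    return oom_per_year, doubling_months


def in_parameter_gap(n_params: float, low: float = 20e9, high: float = 70e9) -> bool:
    """Check whether a parameter count falls in the 20-70B range.

    Treating both bounds as inclusive is an assumption made for this sketch.
    """
    return low <= n_params <= high


# Endpoint figures for language models, as stated in the findings above.
pre_2018 = average_growth(7, 2018 - 1950)   # seven orders of magnitude in 68 years
post_2018 = average_growth(5, 2022 - 2018)  # five orders of magnitude in 4 years

print(f"1950-2018: {pre_2018[0]:.2f} OOM/year, doubling every {pre_2018[1]:.1f} months")
print(f"2018-2022: {post_2018[0]:.2f} OOM/year, doubling every {post_2018[1]:.1f} months")
```

Under these assumptions, the average doubling time falls from roughly 35 months before 2018 to roughly 3 months afterwards.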
Abstract
We study trends in model size of notable machine learning systems over time using a curated dataset. From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude. The trend then accelerated, with model size increasing by another five orders of magnitude in just 4 years from 2018 to 2022. Vision models grew at a more constant pace, totaling 7 orders of magnitude of growth between 1950 and 2022. We also identify that, since 2020, there have been many language models below 20B parameters, many models above 70B parameters, but a scarcity of models in the 20-70B parameter range. We refer to that scarcity as the parameter gap. We provide some stylized facts about the parameter gap and propose a few hypotheses to explain it. The explanations we favor are: (a) increasing model size beyond 20B parameters requires adopting different parallelism techniques,…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Dropout · Byte Pair Encoding · Adam · Cosine Annealing · Linear Warmup With Cosine Annealing
