On the origin of neural scaling laws: from random graphs to natural language
Maissam Barkeshli, Alberto Alfarano, Andrey Gromov

TL;DR
This paper investigates the origins of neural scaling laws by studying simplified settings, such as random walks on graphs and reduced language models, and shows that such laws can emerge even when the data lack power-law structure.
Contribution
The paper demonstrates that neural scaling laws can arise in simplified settings whose data have no inherent power-law correlations, and it analyzes how the scaling behavior evolves as language complexity is reduced.
Findings
Neural scaling laws appear even in models trained on random graphs and simplified language.
Scaling exponents evolve monotonically as language models are simplified.
Key scaling behaviors are reproduced with minimal transformer architectures.
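The findings above are phrased in terms of scaling exponents. A common way to estimate such an exponent is to model loss as L(N) ≈ A·N^(−α) and run least squares on log L versus log N. This page does not describe the paper's fitting procedure, so that approach is an assumption here. The sketch below implements it; `fit_power_law` is a hypothetical name. The pure power-law form also omits the irreducible-loss offset that many scaling-law fits include.

```python
import math


def fit_power_law(sizes, losses):
    """Fit losses ~= A * sizes**(-alpha) by least squares in log-log space.

    Returns (A, alpha). Assumes a pure power law with no irreducible-loss
    offset; this is an illustrative sketch, not the paper's procedure.
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope in log-log space is -alpha
    intercept = mean_y - slope * mean_x  # intercept is log A
    return math.exp(intercept), -slope


# Synthetic losses drawn exactly from A = 5, alpha = 0.3.
sizes = [1e3, 1e4, 1e5, 1e6]
losses = [5.0 * n ** -0.3 for n in sizes]
A, alpha = fit_power_law(sizes, losses)
```

On these exact synthetic points the fit recovers A = 5 and α = 0.3 up to floating-point error. With real measurements, points close to an irreducible loss floor bend the log-log curve, which is why an additive constant is often included in the fitted form.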
Abstract
Scaling laws have played a major role in the modern AI revolution, providing practitioners predictive power over how the model performance will improve with increasing data, compute, and number of model parameters. This has spurred an intense interest in the origin of neural scaling laws, with a common suggestion being that they arise from power law structure already present in the data. In this paper we study scaling laws for transformers trained to predict random walks (bigrams) on graphs with tunable complexity. We demonstrate that this simplified setting already gives rise to neural scaling laws even in the absence of power law structure in the data correlations. We further consider dialing down the complexity of natural language systematically, by training on sequences sampled from increasingly simplified generative language models, from 4-, 2-, and 1-layer transformer language models down…
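The abstract's first setting, transformers trained to predict random walks (bigrams) on graphs, can be made concrete with a small data generator. This is a minimal sketch under assumptions that this page does not state: an Erdős–Rényi G(n, p) graph, walks that step to a uniformly random neighbor, and node ids used directly as tokens. All function names are hypothetical. The `entropy_rate` helper computes the standard entropy rate of such a walk, Σ_i π_i log d_i with π_i = d_i / 2|E| (d_i is the degree of node i). For walks started from the stationary distribution, this is a lower bound on the average next-token cross-entropy any model can reach. The page does not say how the paper tunes graph complexity, so the edge probability here is only a stand-in knob.

```python
import math
import random


def erdos_renyi_graph(num_nodes, edge_prob, rng):
    """Adjacency lists for an undirected G(n, p) random graph (assumed model)."""
    adj = {v: [] for v in range(num_nodes)}
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < edge_prob:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def random_walk(adj, length, rng):
    """Sample a walk of `length` nodes; each step picks a uniform neighbor.

    Node ids act as tokens, so consecutive pairs in the walk are the bigrams.
    """
    starts = [v for v, nbrs in adj.items() if nbrs]
    if not starts:
        raise ValueError("graph has no edges")
    node = rng.choice(starts)
    walk = [node]
    for _ in range(length - 1):
        node = rng.choice(adj[node])  # a node with a neighbor never gets stuck
        walk.append(node)
    return walk


def entropy_rate(adj):
    """Entropy rate in nats/token: sum_i pi_i * log(d_i), pi_i = d_i / (2|E|)."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    return sum(
        (len(nbrs) / total_degree) * math.log(len(nbrs))
        for nbrs in adj.values()
        if nbrs
    )


rng = random.Random(0)
adj = erdos_renyi_graph(50, 0.1, rng)
walk = random_walk(adj, 20, rng)
bigrams = list(zip(walk, walk[1:]))  # (current token, next token) pairs
```

The transition structure here is uniform over neighbors, so the bigram statistics carry no built-in power law. This matches the abstract's point that scaling laws appear even without power-law correlations in the data.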
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Big Data and Digital Economy
