A Neural Scaling Law from Lottery Ticket Ensembling
Ziming Liu, Max Tegmark

TL;DR
This paper identifies an $N^{-1}$ neural scaling law, where $N$ is the number of model parameters, and attributes it to lottery ticket ensembling, a mechanism that departs from earlier approximation-theory explanations of how loss improves with scale.
Contribution
It shows that lottery ticket ensembling explains the $N^{-1}$ scaling law, supporting the claim with both mechanistic interpretation of single networks and statistical analysis.
Findings
Lottery ticket ensembling causes the $N^{-1}$ scaling law.
The ensembling mechanism is supported by mechanistic and statistical analysis.
Implications for large language models and learning theories are discussed.
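The first finding can be read as a variance-reduction argument. The derivation below is a simplified sketch, not the paper's own proof. It assumes a network of width $N$ contains $n \propto N$ lottery tickets, that each ticket predicts the target $f(x)$ with an independent, zero-mean error $\epsilon_i$ of equal variance $\sigma^2$, and that the network output is their plain average. The symbols $n$, $\epsilon_i$, $\sigma^2$, and $\bar{f}$ are introduced here for illustration.

```latex
% Assumption: each of the n tickets is an unbiased, independent estimator.
f_i(x) = f(x) + \epsilon_i, \qquad
\mathbb{E}[\epsilon_i] = 0, \qquad
\mathrm{Var}[\epsilon_i] = \sigma^2, \qquad
\mathrm{Cov}[\epsilon_i, \epsilon_j] = 0 \ (i \neq j)

% Assumption: the network output is the average over tickets.
\bar{f}(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)
           = f(x) + \frac{1}{n} \sum_{i=1}^{n} \epsilon_i

% The MSE of the average is the variance of the mean error.
\mathrm{MSE}
  = \mathbb{E}\!\left[ \big( \bar{f}(x) - f(x) \big)^2 \right]
  = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}[\epsilon_i]
  = \frac{\sigma^2}{n}
  \;\propto\; N^{-1}
```

If the ticket errors were correlated or biased, the cross terms would not vanish and the exponent could differ, so the $N^{-1}$ rate in this sketch depends on the independence assumption.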
Abstract
Neural scaling laws (NSL) refer to the phenomenon where model performance improves with scale. Sharma & Kaplan analyzed NSL using approximation theory and predict that MSE losses decay as $N^{-\alpha}$, $\alpha = 4/d$, where $N$ is the number of model parameters, and $d$ is the intrinsic input dimension. Although their theory works well for some cases (e.g., ReLU networks), we surprisingly find that a simple 1D problem manifests a different scaling law ($\alpha = 1$) from their predictions ($\alpha = 4$). We opened the neural networks and found that the new scaling law originates from lottery ticket ensembling: a wider network on average has more "lottery tickets", which are ensembled to reduce the variance of outputs. We support the ensembling mechanism by mechanistically interpreting single neural networks, as well as studying them statistically. We attribute the scaling…
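The ensembling mechanism described in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the authors' experiment: no neural network is trained. It assumes each "lottery ticket" contributes an independent, zero-mean Gaussian error and that tickets are averaged. The function names `ensemble_mse` and `fit_exponent`, the noise model, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def ensemble_mse(n_tickets, n_trials=20000, sigma=1.0, seed=0):
    """Estimate the MSE of an average over n_tickets noisy predictors.

    Assumption: each ticket's error is independent zero-mean Gaussian
    noise with standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    errors = rng.normal(0.0, sigma, size=(n_trials, n_tickets))
    ensemble_error = errors.mean(axis=1)  # error of the averaged output
    return float(np.mean(ensemble_error ** 2))


def fit_exponent(ns, losses):
    """Fit the slope of log(loss) against log(n) by least squares."""
    slope, _ = np.polyfit(np.log(ns), np.log(losses), 1)
    return float(slope)


# The ticket count stands in for network width N (assumed proportional).
ticket_counts = np.array([1, 2, 4, 8, 16, 32, 64])
losses = np.array([ensemble_mse(int(n)) for n in ticket_counts])

# The loss is modeled as n^(-alpha), so alpha is minus the log-log slope.
alpha = -fit_exponent(ticket_counts, losses)
print(f"fitted exponent alpha ~ {alpha:.2f}")
```

Under these assumptions the fitted exponent should come out close to 1, matching the $N^{-1}$ rate rather than the approximation-theory prediction. Whether real networks satisfy the independence assumption is the empirical question the paper's mechanistic and statistical analyses address.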
Taxonomy
Topics: Neural Networks and Applications · Computational Physics and Python Applications · Time Series Analysis and Forecasting
