A Neural Scaling Law from Lottery Ticket Ensembling

Ziming Liu; Max Tegmark

arXiv:2310.02258·cs.LG·February 5, 2024

TL;DR

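This paper reports an $N^{-1}$ neural scaling law on a simple 1D regression task ($y = x^{2}$), where the approximation-theory account of Sharma & Kaplan predicts $N^{-4}$. It traces the discrepancy to lottery ticket ensembling: wider networks contain more "lottery tickets" on average, and ensembling them reduces the variance of the output.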

Contribution

The paper argues that lottery ticket ensembling explains the observed $N^{-1}$ scaling law, and supports the claim with both mechanistic interpretation of single networks and a statistical study of them.

Findings

01

Lottery ticket ensembling causes the $N^{-1}$ scaling law.

02

The ensembling mechanism is supported by mechanistic and statistical analysis.

03

Implications for large language models and learning theories are discussed.

Abstract

Neural scaling laws (NSL) refer to the phenomenon where model performance improves with scale. Sharma & Kaplan analyzed NSL using approximation theory and predicted that MSE losses decay as $N^{-\alpha}$, $\alpha = 4/d$, where $N$ is the number of model parameters and $d$ is the intrinsic input dimension. Although their theory works well for some cases (e.g., ReLU networks), we surprisingly find that a simple 1D problem $y = x^{2}$ manifests a different scaling law ($\alpha = 1$) from their prediction ($\alpha = 4$). We opened the neural networks and found that the new scaling law originates from lottery ticket ensembling: a wider network on average has more "lottery tickets", which are ensembled to reduce the variance of outputs. We support the ensembling mechanism by mechanistically interpreting single neural networks, as well as studying them statistically. We attribute the $N^{-1}$ scaling…
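
The variance-reduction argument in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' experiment and trains no network: it assumes each "lottery ticket" behaves like an independent, unbiased, noisy predictor of $y = x^{2}$, and that the number of tickets grows in proportion to $N$. The names `ensemble_mse` and `fit_exponent` and the values of `noise_std` and `n_trials` are hypothetical choices made for illustration. Under these assumptions, averaging $n$ tickets should give an MSE that decays roughly as $n^{-1}$, which a log-log fit can recover.

```python
import numpy as np


def ensemble_mse(n_members, n_trials=500, noise_std=0.1, seed=0):
    """MSE of an average of `n_members` noisy predictors of y = x^2.

    Assumption: each member equals the target plus independent
    zero-mean Gaussian noise (a stand-in for one "lottery ticket").
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 50)
    target = x ** 2
    # Shape: (trials, members, grid points); one noise draw per member.
    noise = rng.normal(0.0, noise_std, size=(n_trials, n_members, x.size))
    # Ensembling = averaging member outputs, so the noise is averaged too.
    prediction = target + noise.mean(axis=1)
    return float(np.mean((prediction - target) ** 2))


def fit_exponent(sizes, losses):
    """Estimate alpha in loss ~ size^(-alpha) by log-log least squares."""
    slope, _intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return -slope


# Hypothetical ensemble sizes, standing in for the number of tickets.
sizes = np.array([1, 2, 4, 8, 16, 32, 64])
losses = np.array([ensemble_mse(n, seed=i) for i, n in enumerate(sizes)])
alpha = fit_exponent(sizes, losses)
print(f"fitted alpha = {alpha:.2f}")
```

The same `fit_exponent` helper could be applied to measured (parameter count, test loss) pairs from trained networks to check whether an empirical exponent is closer to 1 or to $4/d$.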

Taxonomy

Topics: Neural Networks and Applications · Computational Physics and Python Applications · Time Series Analysis and Forecasting