On the Invariance and Generality of Neural Scaling Laws
Xing Han, Ziyin Liu, Suchi Saria, Paul Pu Liang

TL;DR
This paper investigates how neural scaling laws can be generalized across different domains by identifying invariants and transformations that preserve or predict their behavior, enabling resource-efficient transfer of scaling insights.
Contribution
It introduces a theoretical framework based on invariants and information-theoretic transformations to develop generalizable scaling laws across diverse data modalities.
Findings
Scaling laws are preserved under bijective data transformations.
Non-bijective transformations predictably modify scaling laws based on information resolution.
Cross-domain predictions of model scaling are accurate within 3% error.
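To make the error figure above concrete, here is a minimal sketch of how a scaling law might be fit and its extrapolation error measured. It is not the paper's method. The pure power-law form `loss = a * n**(-b)` (with no irreducible-loss term), the synthetic data, and the helper name `fit_power_law` are all assumptions made for illustration.

```python
import numpy as np


def fit_power_law(n, loss):
    """Fit loss ~ a * n**(-b) by least squares in log-log space.

    Hypothetical helper: the functional form is an assumption,
    not taken from the paper.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, noise-free "source" sweep (made-up values for illustration only).
n = np.array([1e3, 1e4, 1e5, 1e6])
loss = 5.0 * n ** -0.3

a_hat, b_hat = fit_power_law(n, loss)

# Extrapolate to a larger scale and compute the relative prediction error,
# the kind of quantity an "accurate within 3%" claim would refer to.
n_target = 1e7
predicted = a_hat * n_target ** -b_hat
actual = 5.0 * n_target ** -0.3
rel_error = abs(predicted - actual) / actual
```

On real sweeps the losses are noisy and usually include an irreducible term, so a nonlinear fit would likely be needed in place of the log-log regression used here.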
Abstract
Neural scaling laws establish a predictable relationship between model performance and data or compute, offering crucial guidance for resource allocation in new domains and tasks. Yet such laws are most needed precisely where they are hardest to obtain: fitting one for a new model-task pair demands expensive sweeps that typically exhaust the very compute budget the law is meant to economize. This paper asks how to develop generalizable scaling laws: laws fit once on a well-resourced source domain and reliably transported to new domains where running a full sweep is infeasible, which requires a fundamental understanding of when and why scaling properties change. We address this by identifying the right invariants: scaling laws are preserved under bijective (information-preserving) transformations of the data and modified in predictable, information-theoretically…
