Scaling Laws for Deep Learning
Jonathan S. Rosenfeld

TL;DR
This paper investigates the scaling laws governing deep learning, demonstrating their predictability across models and tasks, analyzing their theoretical origins, and proposing methods to approach the fundamental error limits.
Contribution
It establishes predictable scaling laws for deep learning training and pruning, analyzes their theoretical basis, and suggests a pathway to reduce errors to fundamental limits.
Findings
Scaling laws are predictable for state-of-the-art models and tasks.
Deep learning errors are dominated by sources far from the theoretical minimum.
A conjectural approach using Nyquist learners could reach the generalization error lower limit.
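As a rough illustration of what a predictable scaling law allows, the sketch below fits a saturating power law to errors measured at small dataset sizes and extrapolates to a larger one. The functional form err(n) = a * n^(-alpha) + c, the helper name fit_power_law, and the synthetic data are all assumptions made for this example; none of them is taken from the thesis, whose actual functional forms may differ. The constant c stands in for an irreducible error floor.

```python
import numpy as np


def fit_power_law(n, err, floor_grid):
    """Fit err ~ a * n**(-alpha) + c by scanning candidate floors c.

    For each candidate c, log(err - c) is linear in log(n), so a and alpha
    come from an ordinary least-squares line fit. The c with the smallest
    squared error on the original scale wins. (Hypothetical helper.)
    """
    n = np.asarray(n, dtype=float)
    err = np.asarray(err, dtype=float)
    best = None
    for c in floor_grid:
        resid = err - c
        if np.any(resid <= 0):
            continue  # the floor must lie below every observed error
        slope, intercept = np.polyfit(np.log(n), np.log(resid), 1)
        pred = np.exp(intercept) * n ** slope + c
        sse = float(np.sum((pred - err) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(np.exp(intercept)), float(-slope), float(c))
    if best is None:
        raise ValueError("no candidate floor lies below every observed error")
    _, a, alpha, c = best
    return a, alpha, c


# Synthetic, noise-free measurements generated from assumed parameters.
true_a, true_alpha, true_c = 5.0, 0.5, 0.05
n_small = np.array([1e3, 2e3, 4e3, 8e3, 1.6e4])
err_small = true_a * n_small ** (-true_alpha) + true_c

a, alpha, c = fit_power_law(n_small, err_small, np.linspace(0.0, 0.1, 101))

# Extrapolate to a dataset far larger than any used for the fit.
predicted = a * 1e6 ** (-alpha) + c
print(f"a={a:.3f} alpha={alpha:.3f} floor={c:.3f} predicted={predicted:.4f}")
```

On real measurements the errors are noisy, so the recovered exponent and floor would carry uncertainty, and the grid of candidate floors would need to be chosen to suit the task.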
Abstract
Running faster will only get you so far -- it is generally advisable to first understand where the roads lead, then get a car ... The renaissance of machine learning (ML) and deep learning (DL) over the last decade is accompanied by an unscalable computational cost, limiting its advancement and weighing on the field in practice. In this thesis we take a systematic approach to address the algorithmic and methodological limitations at the root of these costs. We first demonstrate that DL training and pruning are predictable and governed by scaling laws -- for state-of-the-art models and tasks, spanning image classification and language modeling, as well as for state-of-the-art model compression via iterative pruning. Predictability, via the establishment of these scaling laws, provides the path for principled design and trade-off reasoning, currently largely lacking in the field. We…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning
Methods: Pruning
