Scaling Laws for Deep Learning

Jonathan S. Rosenfeld

arXiv:2108.07686 · cs.LG · August 18, 2021 · 25 cites

Open Access

TL;DR

This paper investigates the scaling laws governing deep learning, demonstrating their predictability across models and tasks, analyzing their theoretical origins, and proposing methods to approach the fundamental error limits.

Contribution

It establishes predictable scaling laws for deep learning training and pruning, analyzes their theoretical basis, and suggests a pathway to reduce errors to fundamental limits.

Findings

01  Scaling laws are predictable for state-of-the-art models and tasks.

02  Deep learning errors are dominated by sources far from the theoretical minimum.

03  A conjectural approach using Nyquist learners could reach the generalization error lower limit.
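
To make the idea of a predictable scaling law concrete, the following is a minimal sketch, not the thesis's method. It assumes the simplest possible form, error ≈ a · n^(-alpha), where n is dataset or model size; the thesis's own functional forms may be richer. The function names, the parameter values, and the measurements are all hypothetical and synthetic, chosen only for illustration. The sketch fits the two parameters on small-scale points and extrapolates to a larger scale.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-alpha) by least squares in log-log space.

    Assumed form for illustration only; returns (a, alpha).
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_error(size, a, alpha):
    """Evaluate the fitted power law at a given size."""
    return a * size ** (-alpha)


# Synthetic, noise-free measurements (NOT data from the thesis).
sizes = np.array([1e3, 2e3, 4e3, 8e3, 1.6e4])
true_a, true_alpha = 5.0, 0.35  # hypothetical ground-truth parameters
errors = true_a * sizes ** (-true_alpha)

# Fit on the small-scale points, then extrapolate to a much larger size.
a, alpha = fit_power_law(sizes, errors)
extrapolated = predict_error(1e6, a, alpha)
print(f"a={a:.3f}, alpha={alpha:.3f}, predicted error at n=1e6: {extrapolated:.4f}")
```

On real measurements the points are noisy and the error typically levels off at very small and very large scales, so a pure power law fitted this way should be read as a rough guide only.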

Abstract

Running faster will only get you so far -- it is generally advisable to first understand where the roads lead, then get a car ... The renaissance of machine learning (ML) and deep learning (DL) over the last decade is accompanied by an unscalable computational cost, limiting its advancement and weighing on the field in practice. In this thesis we take a systematic approach to address the algorithmic and methodological limitations at the root of these costs. We first demonstrate that DL training and pruning are predictable and governed by scaling laws -- for state of the art models and tasks, spanning image classification and language modeling, as well as for state of the art model compression via iterative pruning. Predictability, via the establishment of these scaling laws, provides the path for principled design and trade-off reasoning, currently largely lacking in the field. We…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning

Methods: Pruning