Data movement limits to frontier model training

Ege Erdil; David Schneider-Joseph

arXiv:2411.01137·cs.DC·November 14, 2024

Open Access

TL;DR

This paper presents a theoretical model of data movement bottlenecks in large-scale distributed training. It finds that, under baseline assumptions and a three-month training duration, utilization starts to fall significantly above about 10^28 FLOP, a scale that recent growth rates would reach in roughly three years.

Contribution

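The paper introduces a theoretical model of distributed training and uses it to analyze how data movement constraints limit the scale of dense and sparse training runs. It identifies the compute thresholds at which hardware utilization degrades and the changes to batch size and model shape that could permit larger runs.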

Findings

01

Under baseline assumptions and a three-month training duration, data movement bottlenecks begin to significantly lower hardware utilization for training runs exceeding about 10^28 FLOP.

02

Under the same assumptions, a training run exceeding about 10^31 FLOP is infeasible even at low utilization.

03

More aggressive batch size scaling and/or shorter and fatter model shapes, if achievable, could permit much larger training runs.

Abstract

We present a theoretical model of distributed training, and use it to analyze how far dense and sparse training runs can be scaled. Under our baseline assumptions, given a three month training duration, data movement bottlenecks begin to significantly lower hardware utilization for training runs exceeding about $10^{28}$ FLOP, two orders of magnitude above the largest training run to date, suggesting the arrival of fundamental barriers to scaling in three years given recent rates of growth. A training run exceeding about $10^{31}$ FLOP is infeasible even at low utilization. However, more aggressive batch size scaling and/or shorter and fatter model shapes, if achievable, have the potential to permit much larger training runs.
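The arithmetic behind the abstract's headline numbers can be checked with a short sketch. This is back-of-the-envelope arithmetic only, not the paper's model of distributed training. The function names are hypothetical, and treating "three months" as 90 days is an assumption made here for illustration.

```python
import math

SECONDS_PER_DAY = 86_400


def implied_annual_growth(gap_factor: float, years: float) -> float:
    """Annual growth factor in training compute that closes `gap_factor` in `years`."""
    return gap_factor ** (1.0 / years)


def years_to_close_gap(gap_factor: float, annual_growth: float) -> float:
    """Years needed to grow by `gap_factor` at a constant `annual_growth` factor."""
    return math.log(gap_factor) / math.log(annual_growth)


def required_throughput(total_flop: float, days: float) -> float:
    """Average sustained FLOP/s needed to perform `total_flop` in `days`."""
    return total_flop / (days * SECONDS_PER_DAY)


# "Two orders of magnitude" in "three years" (both figures from the abstract).
growth = implied_annual_growth(gap_factor=100.0, years=3.0)
print(f"implied growth: {growth:.2f}x per year")

# Assumption: a three-month run lasts 90 days.
for total_flop in (1e28, 1e31):
    rate = required_throughput(total_flop, days=90.0)
    print(f"{total_flop:.0e} FLOP in 90 days needs {rate:.2e} FLOP/s sustained")
```

Closing a 100x gap in three years corresponds to growth of roughly 4.6x per year. Under the 90-day assumption, a 10^28 FLOP run needs about 1.3 × 10^21 FLOP/s of sustained useful throughput, which gives a sense of the cluster scale at which the data movement limits are claimed to bind.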

Taxonomy

Topics: Machine Learning and Data Classification