Time Matters: Scaling Laws for Any Budget
Itay Inbar, Luke Sernau

TL;DR
This paper introduces a method for predicting the wall-clock training time and final loss of large models using a proxy based on memory copies. The resulting analysis favors wider models over deeper ones for efficiency.
Contribution
The paper develops a memory-copy-based proxy for estimating training speed and combines it with scaling laws to predict model loss, enabling more efficient architectural decisions.
Findings
A memory-copy proxy estimates training time more accurately than FLOPs.
Wider models are more efficient than deeper ones according to the analysis.
Final loss can be predicted accurately from hyperparameters across a wide range of settings.
Abstract
A primary cost driver for training large models is wall-clock training time. We show that popular time estimates based on FLOPs are poor estimates, and construct a more accurate proxy based on memory copies. This allows us to accurately estimate the training speed of a transformer model from its hyperparameters. Combined with a scaling law curve like Chinchilla, this allows us to accurately predict the final loss of a model from a simple equation. We show that this expression is accurate across a wide range of model hyperparameter values, enabling us to analytically make architectural decisions and train models more efficiently. Crucially, this analysis predicts that in contrast to existing literature, models should be wider rather than deeper, as the benefits of speed outweigh the benefits of depth.
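The pipeline the abstract describes (hyperparameters to training speed, speed and time budget to tokens seen, tokens and parameters to final loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' expression: the loss uses the parametric form and fitted constants published for Chinchilla, the parameter count uses the common 12 · layers · width² approximation, and `seconds_per_token` with its `flop_cost` and `copy_cost` coefficients is a hypothetical stand-in for the paper's memory-copy proxy, which this summary does not reproduce.

```python
# Chinchilla parametric loss L(N, D) = E + A / N^alpha + B / D^beta,
# with the constants fitted by Hoffmann et al. (2022).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted final loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def transformer_params(d_model: int, n_layers: int) -> int:
    """Approximate non-embedding parameter count: about 12 * d_model^2 per layer."""
    return 12 * n_layers * d_model**2


def seconds_per_token(d_model: int, n_layers: int,
                      flop_cost: float = 1e-15,
                      copy_cost: float = 1e-10) -> float:
    """HYPOTHETICAL time proxy; the coefficients and the copy term are placeholders.

    The paper fits its own memory-copy expression. Here we only assume that
    time has a FLOP term plus a data-movement term that grows with
    n_layers * d_model.
    """
    flops = 6 * transformer_params(d_model, n_layers)  # forward + backward per token
    copies = n_layers * d_model                        # placeholder for elements moved
    return flop_cost * flops + copy_cost * copies


def predicted_loss(d_model: int, n_layers: int, budget_seconds: float) -> float:
    """Final loss reachable within a fixed wall-clock budget."""
    n_tokens = budget_seconds / seconds_per_token(d_model, n_layers)
    return chinchilla_loss(transformer_params(d_model, n_layers), n_tokens)


if __name__ == "__main__":
    budget = 1e6  # seconds of training time (illustrative value)
    for d_model, n_layers in [(4096, 8), (2048, 32)]:  # equal parameter counts
        print(d_model, n_layers, predicted_loss(d_model, n_layers, budget))
```

The two configurations in the usage example have the same parameter count, so any difference in predicted loss comes entirely from the assumed time proxy. The direction of that difference reflects the placeholder coefficients and should not be read as evidence for the paper's conclusion.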
Taxonomy
Topics: German Economic Analysis & Policies · Housing, Finance, and Neoliberalism · Gender, Labor, and Family Dynamics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Chinchilla
