Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs
Yi Liu

TL;DR
This paper studies how to size a model when the constraint is a fixed wall-clock time budget on consumer GPUs rather than a FLOP budget. It reports a dual U-shape behavior and an optimal model size that grows faster with the time budget than compute-optimal scaling predicts.
Contribution
It characterizes scaling laws under wall-clock time constraints, identifying a dual U-shape mechanism and showing that optimal model size grows faster than compute-optimal laws predict.
Findings
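Optimal model size scales as t^{0.60} with time budget t, faster than the roughly C^{0.50} growth of compute-optimal (Chinchilla-style) laws.
A dual U-shape mechanism: U-curves at short time budgets arise from compute bottlenecks, while U-curves at long time budgets arise from data bottlenecks (overfitting).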
Code, logs, and configurations are publicly released for reproducibility.
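A scaling exponent like the t^{0.60} finding above is typically estimated by a straight-line fit in log-log space. The sketch below shows that procedure only; it is not the paper's released code. The function name `fit_power_law`, the intermediate time budgets, and the prefactor are assumptions for illustration, and the data points are synthetic, generated from an exact power law rather than taken from the paper's runs.

```python
import numpy as np


def fit_power_law(t_seconds, n_opt):
    """Fit N = a * t^alpha by least squares on log N = alpha * log t + log a."""
    log_t = np.log(np.asarray(t_seconds, dtype=float))
    log_n = np.log(np.asarray(n_opt, dtype=float))
    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    alpha, log_a = np.polyfit(log_t, log_n, 1)
    return float(alpha), float(np.exp(log_a))


# Synthetic, noise-free points (NOT the paper's data). The endpoints match the
# 5-minute and 24-hour budgets; the intermediate budgets and the prefactor 1e6
# are arbitrary choices for this illustration.
budgets = np.array([300, 1800, 3600, 14400, 86400], dtype=float)
synthetic_n_opt = 1.0e6 * budgets ** 0.60

alpha, a = fit_power_law(budgets, synthetic_n_opt)
print(f"fitted exponent: {alpha:.2f}")
```

With real runs, the inputs would be the loss-minimizing model size at each time budget, and the fitted exponent would carry uncertainty that depends on how many budgets were swept.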
Abstract
Scaling laws relate model quality to compute budget (FLOPs), but practitioners face wall-clock time constraints, not compute budgets. We study optimal model sizing under fixed time budgets from 5 minutes to 24 hours on consumer GPUs (RTX 4090). Across 70+ runs spanning 50M to 1031M parameters, we find: (1) at each time budget a U-shaped curve emerges where too-small models overfit and too-large models undertrain; (2) optimal model size scales as t^{0.60}, growing faster than Chinchilla's compute-optimal scaling (approximately C^{0.50}), with the exponent robustly exceeding the compute-optimal value across all sensitivity analyses; (3) a dual U-shape mechanism: short-budget U-curves arise from compute bottlenecks, while long-budget U-curves emerge from data bottlenecks (overfitting), with an intermediate regime where the U-curve temporarily disappears. These findings have immediate…
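Reading an optimum off a per-budget sweep, and checking whether the sweep is U-shaped at all, can be sketched as follows. This is a minimal illustration, not the authors' analysis: the helper `optimum_from_sweep` is hypothetical, the sizes and losses are made-up numbers, and "U-shaped" is approximated here as "the minimum is not at either end of the sweep".

```python
def optimum_from_sweep(sizes_m, val_losses):
    """Return (best_size, is_interior) for one time budget.

    sizes_m    -- model sizes in millions of parameters, sorted ascending
    val_losses -- final validation loss for each size at this budget
    is_interior is True when the minimum lies strictly inside the sweep,
    a simple proxy for a U-shaped curve.
    """
    if not sizes_m or len(sizes_m) != len(val_losses):
        raise ValueError("sizes_m and val_losses must be non-empty and equal length")
    best = min(range(len(sizes_m)), key=lambda i: val_losses[i])
    is_interior = 0 < best < len(sizes_m) - 1
    return sizes_m[best], is_interior


# Made-up numbers for illustration only (not results from the paper).
sizes = [50, 100, 200, 400]
u_shaped = optimum_from_sweep(sizes, [3.9, 3.6, 3.7, 4.1])
monotone = optimum_from_sweep(sizes, [4.0, 3.8, 3.6, 3.5])
print(u_shaped, monotone)
```

A monotone sweep, where the largest model tested is still the best, means the true optimum lies outside the tested range. That is one way the U-curve can appear to vanish in an intermediate regime.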
