TL;DR
This paper identifies two dynamical scaling laws that describe how learning performance evolves during training across CNNs, ResNets, and Vision Transformers, linking implicit bias to neural scaling laws.
Contribution
It introduces two novel dynamical scaling laws governing training dynamics, extending understanding beyond asymptotic behavior and connecting implicit bias to performance evolution.
Findings
Two dynamical scaling laws describe how performance evolves during training as a function of norm-based complexity measures.
Combined, the two laws recover the well-known scaling of test error at convergence.
Results are consistent across CNNs, ResNets, and Vision Transformers.
Abstract
Scaling laws in deep learning -- empirical power-law relationships linking model performance to resource growth -- have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training. In this work, we describe a richer picture by analyzing the entire training dynamics: we identify two novel dynamical scaling laws that govern how performance evolves as a function of different norm-based complexity measures. Combined, our new laws recover the well-known scaling for test error at convergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers…
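To make the phrase "empirical power-law relationship" concrete, the following is a minimal sketch of how such a law is commonly fitted: a straight-line fit in log-log space. The functional form err = a * n^(-alpha), the helper name fit_power_law, and the synthetic values are all assumptions chosen for illustration; none of them come from the paper, and this is not the authors' procedure.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration only. Returns (a, alpha).
    """
    # log y = log a - alpha * log x, so a degree-1 fit recovers both terms.
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, noise-free data (illustrative values, not results from the paper).
n = np.array([1e3, 1e4, 1e5, 1e6])  # resource, e.g. dataset size
err = 2.0 * n ** -0.5               # assumed law: prefactor 2.0, exponent 0.5

a, alpha = fit_power_law(n, err)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

On real measurements the points would be noisy, and the fit would usually be restricted to the range where the log-log plot looks linear.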