Scaling Laws in Linear Regression: Compute, Parameters, and Data

Licong Lin; Jingfeng Wu; Sham M. Kakade; Peter L. Bartlett; Jason D. Lee

arXiv:2406.08466 · cs.LG · June 11, 2025 · 1 citation

TL;DR

This paper develops a theoretical framework for understanding neural scaling laws in linear regression models. It shows how the test error depends on model size and data size, and explains why the variance error, which grows with model size, is suppressed by the implicit regularization of SGD.

Contribution

It provides a novel theoretical analysis of scaling laws in linear regression trained with one-pass SGD. The analysis shows how the implicit regularization of SGD shapes the approximation, bias, and variance errors, and why the resulting rates align with empirical neural scaling observations.

Findings

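1. The reducible part of the test error scales as $M^{-(a-1)} + N^{-(a-1)/a}$ when the optimal parameter has a Gaussian prior and the data covariance has a power-law spectrum of degree $a > 1$.
2. Because of the implicit regularization of SGD, the variance error, which grows with model size, is dominated by the other error terms and does not appear in the bound.
3. The theory is consistent with empirical neural scaling laws and is supported by numerical simulations.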
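One way to read the first finding is to evaluate the rate with every hidden constant set to 1 and ask how a fixed compute budget should be split between parameters and data. The sketch below assumes compute $C = MN$ (one $M$-dimensional SGD update per sample), which this summary does not define; `predicted_error` and `compute_optimal_split` are hypothetical helpers, not the authors' code. Minimizing $M^{-(a-1)} + (C/M)^{-(a-1)/a}$ over $M$ gives $M^{a+1} = a^{a/(a-1)} C$, so under these assumptions $M^\star \propto C^{1/(a+1)}$, $N^\star \propto C^{a/(a+1)}$, and the proxy error decays like $C^{-(a-1)/(a+1)}$.

```python
def predicted_error(M, N, a):
    """Constant-free proxy for the rate M^{-(a-1)} + N^{-(a-1)/a}.

    Assumption: the constants hidden in the order-of-magnitude bound are set to 1.
    The proxy decreases monotonically in both M and N, as the finding states.
    """
    if a <= 1:
        raise ValueError("the rate is stated for a > 1")
    return M ** (-(a - 1)) + N ** (-(a - 1) / a)


def compute_optimal_split(C, a):
    """Minimize predicted_error(M, C / M, a) under the assumed budget C = M * N.

    Setting the derivative in M to zero gives M^{a+1} = a^{a/(a-1)} * C;
    the derivative changes sign only once, so this is the global minimizer.
    Returns (M_opt, N_opt) as real numbers (not rounded to integers).
    """
    M = (a ** (a / (a - 1)) * C) ** (1 / (a + 1))
    return M, C / M
```

For $a = 2$ and $C = 10^6$, this gives $M^\star = (4 \times 10^6)^{1/3} \approx 159$ and $N^\star \approx 6300$. Because the real bound hides unknown constants, only the exponents, not these specific numbers, carry over.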

Abstract

Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, which predict that increasing model size monotonically improves performance. We study the theory of scaling laws in an infinite-dimensional linear regression setup. Specifically, we consider a model with $M$ parameters as a linear function of sketched covariates. The model is trained by one-pass stochastic gradient descent (SGD) using $N$ data points. Assuming the optimal parameter satisfies a Gaussian prior and the data covariance matrix has a power-law spectrum of degree $a > 1$, we show that the…
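To make the setup in the abstract concrete, here is a small NumPy sketch of a sketched linear model $x \mapsto \langle v, Sx \rangle$ with $v \in \mathbb{R}^M$, trained by one-pass SGD on $N$ fresh samples. Every specific choice is an assumption for illustration rather than the paper's exact protocol: the infinite-dimensional covariates are truncated to `d` coordinates, the spectrum is $\lambda_i = i^{-a}$, the prior is $w^* \sim N(0, I)$, the sketch `S` has i.i.d. $N(0, 1/M)$ entries, and SGD uses a constant step size from zero initialization. The function `simulate_sketched_sgd` and its parameters are hypothetical.

```python
import numpy as np


def simulate_sketched_sgd(M, N, a=2.0, d=1000, sigma=0.5, seed=0):
    """Hypothetical simulation of a sketched linear model trained by one-pass SGD.

    Assumptions (illustrative only): covariates truncated to d coordinates,
    spectrum lambda_i = i^{-a}, w* ~ N(0, I_d), sketch entries N(0, 1/M),
    constant step size, zero initialization.
    Returns (sgd_error, approx_error, init_error), each the excess (reducible)
    test error; the full test error adds the noise level sigma**2.
    """
    rng = np.random.default_rng(seed)
    lam = np.arange(1, d + 1, dtype=float) ** (-a)   # power-law spectrum of H
    w_star = rng.standard_normal(d)                   # Gaussian prior on w*
    S = rng.standard_normal((M, d)) / np.sqrt(M)      # random sketch, E[S^T S] = I

    # Fresh samples for a single pass: x ~ N(0, diag(lam)), y = <w*, x> + noise.
    X = rng.standard_normal((N, d)) * np.sqrt(lam)
    y = X @ w_star + sigma * rng.standard_normal(N)
    Phi = X @ S.T                                     # sketched features S x, shape (N, M)

    # Constant step size scaled by the trace of the sketched covariance S H S^T.
    eta = 0.25 / np.sum(lam * np.sum(S**2, axis=0))

    def excess_risk(v):
        # E[(<v, Sx> - <w*, x>)^2] = (S^T v - w*)^T H (S^T v - w*)
        r = S.T @ v - w_star
        return float(np.sum(lam * r**2))

    v = np.zeros(M)
    for t in range(N):                                # each sample is used exactly once
        residual = Phi[t] @ v - y[t]
        v -= eta * residual * Phi[t]

    # Smallest excess risk any v can reach with this sketch (approximation error).
    A = np.sqrt(lam)[:, None] * S.T                   # H^{1/2} S^T, shape (d, M)
    v_best = np.linalg.lstsq(A, np.sqrt(lam) * w_star, rcond=None)[0]
    return excess_risk(v), excess_risk(v_best), excess_risk(np.zeros(M))
```

Sweeping `M` and `N` over a log-spaced grid and averaging over several seeds is one way to compare simulated errors with the rate in the Findings list; a single seed is noisy. Since both quantities use the same sketch, the SGD error can never fall below the returned approximation error.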

Videos

Scaling Laws in Linear Regression: Compute, Parameters, and Data · SlidesLive

Taxonomy

Topics: Advanced Statistical Methods and Models

Methods: Stochastic Gradient Descent · Linear Regression