Improved Scaling Laws in Linear Regression via Data Reuse

Licong Lin; Jingfeng Wu; Peter L. Bartlett

arXiv:2506.08415 · cs.LG · September 26, 2025

TL;DR

This paper shows that reusing data via multi-pass SGD improves the test error bounds and scaling laws for linear regression when data is limited, under power-law assumptions on the data covariance spectrum and on the prior over the true parameter.

Contribution

The paper derives sharp test error bounds for linear models trained by multi-pass SGD on sketched features, showing that data reuse improves the scaling law in linear regression under power-law assumptions on the covariance spectrum and the prior.

Findings

01. Multi-pass SGD achieves lower test error bounds than one-pass SGD.

02. Data reuse improves scaling laws in data-constrained regimes.

03. Numerical simulations verify theoretical predictions.
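A toy simulation in the spirit of the third finding can be sketched as follows. This is not the authors' experimental code; it is a minimal standard-library sketch under several assumptions: the ambient dimension `d` is finite, the sketch is Gaussian with N(0, 1/M) entries, the step size is constant, "multi-pass" means cycling through the same N samples in a fixed order, and the prior is read as E[λ_i w_i²] = i^(-b). All function names and default sizes are hypothetical.

```python
import math
import random


def make_problem(d=64, M=16, N=64, a=2.0, b=1.5, noise_std=0.1, seed=0):
    """Build a toy sketched linear regression problem (sizes are illustrative)."""
    rng = random.Random(seed)
    # Power-law covariance spectrum: lambda_i = i^(-a).
    lam = [i ** (-a) for i in range(1, d + 1)]
    # Assumed reading of the prior: E[lambda_i * w_i^2] = i^(-b),
    # i.e. w_i has variance i^(a - b).
    w_star = [rng.gauss(0.0, math.sqrt(i ** (a - b))) for i in range(1, d + 1)]
    # Gaussian sketch S of shape M x d with N(0, 1/M) entries (an assumption).
    S = [[rng.gauss(0.0, 1.0 / math.sqrt(M)) for _ in range(d)] for _ in range(M)]
    data = []
    for _ in range(N):
        x = [rng.gauss(0.0, math.sqrt(l)) for l in lam]
        y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, noise_std)
        z = [sum(s * xi for s, xi in zip(row, x)) for row in S]  # sketched feature Sx
        data.append((z, y))
    return lam, w_star, S, data, noise_std ** 2


def population_risk(v, lam, w_star, S, noise_var):
    """Exact test error: noise_var + (S^T v - w*)^T H (S^T v - w*), H = diag(lam)."""
    excess = 0.0
    for i in range(len(lam)):
        u_i = sum(S[m][i] * v[m] for m in range(len(v)))  # i-th entry of S^T v
        excess += lam[i] * (u_i - w_star[i]) ** 2
    return noise_var + excess


def run_sgd(problem, passes, step=0.1):
    """Run SGD from zero for `passes` sweeps over the same N samples.

    passes=1 is one-pass SGD; passes>1 reuses the data (L = passes * N steps).
    """
    lam, w_star, S, data, noise_var = problem
    v = [0.0] * len(S)
    for _ in range(passes):
        for z, y in data:
            residual = sum(vi * zi for vi, zi in zip(v, z)) - y
            v = [vi - step * residual * zi for vi, zi in zip(v, z)]
    return population_risk(v, lam, w_star, S, noise_var)


if __name__ == "__main__":
    problem = make_problem()
    for passes in (1, 4, 16):
        print(passes, run_sgd(problem, passes))
```

Running the script prints the exact population risk after 1, 4, and 16 passes over the same 64 samples. Whether extra passes help at these small sizes depends on the seed and step size, so the output is a starting point for experimentation rather than a reproduction of the paper's figures.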

Abstract

Neural scaling laws suggest that the test error of large language models trained online decreases polynomially as the model size and data size increase. However, such scaling can be unsustainable when running out of new data. In this work, we show that data reuse can improve existing scaling laws in linear regression. Specifically, we derive sharp test error bounds on $M$-dimensional linear models trained by multi-pass stochastic gradient descent (multi-pass SGD) on $N$ data with sketched features. Assuming that the data covariance has a power-law spectrum of degree $a$, and that the true parameter follows a prior with an aligned power-law spectrum of degree $b - a$ (with $a > b > 1$), we show that multi-pass SGD achieves a test error of $\Theta(M^{1-b} + L^{(1-b)/a})$, where $L \lesssim N^{a/b}$ is the number of iterations. In the same setting, one-pass SGD only attains a test error of…
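To make the multi-pass bound concrete, one can substitute the largest admissible iteration count into the stated rate. The step below is plain exponent arithmetic on the formula quoted in the abstract. The numerical values $a = 2$, $b = 1.5$ are illustrative assumptions, not taken from the paper.

```latex
% Multi-pass bound at the largest admissible iteration count L \asymp N^{a/b}:
\[
  L^{(1-b)/a}\Big|_{L \asymp N^{a/b}}
  = \bigl(N^{a/b}\bigr)^{(1-b)/a}
  = N^{(1-b)/b},
  \qquad\text{so}\qquad
  \Theta\bigl(M^{1-b} + L^{(1-b)/a}\bigr)
  = \Theta\bigl(M^{1-b} + N^{(1-b)/b}\bigr).
\]
% The same expression evaluated at only L = N iterations gives N^{(1-b)/a}.
% Since a > b > 1, we have (1-b)/b < (1-b)/a < 0, so the data-dependent term
% decays faster in N when L is pushed up to N^{a/b}.
% Illustrative values (assumed): a = 2, b = 1.5:
\[
  \frac{1-b}{b} = -\frac{1}{3},
  \qquad
  \frac{1-b}{a} = -\frac{1}{4}.
\]
```

Under these assumed values, the data-dependent term scales as $N^{-1/3}$ at $L \asymp N^{a/b}$, compared with $N^{-1/4}$ when the same formula is evaluated at $L = N$.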

Videos

Improved Scaling Laws in Linear Regression via Data Reuse · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference

Methods: Stochastic Gradient Descent