Improved Scaling Laws in Linear Regression via Data Reuse
Licong Lin, Jingfeng Wu, Peter L. Bartlett

TL;DR
This paper demonstrates that data reuse in linear regression with multi-pass SGD can improve test error bounds and scaling laws, especially when data is limited, by leveraging spectral properties of data covariance and prior distributions.
Contribution
The paper derives sharp test error bounds showing how data reuse via multi-pass SGD improves scaling laws in linear regression under spectral assumptions.
Findings
Multi-pass SGD achieves lower test error bounds than one-pass SGD.
Data reuse improves scaling laws in data-constrained regimes.
Numerical simulations verify theoretical predictions.
Abstract
Neural scaling laws suggest that the test error of large language models trained online decreases polynomially as the model size and data size increase. However, such scaling can be unsustainable when running out of new data. In this work, we show that data reuse can improve existing scaling laws in linear regression. Specifically, we derive sharp test error bounds on $M$-dimensional linear models trained by multi-pass stochastic gradient descent (multi-pass SGD) on $N$ data with sketched features. Assuming that the data covariance has a power-law spectrum of degree $a$, and that the true parameter follows a prior with an aligned power-law spectrum of degree $b-a$ (with $a > b > 1$), we show that multi-pass SGD achieves a test error of $\Theta(M^{1-b} + L^{(1-b)/a})$, where $L \lesssim N^{a/b}$ is the number of iterations. In the same setting, one-pass SGD only attains a test error of…
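The setting described in the abstract can be sketched as a small simulation. The following is a minimal illustration, not the authors' experimental code: it assumes a diagonal Gaussian design, omits the feature sketching step, and uses a constant step size, all of which are simplifications. The exponent names `a` and `b`, the helper names (`make_problem`, `sample`, `sgd`, `excess_risk`), and every numeric setting are hypothetical choices made for illustration.

```python
import numpy as np


def make_problem(d, a, b, rng):
    """Power-law covariance spectrum and a parameter drawn from an aligned prior."""
    i = np.arange(1, d + 1, dtype=float)
    lam = i ** (-a)            # covariance eigenvalues: lambda_i = i^(-a)
    prior_var = i ** (a - b)   # assumed prior spectrum, so lambda_i * E[w_i^2] = i^(-b)
    w_star = rng.standard_normal(d) * np.sqrt(prior_var)
    return lam, w_star


def sample(n, lam, w_star, noise_std, rng):
    """Draw n Gaussian samples with diagonal covariance diag(lam) and noisy labels."""
    X = rng.standard_normal((n, lam.size)) * np.sqrt(lam)
    y = X @ w_star + noise_std * rng.standard_normal(n)
    return X, y


def sgd(X, y, num_steps, step_size, rng, reuse):
    """Constant-step-size SGD on squared loss, starting from zero.

    reuse=False: one-pass, step t uses sample t (needs num_steps <= n).
    reuse=True:  multi-pass, each step draws a sample index uniformly at random.
    """
    n, d = X.shape
    if not reuse and num_steps > n:
        raise ValueError("one-pass SGD cannot take more steps than samples")
    w = np.zeros(d)
    for t in range(num_steps):
        j = rng.integers(n) if reuse else t
        w -= step_size * (X[j] @ w - y[j]) * X[j]
    return w


def excess_risk(w, lam, w_star):
    """Population excess test error for a diagonal covariance."""
    return float(np.sum(lam * (w - w_star) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam, w_star = make_problem(d=100, a=2.0, b=1.5, rng=rng)
    X, y = sample(200, lam, w_star, noise_std=0.1, rng=rng)
    w_one = sgd(X, y, num_steps=200, step_size=0.1, rng=rng, reuse=False)
    w_multi = sgd(X, y, num_steps=800, step_size=0.1, rng=rng, reuse=True)
    print("one-pass excess risk:  ", excess_risk(w_one, lam, w_star))
    print("multi-pass excess risk:", excess_risk(w_multi, lam, w_star))
```

A single small run like this is only a sanity check. Comparing against the scaling behavior the paper claims would require averaging over many seeds and sweeping the number of samples and iterations.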
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference
Methods: Stochastic Gradient Descent
