Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression
Shihong Ding, Haihan Zhang, Hanzhen Zhao, Cong Fang

TL;DR
This paper studies scaling laws for stochastic gradient descent in quadratically parameterized linear regression, characterizing how generalization performance scales with data and model size when feature learning is present.
Contribution
It extends theoretical understanding of scaling laws to quadratically parameterized models with feature learning, providing explicit generalization bounds and learning dynamics insights.
Findings
SGD convergence rates adapt to ground truth decay rates
Explicit separation of generalization curves with and without feature learning
Derived information-theoretical lower bounds for the model
Abstract
In machine learning, the scaling law describes how model performance improves as the model and data size scale up. From a learning theory perspective, this class of results establishes upper and lower generalization bounds for a specific learning algorithm. Here, the exact algorithm, run with a specific model parameterization, often provides a crucial implicit regularization effect, leading to good generalization. To characterize the scaling law, previous theoretical studies mainly focus on linear models, whereas feature learning, a notable process that contributes to the remarkable empirical success of neural networks, is largely absent. This paper studies the scaling law for linear regression with a quadratically parameterized model. We consider infinite-dimensional data and a ground-truth slope, both signals exhibiting certain power-law decay rates. We study…
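The setup described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes the quadratic parameterization takes the form f(x) = <v * v, x> (elementwise square), truncates the infinite-dimensional problem to a finite `dim`, and uses a nonnegative ground truth so that v * v can represent it. The decay exponents `alpha` and `beta`, the step size, the initialization, and the noise level are all illustrative choices, not values from the paper.

```python
import numpy as np

def power_law_problem(dim, alpha=1.5, beta=1.0):
    """Hypothetical power-law spectrum and ground truth (illustrative exponents)."""
    idx = np.arange(1, dim + 1, dtype=float)
    eigvals = idx ** (-alpha)   # covariance eigenvalues, decaying as i^(-alpha)
    w_star = idx ** (-beta)     # ground-truth slope, decaying as i^(-beta)
    return eigvals, w_star

def excess_risk(v, eigvals, w_star):
    """Excess risk 0.5 * E[(x . (v*v - w_star))^2] for x ~ N(0, diag(eigvals))."""
    diff = v * v - w_star
    return 0.5 * np.sum(eigvals * diff ** 2)

def run_sgd(dim=100, steps=5000, lr=0.01, noise_std=0.1, init=0.1, seed=0):
    """One-pass SGD on the squared loss with the assumed model f(x) = <v*v, x>."""
    rng = np.random.default_rng(seed)
    eigvals, w_star = power_law_problem(dim)
    v = np.full(dim, init)
    risks = [excess_risk(v, eigvals, w_star)]
    for _ in range(steps):
        # Fresh sample each step: Gaussian features with a power-law spectrum.
        x = rng.standard_normal(dim) * np.sqrt(eigvals)
        y = x @ w_star + noise_std * rng.standard_normal()
        residual = x @ (v * v) - y
        # Gradient of 0.5 * residual^2 with respect to v is 2 * residual * v * x.
        v = v - lr * 2.0 * residual * v * x
        risks.append(excess_risk(v, eigvals, w_star))
    return v, np.array(risks)

if __name__ == "__main__":
    v, risks = run_sgd()
    for t in (0, 100, 1000, 5000):
        print(f"step {t:5d}  excess risk {risks[t]:.5f}")
```

Because the update multiplies each coordinate of `v` by a data-dependent factor, coordinates with large eigenvalue and large ground-truth signal grow first, while the tail stays near its small initialization. This is one simple way to see the kind of implicit regularization the abstract refers to; the paper's precise rates are not reproduced here.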
Taxonomy
Topics: Statistical Methods and Inference · Face and Expression Recognition
Methods: Linear Regression · Stochastic Gradient Descent · Focus
