Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression

Shihong Ding; Haihan Zhang; Hanzhen Zhao; Cong Fang

arXiv:2502.09106·cs.LG·February 14, 2025

Open Access

TL;DR

This paper investigates the scaling laws of stochastic gradient descent in quadratically parameterized linear regression, revealing how model performance scales with data and model size, especially considering feature learning effects.

Contribution

It extends theoretical understanding of scaling laws to quadratically parameterized models with feature learning, providing explicit generalization bounds and learning dynamics insights.

Findings

01. SGD convergence rates adapt to ground truth decay rates

02. Explicit separation of generalization curves with and without feature learning

03. Derived information-theoretical lower bounds for the model

Abstract

In machine learning, a scaling law describes how model performance improves as the model and data size scale up. From a learning theory perspective, this class of results establishes upper and lower generalization bounds for a specific learning algorithm. Here, running the exact algorithm with a specific model parameterization often provides a crucial implicit regularization effect that leads to good generalization. To characterize the scaling law, previous theoretical studies mainly focus on linear models, whereas feature learning, a notable process that contributes to the remarkable empirical success of neural networks, is largely absent from these analyses. This paper studies the scaling law for linear regression with a quadratically parameterized model. We consider infinite-dimensional data and a slope ground truth, with both signals exhibiting certain power-law decay rates. We study…
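To make the setting in the abstract concrete, the following is a minimal sketch of one-pass SGD on a quadratically parameterized linear model. It is not the paper's algorithm or experimental setup. Everything specific here is an assumption made for illustration: the predictor form ⟨v ⊙ v, x⟩, the diagonal Gaussian covariance with eigenvalues i^(-alpha), the nonnegative ground truth i^(-beta), the finite dimension `d` standing in for infinite-dimensional data, the constant step size, and the function name `sgd_quadratic_regression`.

```python
import numpy as np


def sgd_quadratic_regression(d=100, steps=5000, lr=0.02,
                             alpha=2.0, beta=1.5, noise=0.1, seed=0):
    """One-pass SGD on f(x) = <v * v, x> (illustrative assumptions only).

    Returns (initial_excess_risk, final_excess_risk).
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(1, d + 1, dtype=float)

    # Assumed power-law decay of the data covariance spectrum and ground truth.
    lam = idx ** (-alpha)      # variance of coordinate i
    w_star = idx ** (-beta)    # nonnegative, so v * v can represent it

    def excess_risk(v):
        # E[(<v*v - w_star, x>)^2] for a diagonal covariance diag(lam).
        return float(np.sum(lam * (v * v - w_star) ** 2))

    v = np.full(d, 0.1)        # small positive initialization (assumed)
    initial = excess_risk(v)

    for _ in range(steps):
        # Draw one fresh sample (x, y); each sample is used exactly once.
        x = rng.normal(size=d) * np.sqrt(lam)
        y = x @ w_star + noise * rng.normal()

        # Gradient of 0.5 * (<v*v, x> - y)^2 with respect to v.
        resid = x @ (v * v) - y
        v -= lr * 2.0 * resid * x * v

    return initial, excess_risk(v)


if __name__ == "__main__":
    init_risk, final_risk = sgd_quadratic_regression()
    print(f"excess risk: initial={init_risk:.4f}, final={final_risk:.4f}")
```

In this sketch the update for each coordinate is multiplied by its own current value `v[i]`, so coordinates with a large signal grow quickly while small ones move slowly. This coordinate-wise adaptivity is one informal way to picture the feature learning effect the abstract contrasts with plain linear models.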

Taxonomy

Topics: Statistical Methods and Inference · Face and Expression Recognition

Methods: Linear Regression · Stochastic Gradient Descent · Focus