The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali, Edgar Dobriban, and Ryan J. Tibshirani

TL;DR
This paper investigates how mini-batch stochastic gradient descent implicitly regularizes least squares regression by analyzing a stochastic gradient flow model, providing bounds on excess risk and insights into the effects of algorithm parameters.
Contribution
It introduces a stochastic gradient flow framework for analyzing implicit regularization in least squares, deriving bounds that relate algorithm parameters to excess risk with no conditions on the data matrix.
Findings
Bound on excess risk of stochastic gradient flow compared to ridge regression.
Explicit relationship between mini-batch size, step size, and regularization effect.
Numerical evidence that the bound can be small, together with a similar result relating the coefficients of stochastic gradient flow and ridge.
Abstract
We study the implicit regularization of mini-batch stochastic gradient descent, when applied to the fundamental problem of least squares regression. We leverage a continuous-time stochastic differential equation having the same moments as stochastic gradient descent, which we call stochastic gradient flow. We give a bound on the excess risk of stochastic gradient flow at time $t$, over ridge regression with tuning parameter $\lambda = 1/t$. The bound may be computed from explicit constants (e.g., the mini-batch size, step size, number of iterations), revealing precisely how these quantities drive the excess risk. Numerical examples show the bound can be small, indicating a tight relationship between the two estimators. We give a similar result relating the coefficients of stochastic gradient flow and ridge. These results hold under no conditions on the data matrix $X$, and across the…
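The comparison described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it runs discrete mini-batch SGD (rather than the continuous-time flow itself) on synthetic data and compares the result to a ridge solution. Several choices here are assumptions made for illustration: the loss scaling (1/(2n))·||y − Xβ||², the ridge penalty scaling n·λ, the identification of time with t = step size × number of iterations, the pairing λ = 1/t, and all names (`minibatch_sgd`, `ridge`, `rel_gap`) and data dimensions.

```python
import numpy as np

def minibatch_sgd(X, y, step, batch_size, n_iters, rng):
    """Mini-batch SGD on (1/(2n)) * ||y - X beta||^2, started at zero."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        # Draw a mini-batch of row indices without replacement.
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Mini-batch estimate of the full least squares gradient.
        grad = Xb.T @ (Xb @ beta - yb) / batch_size
        beta -= step * grad
    return beta

def ridge(X, y, lam):
    """Ridge solution of (1/n) * ||y - X beta||^2 + lam * ||beta||^2 (assumed scaling)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

# Synthetic data (hypothetical sizes and noise level).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Algorithm parameters: step size, mini-batch size, number of iterations.
step, batch_size, n_iters = 0.01, 20, 500
t = step * n_iters   # assumed "time" of the continuous-time analogue
lam = 1.0 / t        # assumed ridge tuning parameter paired with time t

beta_sgd = minibatch_sgd(X, y, step, batch_size, n_iters, rng)
beta_ridge = ridge(X, y, lam)

# Relative distance between the two coefficient vectors.
rel_gap = np.linalg.norm(beta_sgd - beta_ridge) / np.linalg.norm(beta_ridge)
print(f"t = {t:.2f}, lambda = {lam:.3f}, relative coefficient gap = {rel_gap:.3f}")
```

Varying `step`, `batch_size`, and `n_iters` in this sketch gives a rough empirical feel for how these quantities affect the gap to ridge; it does not compute the paper's bound.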
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
