On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares
Stefan Steinerberger

TL;DR
This paper shows that stochastic gradient descent (SGD) applied to least squares problems has a regularizing effect: when the error is dominated by singular vectors belonging to large singular values, it decays quickly, which smooths the iterates.
Contribution
The paper provides an explicit inequality quantifying the regularization effect of SGD on least squares problems, with an extension to higher-order Sobolev spaces for symmetric matrices.
Findings
SGD induces a regularization effect depending on the residual's singular vector composition.
The inequality reveals a smoothing energy cascade from large to small singular values.
For symmetric matrices, the inequality extends to higher-order Sobolev spaces.
Abstract
We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$\mathbb{E} \left\|Ax_{k+1} - b\right\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \left\|Ax_k - b\right\|_2^2 - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|_2^2.$$
This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms: if $x_k - x$ is mainly comprised of large singular vectors, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smoothes.
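The inequality described in the abstract can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in this summary: that SGD takes the randomized Kaczmarz form (row $a_i$ of $A$ sampled with probability $\|a_i\|^2 / \|A\|_F^2$, followed by projection onto that row's equation), and that the constant is $c_A = \max_i \|A a_i\|^2 / \|a_i\|^2$. The function names are hypothetical. Under these assumptions the expectation over one step can be computed exactly by summing over all rows.

```python
import numpy as np

def expected_next_residual(A, x_k, x_true):
    """Exact E||A x_{k+1} - b||^2 over one step (assumed Kaczmarz-style SGD)."""
    b = A @ x_true
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()  # assumed sampling distribution
    total = 0.0
    for i in range(A.shape[0]):
        a_i = A[i]
        # Project x_k onto the hyperplane <a_i, x> = b_i.
        x_next = x_k + (b[i] - a_i @ x_k) / row_norms_sq[i] * a_i
        total += probs[i] * np.sum((A @ x_next - b) ** 2)
    return total

def upper_bound(A, x_k, x_true):
    """Right-hand side of the inequality, with the assumed constant c_A."""
    b = A @ x_true
    fro_sq = np.sum(A**2)  # squared Frobenius norm of A
    row_norms_sq = np.sum(A**2, axis=1)
    # Column i of A @ A.T is A a_i, so column sums of squares give ||A a_i||^2.
    c_A = np.max(np.sum((A @ A.T) ** 2, axis=0) / row_norms_sq)
    residual_sq = np.sum((A @ x_k - b) ** 2)
    smoothing_sq = np.sum((A.T @ A @ (x_k - x_true)) ** 2)
    return (1 + c_A / fro_sq) * residual_sq - 2 / fro_sq * smoothing_sq

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 8))  # invertible with probability 1
    x_true = rng.standard_normal(8)
    x_k = rng.standard_normal(8)
    print(expected_next_residual(A, x_k, x_true), upper_bound(A, x_k, x_true))
```

Printing both quantities side by side shows how much slack the bound leaves for a given matrix; the subtracted term grows when `x_k - x_true` is concentrated on singular vectors with large singular values.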
Taxonomy
Methods: Stochastic Gradient Descent
