Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

TL;DR
This paper analyzes the dynamics of multi-pass SGD on high-dimensional convex quadratic problems, showing that SGD's efficiency stems from implicit conditioning rather than implicit regularization, and derives explicit formulas for the risk trajectories.
Contribution
The paper introduces homogenized stochastic gradient descent (HSGD), a stochastic differential equation that is asymptotically equivalent to multi-pass SGD in this setting, characterizes its solutions explicitly, and explains the implicit conditioning mechanism behind SGD's efficiency while ruling out implicit regularization effects.
Findings
SGD's efficiency is due to implicit conditioning, not regularization.
Explicit formulas for learning and risk trajectories are derived.
Noise in SGD negatively impacts generalization performance.
Abstract
Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quadratic problems, worst-case analyses give an asymptotic convergence rate for SGD that is no better than full-batch gradient descent (GD), and the purported implicit regularization effects of SGD lack a precise explanation. In this work, we study the dynamics of multi-pass SGD on high-dimensional convex quadratics and establish an asymptotic equivalence to a stochastic differential equation, which we call homogenized stochastic gradient descent (HSGD), whose solutions we characterize explicitly…
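The setting described in the abstract, multi-pass SGD versus full-batch GD on a convex quadratic, can be illustrated with a toy least-squares problem. The sketch below is only an illustration under stated assumptions: it is not the paper's HSGD model or its experiments, and the problem sizes, step sizes, and function names (`make_least_squares`, `run_sgd`, `run_gd`) are hypothetical choices made for this example.

```python
import numpy as np


def make_least_squares(n=400, d=200, noise=0.1, seed=0):
    """Random least-squares instance (assumed sizes; not from the paper)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d)) / np.sqrt(d)  # rows have squared norm near 1
    x_star = rng.standard_normal(d)
    b = A @ x_star + noise * rng.standard_normal(n)
    return A, b


def empirical_risk(A, b, x):
    """Convex quadratic objective: half the mean squared residual."""
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)


def run_sgd(A, b, step=0.3, epochs=10, seed=0):
    """Multi-pass SGD, one uniformly sampled row per step.

    Records the empirical risk after every epoch (n single-sample steps).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    risks = [empirical_risk(A, b, x)]
    for _ in range(epochs):
        for i in rng.integers(0, n, size=n):
            x -= step * (A[i] @ x - b[i]) * A[i]
        risks.append(empirical_risk(A, b, x))
    return np.array(risks)


def run_gd(A, b, iters=10):
    """Full-batch GD with step 1/L, where L is the largest Hessian eigenvalue."""
    n, d = A.shape
    hessian = A.T @ A / n
    step = 1.0 / np.linalg.eigvalsh(hessian)[-1]
    x = np.zeros(d)
    risks = [empirical_risk(A, b, x)]
    for _ in range(iters):
        x -= step * (A.T @ (A @ x - b)) / n
        risks.append(empirical_risk(A, b, x))
    return np.array(risks)


if __name__ == "__main__":
    A, b = make_least_squares()
    sgd_risks = run_sgd(A, b)
    gd_risks = run_gd(A, b)
    for k, (s, g) in enumerate(zip(sgd_risks, gd_risks)):
        print(f"pass {k:2d}  SGD risk {s:.5f}  GD risk {g:.5f}")
```

Each row of the printout compares one epoch of SGD (n single-sample gradient evaluations) with one full-batch GD step (also n gradient evaluations), so the two columns are matched in gradient cost. The relative behavior depends on the chosen step sizes and spectrum, so this toy run should not be read as reproducing the paper's results.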
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Single-cell and spatial transcriptomics
Methods: Stochastic Gradient Descent
