Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

Courtney Paquette; Elliot Paquette; Ben Adlam; Jeffrey Pennington

arXiv:2206.07252·stat.ML·June 16, 2022

TL;DR

This paper analyzes the dynamics of multi-pass SGD on high-dimensional convex quadratic problems, shows that SGD's efficiency stems from implicit conditioning rather than implicit regularization, and derives explicit formulas for the risk trajectories.

Contribution

It introduces the homogenized stochastic gradient descent (HSGD) model, characterizes its solutions, and explains the implicit conditioning mechanism behind SGD's efficiency, while ruling out implicit regularization effects.

Findings

1. SGD's efficiency is due to implicit conditioning, not regularization.

2. Explicit formulas for learning and risk trajectories are derived.

3. Noise in SGD negatively impacts generalization performance.

Abstract

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quadratic problems, worst-case analyses give an asymptotic convergence rate for SGD that is no better than full-batch gradient descent (GD), and the purported implicit regularization effects of SGD lack a precise explanation. In this work, we study the dynamics of multi-pass SGD on high-dimensional convex quadratics and establish an asymptotic equivalence to a stochastic differential equation, which we call homogenized stochastic gradient descent (HSGD), whose solutions we characterize explicitly…
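The setting described in the abstract, multi-pass SGD compared with full-batch GD on a convex quadratic, can be sketched on a small random least-squares problem. This is an illustrative sketch, not the paper's experiment: the Gaussian data model, the problem sizes, the step sizes, and the function names (`make_problem`, `run_gd`, `run_sgd`) are all assumptions made here. One SGD epoch is taken to be `n` single-sample steps, so that it costs roughly as many gradient evaluations as one GD iteration.

```python
import numpy as np


def make_problem(n=400, d=200, noise=0.1, seed=0):
    """Random least-squares instance (assumed data model, not from the paper)."""
    rng = np.random.default_rng(seed)
    # Rows have squared norm close to 1, which keeps single-sample steps stable.
    A = rng.standard_normal((n, d)) / np.sqrt(d)
    x_star = rng.standard_normal(d)
    b = A @ x_star + noise * rng.standard_normal(n)
    return A, b


def empirical_risk(A, b, x):
    """Convex quadratic objective: (1 / 2n) * ||A x - b||^2."""
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)


def run_gd(A, b, iters):
    """Full-batch gradient descent with step 1 / (largest Hessian eigenvalue)."""
    n, d = A.shape
    hessian = A.T @ A / n
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    x = np.zeros(d)
    risks = [empirical_risk(A, b, x)]
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - b)) / n
        risks.append(empirical_risk(A, b, x))
    return np.array(risks)


def run_sgd(A, b, step, epochs, seed=1):
    """Multi-pass SGD with batch size 1; the risk is recorded once per epoch."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    risks = [empirical_risk(A, b, x)]
    for _ in range(epochs):
        # Sample n row indices with replacement: one "epoch" of n steps.
        for i in rng.integers(0, n, size=n):
            a = A[i]
            x = x - step * (a @ x - b[i]) * a
        risks.append(empirical_risk(A, b, x))
    return np.array(risks)


if __name__ == "__main__":
    A, b = make_problem()
    gd_risks = run_gd(A, b, iters=20)
    sgd_risks = run_sgd(A, b, step=0.5, epochs=20)
    for k, (g, s) in enumerate(zip(gd_risks, sgd_risks)):
        print(f"pass {k:2d}  GD risk {g:.4f}  SGD risk {s:.4f}")
```

Printing the two risk curves side by side gives a per-pass comparison at matched gradient cost. How the curves compare depends on the assumed spectrum and step sizes, so the sketch should be read as a way to explore the setting rather than as a reproduction of the paper's exact trajectories.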

Videos

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Single-cell and spatial transcriptomics

Methods: Stochastic Gradient Descent