The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent

Lei Wu; Weijie J. Su

arXiv:2305.17490·stat.ML·June 2, 2023·1 cites

Open Access

TL;DR

This paper investigates how stochastic gradient descent (SGD) implicitly regularizes model complexity through dynamical stability, yielding better generalization than gradient descent (GD). The strength of this effect depends on the learning rate.

Contribution

It establishes a theoretical link between stability metrics and generalization in SGD, contrasting it with GD, and highlights the role of learning rate in regularization strength.

Findings

01

Stable minima of SGD generalize well because, for the models studied, the sharpness metrics are equivalent to certain parameter norms.

02

GD's stability condition constrains only the largest eigenvalue of the Hessian, which is too weak for effective regularization.

03

Larger learning rates enhance SGD's regularization effect.

Abstract

In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of dynamical stability (Wu et al., 2018). We start by revising existing stability analyses of SGD, showing how the Frobenius norm and trace of Hessian relate to different notions of stability. Notably, if a global minimum is linearly stable for SGD, then the trace of Hessian must be less than or equal to $2/\eta$, where $\eta$ denotes the learning rate. By contrast, for gradient descent (GD), the stability imposes a similar constraint but only on the largest eigenvalue of Hessian. We then turn to analyze the generalization properties of these stable minima, focusing specifically on two-layer ReLU networks and diagonal linear networks. Notably, we establish the equivalence between these metrics of sharpness and certain parameter norms for the two models, which allows us…
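The two constraints named in the abstract can be compared on a toy quadratic loss $L(x) = \tfrac{1}{2} x^\top H x$, where the Hessian $H$ is known exactly. The sketch below is illustrative only and is not taken from the paper: the function names, the example Hessian, and the learning rates are assumptions. It simulates GD only; the trace inequality is checked as the necessary condition the abstract states for SGD, without simulating SGD itself.

```python
import numpy as np

def gd_is_linearly_stable(hessian, lr):
    """GD constraint: the largest Hessian eigenvalue is at most 2 / lr."""
    lam_max = np.linalg.eigvalsh(hessian)[-1]  # eigenvalues come back in ascending order
    return bool(lam_max <= 2.0 / lr)

def satisfies_trace_bound(hessian, lr):
    """Necessary condition stated in the abstract for SGD: tr(H) <= 2 / lr."""
    return bool(np.trace(hessian) <= 2.0 / lr)

def run_gd(hessian, lr, steps=200):
    """Run GD on L(x) = 0.5 * x^T H x from x = (1, ..., 1); return the final norm of x."""
    x = np.ones(hessian.shape[0])
    for _ in range(steps):
        x = x - lr * (hessian @ x)  # gradient of the quadratic is H x
    return float(np.linalg.norm(x))

# Hypothetical example Hessian: lambda_max = 1.5, trace = 3.0.
H = np.diag([1.5, 1.0, 0.5])

for lr in (1.0, 1.5):
    print(
        f"lr={lr}: GD stable={gd_is_linearly_stable(H, lr)}, "
        f"trace bound holds={satisfies_trace_bound(H, lr)}, "
        f"final |x|={run_gd(H, lr):.3e}"
    )
```

With these assumed values, a learning rate of 1.0 satisfies the GD eigenvalue constraint while violating the trace bound, which shows how the trace condition can exclude minima that GD alone would tolerate. Raising the learning rate to 1.5 breaks the eigenvalue constraint as well, and the GD iterates grow instead of shrinking.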

Taxonomy

Topics: Functional Brain Connectivity Studies · Advanced Fluorescence Microscopy Techniques · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent