Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares

Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, and Umut Şimşekli

arXiv:2206.01274 · stat.ML · February 14, 2023 · 1 citation

Open Access

TL;DR

This paper investigates how heavy-tailed stochastic gradient descent (SGD) affects generalization. It shows that algorithmic stability, and with it generalization, depends both on the heaviness of the tails and on the loss function used to measure stability, and that the resulting relationship can be non-monotonic, a conclusion supported by both theory and experiments.

Contribution

The study establishes new theoretical links between heavy tails in SGD and generalization through algorithmic stability. Using a quadratic (least-squares) model, a heavy-tailed stochastic differential equation (SDE), and its Euler discretization, it shows that the relation between tail heaviness and generalization can be non-monotonic.
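For concreteness, the kind of model described above can be written in a generic form. This is a sketch under stated assumptions rather than the paper's exact formulation: $\hat F$ is an empirical least-squares objective over examples $(a_i, b_i)$, $L^\alpha_t$ is a symmetric $\alpha$-stable Lévy process with $\alpha \in (0, 2]$ (smaller $\alpha$ means heavier tails; $\alpha = 2$ recovers Brownian motion up to scaling), $\sigma > 0$ is a noise scale, $\eta > 0$ is a step size, and the $S_k$ are i.i.d. standard symmetric $\alpha$-stable vectors.

```latex
\begin{aligned}
\hat F(w) &= \frac{1}{2n}\sum_{i=1}^{n}\bigl(a_i^\top w - b_i\bigr)^2, \\
\mathrm{d}W_t &= -\nabla \hat F(W_t)\,\mathrm{d}t + \sigma\,\mathrm{d}L^{\alpha}_t, \\
w_{k+1} &= w_k - \eta\,\nabla \hat F(w_k) + \sigma\,\eta^{1/\alpha}\,S_{k+1}.
\end{aligned}
```

The factor $\eta^{1/\alpha}$ comes from the self-similarity of the Lévy process: an increment $L^\alpha_{t+\eta} - L^\alpha_t$ has the same law as $\eta^{1/\alpha} L^\alpha_1$, which reduces to the familiar $\sqrt{\eta}$ scaling when $\alpha = 2$.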

Findings

01 The stability of SGD depends on the loss function used to measure it.

02 Heavier tails can improve generalization, but only up to a threshold.

03 The theoretical bounds are tight and are supported by experiments.
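One standard way to see why the loss matters is the definition of uniform stability, stated here in generic notation that is an assumption, not taken from the paper: $\mathcal{A}$ is a randomized algorithm, $X \simeq X'$ ranges over pairs of size-$n$ datasets that differ in a single example, $\ell$ is the loss in which stability is measured, and $R$ and $\hat R_X$ are the population and empirical risks under that same $\ell$.

```latex
\begin{aligned}
\varepsilon_{\ell} &= \sup_{X \simeq X'} \, \sup_{z} \,
  \Bigl| \mathbb{E}_{\mathcal{A}}\bigl[\ell(\mathcal{A}(X), z)\bigr]
       - \mathbb{E}_{\mathcal{A}}\bigl[\ell(\mathcal{A}(X'), z)\bigr] \Bigr|, \\
\Bigl| \mathbb{E}_{X,\mathcal{A}}\bigl[ R(\mathcal{A}(X)) - \hat R_X(\mathcal{A}(X)) \bigr] \Bigr|
  &\le \varepsilon_{\ell}.
\end{aligned}
```

Because $\varepsilon_\ell$ is defined through $\ell$, the same algorithm can have different stability constants under different losses, so a bound of this type can respond differently to heavier tails depending on which loss is used.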

Abstract

Recent studies have shown that heavy tails can emerge in stochastic optimization and that the heaviness of the tails has links to the generalization error. While these studies have shed light on interesting aspects of the generalization behavior in modern settings, they relied on strong topological and statistical regularity assumptions, which are hard to verify in practice. Furthermore, it has been empirically illustrated that the relation between heavy tails and generalization might not always be monotonic in practice, contrary to the conclusions of existing theory. In this study, we establish novel links between the tail behavior and generalization properties of stochastic gradient descent (SGD) through the lens of algorithmic stability. We consider a quadratic optimization problem and use a heavy-tailed stochastic differential equation (and its Euler discretization) as a proxy for…
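The modeling step described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code, under these assumptions: a synthetic least-squares problem (`make_data` is a hypothetical helper), i.i.d. coordinate-wise symmetric α-stable noise drawn with the Chambers-Mallows-Stuck method, the Euler update $w_{k+1} = w_k - \eta\nabla\hat F(w_k) + \sigma\eta^{1/\alpha}S_{k+1}$, and stability estimated as the average distance between final iterates on two datasets that differ in one example, with both runs sharing the same noise. Because any coupling upper-bounds the Wasserstein-1 distance, this estimate upper-bounds (up to Monte Carlo error) the Wasserstein-1 distance between the two iterate distributions. All function names and the default values of `eta`, `sigma`, `steps`, and `trials` are illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (feature vector a_i, target b_i)


def symmetric_stable(alpha: float, rng: random.Random) -> float:
    """One standard symmetric alpha-stable draw via Chambers-Mallows-Stuck.

    Valid for 0 < alpha <= 2; alpha = 2 gives N(0, 2), alpha = 1 gives Cauchy.
    """
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)) * (
        math.cos((1.0 - alpha) * v) / w
    ) ** ((1.0 - alpha) / alpha)


def grad(w: Sequence[float], data: Sequence[Example]) -> List[float]:
    """Gradient of F(w) = (1 / 2n) * sum_i (<a_i, w> - b_i)^2."""
    n = len(data)
    g = [0.0] * len(w)
    for a, b in data:
        residual = sum(ai * wi for ai, wi in zip(a, w)) - b
        for j, aj in enumerate(a):
            g[j] += aj * residual / n
    return g


def euler_heavy_tailed(data: Sequence[Example], alpha: float, eta: float,
                       sigma: float, steps: int, rng: random.Random) -> List[float]:
    """Euler scheme for dW = -grad F(W) dt + sigma dL^alpha, started at w = 0.

    A Levy increment over a step of length eta has the law of eta**(1/alpha)
    times a standard alpha-stable draw (self-similarity). Noise is drawn i.i.d.
    per coordinate here, which is an assumption of this sketch.
    """
    w = [0.0] * len(data[0][0])
    scale = sigma * eta ** (1.0 / alpha)
    for _ in range(steps):
        g = grad(w, data)
        w = [wi - eta * gi + scale * symmetric_stable(alpha, rng)
             for wi, gi in zip(w, g)]
    return w


def make_data(n: int, w_star: Sequence[float], noise: float,
              rng: random.Random) -> List[Example]:
    """Hypothetical synthetic data: b = <a, w_star> + noise * N(0, 1)."""
    data = []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in w_star]
        b = sum(ai * wi for ai, wi in zip(a, w_star)) + noise * rng.gauss(0.0, 1.0)
        data.append((a, b))
    return data


def stability_gap(data: Sequence[Example], data_prime: Sequence[Example],
                  alpha: float, eta: float = 0.05, sigma: float = 0.1,
                  steps: int = 200, trials: int = 20, seed: int = 0) -> float:
    """Mean distance between final iterates on neighboring datasets.

    Both runs in a trial reuse the same seed, so they see identical noise
    (a synchronous coupling of the two randomized runs).
    """
    total = 0.0
    for t in range(trials):
        w = euler_heavy_tailed(data, alpha, eta, sigma, steps, random.Random(seed + t))
        w_p = euler_heavy_tailed(data_prime, alpha, eta, sigma, steps, random.Random(seed + t))
        total += math.dist(w, w_p)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(42)
    w_star = [1.0, -2.0]
    data = make_data(30, w_star, 0.1, rng)
    # Neighboring dataset: replace the last example with a fresh draw.
    data_prime = data[:-1] + make_data(1, w_star, 0.1, rng)
    for alpha in (2.0, 1.8, 1.5, 1.2):  # smaller alpha = heavier tails
        gap = stability_gap(data, data_prime, alpha)
        print(f"alpha={alpha}: mean coupled distance = {gap:.4f}")
```

Comparing `stability_gap` across several values of `alpha` on fixed data is one way to probe how tail heaviness interacts with stability in this toy setting; such a run is not meant to reproduce the paper's experiments.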

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Stochastic Gradient Descent