Algorithmic Stability of Stochastic Gradient Descent with Momentum under Heavy-Tailed Noise
Thanh Dang, Melih Barsbey, A K M Rokonuzzaman Sonet, Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu

TL;DR
This paper analyzes how stochastic gradient descent with momentum (SGDm) generalizes under heavy-tailed noise, revealing that momentum can worsen generalization for quadratic losses and providing bounds for both continuous and discrete dynamics.
Contribution
It establishes the first theoretical generalization bounds for SGDm under heavy-tailed gradient noise, combining a continuous-time analysis of a Lévy-driven SDE with discretization error bounds for the discrete-time algorithm.
Findings
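For quadratic losses, SGDm admits a worse generalization bound under heavy-tailed noise, indicating that momentum and heavy tails can interact harmfully.
Discrete-time bounds show that, with suitable step-sizes, the discretized dynamics can preserve the properties of the continuous-time limit.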
Empirical results on neural networks support theoretical insights.
Abstract
Understanding the generalization properties of optimization algorithms under heavy-tailed noise has gained growing attention. However, the existing theoretical results mainly focus on stochastic gradient descent (SGD) and the analysis of heavy-tailed optimizers beyond SGD is still missing. In this work, we establish generalization bounds for SGD with momentum (SGDm) under heavy-tailed gradient noise. We first consider the continuous-time limit of SGDm, i.e., a Lévy-driven stochastic differential equation (SDE), and establish quantitative Wasserstein algorithmic stability bounds for a class of potentially non-convex loss functions. Our bounds reveal a remarkable observation: For quadratic loss functions, we show that SGDm admits a worse generalization bound in the presence of heavy-tailed noise, indicating that the interaction of momentum and heavy tails can be harmful for…
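To make the setting concrete, the following is a minimal sketch, not the paper's experiment, of heavy-ball SGDm on a least-squares (quadratic) loss with injected symmetric alpha-stable noise. The update rule, the noise scaling eta^(1/alpha), and all parameter values (eta, gamma, sigma, steps) are assumptions chosen for illustration. It runs the same noise sequence on two datasets that differ in one point and reports the distance between the final iterates, a crude empirical stand-in for algorithmic stability.

```python
import numpy as np

def sample_sas(alpha, size, rng):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck method.

    alpha = 2 gives a Gaussian with variance 2; alpha = 1 gives a Cauchy.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def run_sgdm(X, y, alpha, eta=0.01, gamma=1.0, sigma=0.1, steps=2000, seed=0):
    """Heavy-ball iteration on 0.5 * mean((X x - y)^2) with alpha-stable noise.

    Assumed (hypothetical) update, an Euler-type step of a Levy-driven
    underdamped dynamics:
        v <- v - eta * (gamma * v + grad) + sigma * eta**(1/alpha) * noise
        x <- x + eta * v
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    x = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ x - y) / n          # full-batch quadratic gradient
        noise = sample_sas(alpha, d, rng)     # heavy-tailed perturbation
        v = v - eta * (gamma * v + grad) + sigma * eta ** (1.0 / alpha) * noise
        x = x + eta * v
    return x

# Two neighboring datasets: identical except for the first data point.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(50, 5))
y = X @ np.ones(5) + 0.1 * data_rng.normal(size=50)
X2, y2 = X.copy(), y.copy()
X2[0] = data_rng.normal(size=5)
y2[0] = data_rng.normal()

for alpha in (2.0, 1.5):
    a = run_sgdm(X, y, alpha)      # same seed, so the noise is coupled
    b = run_sgdm(X2, y2, alpha)
    print(f"alpha={alpha}: ||x - x'|| = {np.linalg.norm(a - b):.4e}")
```

Sharing the seed across the two runs couples the noise, so any gap between the iterates comes from the single replaced data point. A single run like this is only suggestive; comparing tail indices or momentum settings meaningfully would require averaging over many seeds.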
Taxonomy
Topics: Stochastic processes and financial applications
Methods: SGD with Momentum · Stochastic Gradient Descent · Focus
