Stability and Generalization of Nonconvex Optimization with Heavy-Tailed Noise
Hongxu Chen, Ke Wei, Xiaoming Yuan, Luo Luo

TL;DR
This paper develops a framework for deriving generalization bounds for stochastic optimization algorithms under heavy-tailed gradient noise, extending the analysis of such algorithms from convergence of the optimization error to generalization.
Contribution
It introduces a truncation-based approach for deriving generalization bounds under heavy-tailed noise and analyzes popular algorithms like clipped and normalized SGD.
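For readers unfamiliar with the two algorithm families named here, the following is a minimal sketch of one clipped SGD step and one normalized SGD step in their common textbook forms. The function names, the clipping threshold `tau`, the step size `lr`, and the toy quadratic objective with Student-t noise are all assumptions made for illustration; the paper's exact update rules and parameter choices may differ.

```python
import numpy as np


def clip_gradient(grad, tau):
    """Rescale grad so that its Euclidean norm is at most tau (hypothetical helper)."""
    norm = np.linalg.norm(grad)
    if norm <= tau:
        return grad
    return grad * (tau / norm)


def clipped_sgd_step(x, grad, lr, tau):
    """One clipped SGD update: step along the norm-clipped stochastic gradient."""
    return x - lr * clip_gradient(grad, tau)


def normalized_sgd_step(x, grad, lr, eps=1e-12):
    """One normalized SGD update: step along the unit-norm gradient direction.

    eps is an assumed safeguard against division by zero.
    """
    return x - lr * grad / (np.linalg.norm(grad) + eps)


def run_demo(steps=200, seed=0):
    """Toy run on F(x) = 0.5 * ||x||^2 with heavy-tailed additive noise.

    Student-t noise with df=1.5 has infinite variance but a finite p-th
    moment for p < 1.5, so it serves as a simple heavy-tailed stand-in.
    """
    rng = np.random.default_rng(seed)
    x_clip = np.ones(5)
    x_norm = np.ones(5)
    for _ in range(steps):
        noise = rng.standard_t(df=1.5, size=5)
        x_clip = clipped_sgd_step(x_clip, x_clip + noise, lr=0.05, tau=1.0)
        x_norm = normalized_sgd_step(x_norm, x_norm + noise, lr=0.05)
    return x_clip, x_norm
```

Both updates bound the length of every step (by `lr * tau` and by `lr` respectively), which is the property that makes them natural candidates when individual stochastic gradients can be arbitrarily large.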
Findings
Established generalization bounds using stability with heavy-tailed noise
Analyzed stability of clipped and normalized stochastic gradient descent
Provided insights into the effects of heavy-tailed noise on algorithm performance
Abstract
Empirical evidence indicates that stochastic optimization with heavy-tailed gradient noise characterizes the training of machine learning models more accurately than the standard assumption of bounded gradient variance. Most existing works on this phenomenon focus on the convergence of optimization errors, while the analysis of generalization bounds under heavy-tailed gradient noise remains limited. In this paper, we develop a general framework for establishing generalization bounds under heavy-tailed noise. Specifically, we introduce a truncation argument to achieve the generalization error bound based on algorithmic stability under the assumption of a bounded p-th centered moment with p ∈ (1, 2]. Building on this framework, we further provide the stability and generalization analysis for several popular stochastic algorithms under heavy-tailed noise, including clipped…
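The bounded centered moment assumption mentioned in the abstract is commonly written in the heavy-tailed optimization literature in the form below. The symbols are assumptions introduced here for illustration (F is the population objective, f(x; ξ) the loss on a random sample ξ, and σ a noise-level constant), and the paper's exact statement may differ.

```latex
\mathbb{E}_{\xi}\left[ \left\| \nabla f(x;\xi) - \nabla F(x) \right\|^{p} \right] \le \sigma^{p},
\qquad p \in (1, 2].
```

With p = 2 this reduces to the usual bounded-variance condition; for p < 2 the variance of the stochastic gradient may be infinite, which is the heavy-tailed regime.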
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
