High Probability Analysis for Non-Convex Stochastic Optimization with Clipping
Shaojie Li, Yong Liu

TL;DR
This paper provides a high probability theoretical analysis of gradient clipping in non-convex stochastic optimization, addressing heavy-tailed gradient behaviors and deriving bounds for optimization and generalization.
Contribution
It offers the first high probability analysis for stochastic optimization with gradient clipping under weak heavy-tailed assumptions, covering multiple algorithms.
Findings
Derived optimization bounds with gradient clipping
Established generalization bounds under heavy-tailed gradients
Applicable to SGD, momentum, and adaptive methods
Abstract
Gradient clipping is a commonly used technique to stabilize the training process of neural networks. A growing body of studies has shown that gradient clipping is also a promising technique for dealing with the heavy-tailed behavior that emerges in stochastic optimization. While gradient clipping is significant, its theoretical guarantees are scarce. Most theoretical guarantees only provide an in-expectation analysis and only focus on optimization performance. In this paper, we provide high probability analysis in the non-convex setting and derive the optimization bound and the generalization bound simultaneously for popular stochastic optimization algorithms with gradient clipping, including stochastic gradient descent and its variants of momentum and adaptive stepsizes. With gradient clipping, we study a heavy-tailed assumption that the gradients only have bounded -th…
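To make the clipping step concrete, here is a minimal sketch of norm-clipped SGD under heavy-tailed gradient noise. It is an illustration only, not the paper's algorithm or experimental setup: the function names (`clip_by_norm`, `clipped_sgd`, `noisy_grad`), the toy non-convex objective, the Student-t noise model, and all hyperparameter values are assumptions chosen for the example.

```python
import numpy as np


def clip_by_norm(grad, threshold):
    """Rescale grad so that its Euclidean norm is at most threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad


def clipped_sgd(grad_fn, x0, step_size, threshold, num_steps, rng):
    """Run SGD where every stochastic gradient is clipped before the update."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_steps):
        g = grad_fn(x, rng)  # stochastic gradient, possibly heavy-tailed
        x = x - step_size * clip_by_norm(g, threshold)
    return x


def noisy_grad(x, rng):
    """Hypothetical stochastic gradient oracle (an assumption for this sketch).

    Objective: f(x) = sum(x**2 / (1 + x**2)), which is smooth and non-convex.
    Noise: Student-t with 1.5 degrees of freedom, which has infinite variance.
    """
    true_grad = 2.0 * x / (1.0 + x**2) ** 2
    noise = rng.standard_t(df=1.5, size=x.shape)
    return true_grad + noise


rng = np.random.default_rng(0)
x_final = clipped_sgd(noisy_grad, np.array([2.0, -1.5]),
                      step_size=0.01, threshold=1.0, num_steps=1000, rng=rng)
```

Because each update is at most `step_size * threshold` in norm, a single extreme noise sample cannot move the iterate arbitrarily far, which is the stabilizing effect the abstract refers to.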
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: Focus · Gradient Clipping
