High Probability Analysis for Non-Convex Stochastic Optimization with Clipping

Shaojie Li; Yong Liu

arXiv:2307.13680·cs.LG·July 26, 2023

TL;DR

This paper provides a high probability theoretical analysis of gradient clipping in non-convex stochastic optimization, addressing heavy-tailed gradient behaviors and deriving bounds for optimization and generalization.

Contribution

The paper offers the first high probability analysis for stochastic optimization with gradient clipping under weak heavy-tailed assumptions, covering stochastic gradient descent and its momentum and adaptive-stepsize variants.

Findings

1. Derived optimization bounds with gradient clipping
2. Established generalization bounds under heavy-tailed gradients
3. Applicable to SGD, momentum, and adaptive methods

Abstract

Gradient clipping is a commonly used technique to stabilize the training process of neural networks. A growing body of studies has shown that gradient clipping is a promising technique for dealing with the heavy-tailed behavior that emerged in stochastic optimization as well. While gradient clipping is significant, its theoretical guarantees are scarce. Most theoretical guarantees only provide an in-expectation analysis and only focus on optimization performance. In this paper, we provide high probability analysis in the non-convex setting and derive the optimization bound and the generalization bound simultaneously for popular stochastic optimization algorithms with gradient clipping, including stochastic gradient descent and its variants of momentum and adaptive stepsizes. With the gradient clipping, we study a heavy-tailed assumption that the gradients only have bounded $\alpha$-th…
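To make the setting in the abstract concrete, here is a minimal sketch of SGD with gradient clipping on a toy non-convex objective with heavy-tailed gradient noise. It is an illustration only, not the paper's algorithm or experimental setup: the function names (`clip_by_norm`, `clipped_sgd`, `noisy_grad`), the norm-based clipping rule, the toy objective, the Pareto noise model, and all parameter values are assumptions chosen for this example.

```python
import math
import random


def clip_by_norm(grad, threshold):
    """Rescale grad so that its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def clipped_sgd(grad_fn, x0, step_size, threshold, num_steps, rng):
    """Plain SGD in which every stochastic gradient is clipped before the update."""
    x = list(x0)
    for _ in range(num_steps):
        g = clip_by_norm(grad_fn(x, rng), threshold)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def noisy_grad(x, rng):
    """Stochastic gradient of the toy objective sum_i (x_i^2 + 3 sin^2(x_i)).

    The objective is non-convex (its second derivative 2 + 6 cos(2 x_i)
    changes sign). The added noise is a symmetric, shifted Pareto variable
    with tail index 1.5 (an assumption for this sketch): it has zero mean
    but infinite variance, i.e. it is heavy-tailed.
    """
    out = []
    for xi in x:
        true_grad = 2.0 * xi + 3.0 * math.sin(2.0 * xi)
        noise = rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0)
        out.append(true_grad + noise)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    x_final = clipped_sgd(noisy_grad, [2.0, -1.5], step_size=0.01,
                          threshold=1.0, num_steps=2000, rng=rng)
    print("final iterate:", x_final)
```

Because each clipped gradient has norm at most `threshold`, a single update moves the iterate by at most `step_size * threshold`, no matter how large the raw noise sample is. That per-step boundedness is the stabilizing effect the abstract attributes to clipping under heavy-tailed gradients.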

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Focus · Gradient Clipping