U-Clip: On-Average Unbiased Stochastic Gradient Clipping

Bryn Elesedy; Marcus Hutter

arXiv:2302.02971 · cs.LG · February 7, 2023

Open Access

TL;DR

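U-Clip is a gradient clipping method whose updates are unbiased on average. Instead of discarding the clipped portion of each gradient, it buffers that portion and adds it back on the next iteration. This keeps the cumulative bias bounded and yields convergence guarantees for iterative optimization algorithms.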

Contribution

It proposes U-Clip, a simple modification to gradient clipping that makes the clipped updates unbiased on average and comes with convergence guarantees for iterative gradient-based optimization.

Findings

01

U-Clip updates are unbiased on average: their cumulative bias is bounded by a constant.

02

Evaluated extensively on CIFAR10, with further validation on ImageNet.

03

Convergence guarantees are proved theoretically.
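The link between bounded cumulative bias and the convergence condition $\sum_{i=1}^{t} (u_i - g_i) = o(t)$ can be seen through a short telescoping identity. This sketch rests on our own reading of the buffering rule and is not the paper's proof. We assume the buffer starts at $b_0 = 0$, each update is $u_t = \operatorname{clip}(g_t + b_{t-1})$, and the buffer stores whatever clipping removed. The conditions under which the paper bounds $\|b_t\|$ are not stated on this page.

```latex
\begin{aligned}
u_t &= \operatorname{clip}(g_t + b_{t-1}), \qquad b_t = g_t + b_{t-1} - u_t, \qquad b_0 = 0, \\
u_t - g_t &= b_{t-1} - b_t, \\
\sum_{i=1}^{t} (u_i - g_i) &= \sum_{i=1}^{t} (b_{i-1} - b_i) = b_0 - b_t = -b_t, \\
\text{so } \|b_t\| \le B \ \forall t
  \;&\Longrightarrow\; \Big\| \sum_{i=1}^{t} (u_i - g_i) \Big\| \le B, \text{ which is in particular } o(t).
\end{aligned}
```

Under these assumptions, the cumulative bias at any step is exactly the negative of the current buffer. A uniform bound on the buffer is therefore all that is needed for the sum to be bounded by a constant.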

Abstract

U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip involves using gradients that are clipped to a prescribed size (e.g. with component-wise or norm-based clipping). However, instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_{i}$ as long as $\sum_{i = 1}^{t} (u_{i} - g_{i}) = o (t)$, where $g_{i}$ are the gradients. Extensive experimental exploration is performed on CIFAR10, with further validation given on ImageNet.
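The following Python sketch illustrates the buffering rule described in the abstract. It implements our own reading of that description. The buffer starts at zero, clipping is applied to gradient plus buffer, and the clipped-off residual is stored for the next step. The class `UClip`, the helpers `clip_componentwise` and `clip_norm`, and the Gaussian demo gradients are hypothetical illustrations, not the authors' code. The sketch shows only the update and buffer mechanics, not a training loop.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def clip_componentwise(v: Vector, c: float) -> Vector:
    """Clip each coordinate of v to the interval [-c, c]."""
    return [max(-c, min(c, x)) for x in v]


def clip_norm(v: Vector, c: float) -> Vector:
    """Rescale v so that its Euclidean norm is at most c."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= c:
        return list(v)
    return [x * (c / norm) for x in v]


class UClip:
    """Hypothetical helper (assumed reading of U-Clip): clip, but carry the
    clipped-off part of the gradient over to the next step."""

    def __init__(self, dim: int, clip_fn: Callable[[Vector, float], Vector], c: float):
        self.clip_fn = clip_fn
        self.c = c
        self.buffer: Vector = [0.0] * dim  # b_0 = 0

    def step(self, grad: Vector) -> Vector:
        # Add the buffered residual to the fresh gradient *before* clipping.
        carried = [g + b for g, b in zip(grad, self.buffer)]
        update = self.clip_fn(carried, self.c)
        # Store what clipping removed; it is re-added on the next call.
        self.buffer = [x - u for x, u in zip(carried, update)]
        return update


# Usage: noisy gradients whose mean lies inside the clipping radius.
rng = random.Random(0)
uclip = UClip(dim=2, clip_fn=clip_norm, c=0.5)
cum_bias = [0.0, 0.0]  # running sum of (u_i - g_i)
for _ in range(500):
    g = [rng.gauss(0.1, 1.0), rng.gauss(-0.1, 1.0)]
    u = uclip.step(g)
    cum_bias = [s + (ui - gi) for s, ui, gi in zip(cum_bias, u, g)]

# Telescoping: the cumulative bias equals minus the current buffer
# (up to floating-point error), so it stays small whenever the buffer does.
print(cum_bias, [-b for b in uclip.buffer])
```

The demo keeps the mean gradient inside the clipping radius for a specific reason. If the mean gradient's norm exceeded `c`, each update would still have norm at most `c`, so in this sketch the buffer, and hence the cumulative bias, would grow linearly in the number of steps. The exact conditions under which the paper bounds the buffer are not given on this page.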

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Gradient Clipping