Variance-reduced Clipping for Non-convex Optimization

Amirhossein Reisizadeh; Haochuan Li; Subhro Das; Ali Jadbabaie

arXiv:2303.00883 · cs.LG · June 6, 2023 · 1 citation

TL;DR

This paper introduces a variance-reduced clipping method for non-convex optimization that improves theoretical complexity bounds and demonstrates competitive empirical performance in deep learning tasks.

Contribution

It develops a variance reduction technique combined with gradient clipping under relaxed smoothness assumptions, achieving order-optimal complexity bounds.
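A minimal NumPy sketch of how a SPIDER-style estimator can be paired with a clipped step size on a finite-sum objective f(x) = (1/n) Σ f_i(x). This is an illustration under stated assumptions, not the authors' algorithm: the refresh period `q`, minibatch size `batch`, and the step rule eta_k = min(eta, gamma / ||v_k||) are hypothetical choices, and `spider_clipped` and `grad_i` are hypothetical names. The official PyTorch code is in the repository listed under Code & Models.

```python
import numpy as np


def spider_clipped(grad_i, x0, n, num_iters=400, q=5, batch=10,
                   eta=0.05, gamma=1.0, seed=0):
    """SPIDER estimator with a clipped step size (illustrative parameters).

    grad_i(x, idx) must return the average gradient of the components
    listed in idx, evaluated at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    x_prev = x
    v = np.zeros_like(x)
    for k in range(num_iters):
        if k % q == 0:
            # Periodic refresh: exact gradient over all n components.
            v = grad_i(x, np.arange(n))
        else:
            # SPIDER recursion: correct the previous estimate with a
            # minibatch gradient difference at x and x_prev
            # (indices sampled with replacement).
            idx = rng.integers(0, n, size=batch)
            v = grad_i(x, idx) - grad_i(x_prev, idx) + v
        # Clipped step size: one step never moves x farther than gamma.
        norm = np.linalg.norm(v)
        step = eta if norm == 0.0 else min(eta, gamma / norm)
        x_prev, x = x, x - step * v
    return x


# Usage: least squares with components f_i(x) = 0.5 * (a_i . x - b_i)^2.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
b = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)


def grad_i(x, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)


x0 = np.zeros(3)
x = spider_clipped(grad_i, x0, n=50)
print(np.linalg.norm(grad_i(x, np.arange(50))))
```

The clipping rule caps the displacement per step at gamma, which is what keeps the method stable when local smoothness grows with the gradient norm; the SPIDER recursion keeps the estimator's variance small between full-gradient refreshes.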

Findings

01 Improves the stochastic gradient complexity from O(ε^{-4}) to the order-optimal O(ε^{-3}) using the SPIDER variance-reduction estimator.

02 Achieves order-optimal complexity O(√n ε^{-2} + n) for finite-sum problems with n components.

03 Empirically matches or outperforms existing variance-reduced methods on vision tasks.
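To make the gap between these bounds concrete, the snippet below plugs in illustrative values n = 10^6 and ε = 10^{-2} (passed as the integer 1/ε so the arithmetic is exact). All constants, including the dependence on L_0, L_1 and the gradient noise, are dropped, so only orders of magnitude are meaningful. The n·ε^{-2} entry is an assumed baseline for earlier finite-sum results, and `gradient_costs` is a hypothetical helper.

```python
import math


def gradient_costs(n, inv_eps):
    """Leading-order stochastic gradient counts, ignoring all constants."""
    return {
        "clipped SGD, eps^-4": inv_eps ** 4,
        "SPIDER + clipping, eps^-3": inv_eps ** 3,
        "finite-sum baseline, n * eps^-2": n * inv_eps ** 2,
        "SPIDER finite-sum, sqrt(n) * eps^-2 + n": math.sqrt(n) * inv_eps ** 2 + n,
    }


for name, cost in gradient_costs(n=10**6, inv_eps=100).items():
    print(f"{name:45s} {cost:.1e}")
# clipped SGD, eps^-4                      -> 1.0e+08
# SPIDER + clipping, eps^-3                -> 1.0e+06
# finite-sum baseline, n * eps^-2          -> 1.0e+10
# SPIDER finite-sum, sqrt(n) * eps^-2 + n  -> 1.1e+07
```

At these sizes the online bound shrinks by a factor of 1/ε = 100, and the finite-sum bound is nearly 1000 times below the n·ε^{-2} baseline.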

Abstract

Gradient clipping is a standard training technique used in deep learning applications such as large-scale language modeling to mitigate exploding gradients. Recent experimental studies have demonstrated a fairly special behavior in the smoothness of the training objective along its trajectory when trained with gradient clipping. That is, the smoothness grows with the gradient norm. This is in clear contrast to the well-established assumption in folklore non-convex optimization, a.k.a. $L$-smoothness, where the smoothness is assumed to be bounded by a constant $L$ globally. The recently introduced $(L_0, L_1)$-smoothness is a more relaxed notion that captures such behavior in non-convex optimization. In particular, it has been shown that under this relaxed smoothness assumption, SGD with clipping requires $O(\epsilon^{-4})$ stochastic gradient computations to find an…
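The two smoothness notions contrasted in the abstract are commonly written as below for twice-differentiable f. This is a sketch of the standard forms from the gradient-clipping literature; the paper may state them with extra technical conditions, and the step rule with constants $\eta, \gamma$ is a generic clipped step, assumed here rather than taken from the paper.

```latex
\begin{align*}
&\text{$L$-smoothness:} && \|\nabla^2 f(x)\| \le L \quad \forall x,\\
&\text{$(L_0, L_1)$-smoothness:} && \|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \forall x,\\
&\text{clipped step:} && x_{k+1} = x_k - \eta_k g_k, \qquad \eta_k = \min\!\Big(\eta, \frac{\gamma}{\|g_k\|}\Big),\\
&\text{$\epsilon$-stationarity:} && \|\nabla f(x)\| \le \epsilon.
\end{align*}

% Example: f(x) = x^4 is (L_0, L_1)-smooth but not L-smooth.
\begin{align*}
f'(x) &= 4x^3, \qquad f''(x) = 12x^2 \quad (\text{unbounded, so no global } L),\\
|x| \le 1 &: \quad 12x^2 \le 12,\\
|x| > 1 &: \quad 12x^2 < 12|x|^3 = 3\,|f'(x)|,\\
\Rightarrow \quad f''(x) &\le 12 + 3\,|f'(x)| \quad \forall x, \qquad (L_0, L_1) = (12, 3).
\end{align*}
```

The example shows the behavior described in the abstract: curvature is large exactly where the gradient is large, which is the regime in which a clipped step keeps each update bounded.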

Code & Models

Repositories

haochuan-mit/varaince-reduced-clipping-for-non-convex-optimization
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Stochastic Gradient Descent