Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions

Ofir Gaash; Kfir Yehuda Levy; Yair Carmon

arXiv:2502.16492 · math.OC · June 4, 2025

TL;DR

This paper analyzes the convergence of clipped stochastic gradient descent (SGD) on convex functions under $(L_0,L_1)$-smoothness, a generalized smoothness condition. It proves a high-probability convergence rate and examines the theory in empirical experiments.

Contribution

The paper gives a high-probability convergence analysis of clipped SGD on convex $(L_0,L_1)$-smooth functions and proposes an adaptive SGD variant with gradient clipping that achieves the same guarantee.
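For orientation, $(L_0,L_1)$-smoothness is commonly stated in the literature, for a twice-differentiable $f$, as the bound below. This is the usual textbook form and is given here as an assumption; this page does not reproduce the paper's exact definition.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x .
$$

Setting $L_1 = 0$ recovers ordinary $L$-smoothness with $L = L_0$, while $L_1 > 0$ lets the curvature grow with the gradient norm.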

Findings

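01 A high-probability convergence rate that matches the SGD rate in the $L$-smooth case up to polylogarithmic factors and additive terms

02 An adaptive SGD variant with gradient clipping that achieves the same guarantee

03 Empirical experiments that examine the theory and the algorithmic choices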

Abstract

We study stochastic gradient descent (SGD) with gradient clipping on convex functions under a generalized smoothness assumption called $(L_{0}, L_{1})$-smoothness. Using gradient clipping, we establish a high-probability convergence rate that matches the SGD rate in the $L$-smooth case up to polylogarithmic factors and additive terms. We also propose a variation of adaptive SGD with gradient clipping, which achieves the same guarantee. We perform empirical experiments to examine our theory and algorithmic choices.
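To make the setting concrete, here is a minimal sketch of SGD with norm clipping on a convex test function. It is an illustration only, not the paper's algorithm or tuning: the objective $f(x) = \sum_i (e^{x_i} + e^{-x_i})$, the Gaussian noise model, and the parameters `eta`, `c`, and `noise_std` are all assumptions chosen for the example. In one dimension this objective satisfies $f'' \le 2 + |f'|$, so it is $(2,1)$-smooth, yet it is not $L$-smooth for any fixed $L$.

```python
import math
import random


def clip(g, c):
    """Rescale g so its Euclidean norm is at most c (norm clipping)."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm <= c:
        return list(g)
    return [v * (c / norm) for v in g]


def grad_f(x):
    """Gradient of the assumed objective f(x) = sum_i (exp(x_i) + exp(-x_i))."""
    return [math.exp(v) - math.exp(-v) for v in x]


def clipped_sgd(x0, steps, eta, c, noise_std, seed=0):
    """Run SGD where each noisy gradient is clipped to norm c before the step.

    eta (step size), c (clipping threshold) and noise_std are hypothetical
    values for illustration, not the schedule analyzed in the paper.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Stochastic gradient: exact gradient plus independent Gaussian noise.
        g = [gi + rng.gauss(0.0, noise_std) for gi in grad_f(x)]
        g = clip(g, c)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    x_final = clipped_sgd([5.0, -5.0], steps=2000, eta=0.05, c=1.0, noise_std=0.1)
    print(x_final)
```

Far from the minimizer the raw gradient is very large, so clipping caps each step at length `eta * c`; close to the minimizer the gradient falls below the threshold and the update reduces to plain SGD.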

Taxonomy

Topics: Stochastic processes and financial applications · Advanced Banach Space Theory · Advanced Topology and Set Theory