Toward a Unified Theory of Gradient Descent under Generalized Smoothness

Alexander Tyurin

arXiv:2412.11773 · math.OC · June 30, 2025 · ICML

TL;DR

This paper extends the analysis of gradient descent to functions with generalized smoothness, deriving adaptive step sizes and improving convergence guarantees in both convex and nonconvex optimization.

Contribution

It introduces a unified framework for gradient descent under generalized smoothness, deriving new step size rules and convergence results beyond classical L-smoothness assumptions.

Findings

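01. Derives an adaptive step size for gradient descent from the $\ell$-generalized smoothness assumption.

02. Improves upon existing theoretical convergence rates for gradient descent.

03. Obtains new results in several previously unexplored setups, in both convex and nonconvex settings.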

Abstract

We study the classical optimization problem $\min_{x \in \mathbb{R}^d} f(x)$ and analyze the gradient descent (GD) method in both nonconvex and convex settings. It is well-known that, under the $L$-smoothness assumption ($\|\nabla^2 f(x)\| \leq L$), the optimal point minimizing the quadratic upper bound $f(x_k) + \langle \nabla f(x_k), x_{k+1} - x_k \rangle + \frac{L}{2} \|x_{k+1} - x_k\|^2$ is $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with step size $\gamma_k = \frac{1}{L}$. Surprisingly, a similar result can be derived under the $\ell$-generalized smoothness assumption ($\|\nabla^2 f(x)\| \leq \ell(\|\nabla f(x)\|)$). In this case, we derive the step size $\gamma_k = \int_0^1 \frac{dv}{\ell(\|\nabla f(x_k)\| + \|\nabla f(x_k)\| v)}.$ Using this step size rule, we improve upon existing theoretical convergence rates and obtain new results in several previously unexplored setups.
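To make the step size rule in the abstract concrete, the following is a minimal sketch of how it could be evaluated numerically and plugged into gradient descent. It is an illustration under stated assumptions, not the author's implementation: the function names `step_size` and `gradient_descent`, the midpoint quadrature, the choice $\ell(s) = L_0 + L_1 s$, and the test function $f(x) = \cosh(x)$ are all assumptions introduced here. For that choice of $\ell$ the integral has the closed form $\frac{1}{L_1 g} \ln \frac{L_0 + 2 L_1 g}{L_0 + L_1 g}$ with $g = \|\nabla f(x_k)\|$, and a constant $\ell \equiv L$ recovers the classical $\gamma_k = 1/L$.

```python
import math
from typing import Callable, List, Sequence


def step_size(ell: Callable[[float], float], grad_norm: float, n: int = 1000) -> float:
    """Approximate gamma = int_0^1 dv / ell(g + g*v) with the composite midpoint rule.

    `ell` is assumed to be positive on [g, 2g]; `n` is the number of subintervals.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h  # midpoint of the i-th subinterval of [0, 1]
        total += 1.0 / ell(grad_norm + grad_norm * v)
    return h * total


def gradient_descent(
    grad: Callable[[Sequence[float]], List[float]],
    ell: Callable[[float], float],
    x0: Sequence[float],
    iters: int = 100,
) -> List[float]:
    """Run x_{k+1} = x_k - gamma_k * grad(x_k) with gamma_k from `step_size`."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm == 0.0:  # already at a stationary point
            break
        gamma = step_size(ell, g_norm)
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
    return x


# Illustrative (assumed) example: f(x) = cosh(x) in one dimension.
# f''(x) = cosh(x) = sqrt(1 + sinh(x)^2) <= 1 + |f'(x)|, so ell(s) = L0 + L1*s
# with L0 = L1 = 1 is a valid bound for this function.
L0, L1 = 1.0, 1.0


def ell(s: float) -> float:
    return L0 + L1 * s


def grad(x: Sequence[float]) -> List[float]:
    return [math.sinh(x[0])]


x_final = gradient_descent(grad, ell, [5.0], iters=100)
```

Far from the minimizer the gradient norm is large, so $\ell$ is large and the step shrinks accordingly; near the minimizer the step approaches $1/L_0$. A library quadrature routine could replace the midpoint rule when $\ell$ has no closed-form integral.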

Taxonomy

Topics: Advanced Banach Space Theory · Optimization and Variational Analysis