Toward a Unified Theory of Gradient Descent under Generalized Smoothness
Alexander Tyurin

TL;DR
This paper extends the analysis of gradient descent to functions with generalized smoothness, deriving adaptive step sizes and improving convergence guarantees in both convex and nonconvex optimization.
Contribution
It introduces a unified framework for gradient descent under generalized smoothness, deriving new step size rules and convergence results beyond classical L-smoothness assumptions.
Findings
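Derives an adaptive GD step size from the generalized smoothness assumption, mirroring the classical step size obtained under L-smoothness.
Improves upon existing theoretical convergence rates for GD under generalized smoothness.
Obtains new results in several previously unexplored setups, covering both convex and nonconvex settings.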
Abstract
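We study the classical optimization problem $\min_{x \in \mathbb{R}^d} f(x)$ and analyze the gradient descent (GD) method in both nonconvex and convex settings. It is well-known that, under the $L$-smoothness assumption ($\|\nabla^2 f(x)\| \leq L$), the optimal point minimizing the quadratic upper bound $f(x_k) + \langle \nabla f(x_k), x_{k+1} - x_k \rangle + \frac{L}{2} \|x_{k+1} - x_k\|^2$ is $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with step size $\gamma_k = \frac{1}{L}$. Surprisingly, a similar result can be derived under the $\ell$-generalized smoothness assumption ($\|\nabla^2 f(x)\| \leq \ell(\|\nabla f(x)\|)$). In this case, we derive the step size $$\gamma_k = \int_{0}^{1} \frac{dv}{\ell(\|\nabla f(x_k)\| + \|\nabla f(x_k)\| v)}.$$ Using this step size rule, we improve upon existing theoretical convergence rates and obtain new results in several previously unexplored setups.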
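To make the step size rule concrete, here is a minimal numerical sketch. It is not the paper's implementation. It assumes the rule has the form gamma_k = integral over v in [0, 1] of 1 / ell(g + g*v), where g is the gradient norm at x_k, and approximates the integral with a midpoint rule. The test function f(x) = cosh(x), the choice ell(s) = 1 + s (valid here because cosh(x) = sqrt(1 + sinh(x)^2) <= 1 + |sinh(x)|), and the names `step_size` and `gradient_descent` are illustrative assumptions.

```python
import math


def step_size(ell, grad_norm, n=1000):
    """Midpoint-rule approximation of the integral of
    1 / ell(grad_norm + grad_norm * v) over v in [0, 1]."""
    h = 1.0 / n
    return sum(h / ell(grad_norm * (1.0 + (i + 0.5) * h)) for i in range(n))


def gradient_descent(grad, ell, x0, iters=50):
    """Scalar GD where each step size is recomputed from the current gradient norm."""
    x = x0
    for _ in range(iters):
        g = grad(x)
        x = x - step_size(ell, abs(g)) * g
    return x


# Illustrative problem (an assumption, not taken from the paper):
# f(x) = cosh(x), so f'(x) = sinh(x) and |f''(x)| <= 1 + |f'(x)|.
grad = math.sinh
ell = lambda s: 1.0 + s

x_final = gradient_descent(grad, ell, x0=5.0)

# Sanity check: a constant ell(s) = L should give back the classical step 1 / L.
gamma_const = step_size(lambda s: 4.0, grad_norm=3.0)

# For ell(s) = 1 + s the integral has the closed form (1 / g) * ln(1 + g / (1 + g)).
gamma_linear = step_size(ell, grad_norm=3.0)
gamma_closed_form = math.log(1.0 + 3.0 / 4.0) / 3.0
```

With a constant ell the rule collapses to the familiar 1/L step, while for a growing ell the step shrinks automatically where the gradient is large. This is the adaptive behaviour the abstract describes.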
Taxonomy
Topics: Advanced Banach Space Theory · Optimization and Variational Analysis
