Adaptive Gradient Normalization and Independent Sampling for (Stochastic) Generalized-Smooth Optimization

Yufeng Yang; Erin Tripp; Yifan Sun; Shaofeng Zou; Yi Zhou

arXiv:2410.14054 · math.OC · October 3, 2025 · Trans. Mach. Learn. Res.

Open Access

TL;DR

This paper introduces an adaptive gradient normalization technique and an independent sampling method for generalized-smooth nonconvex optimization, provides theoretical convergence guarantees, and reports fast convergence in experiments.

Contribution

It develops a new adaptive normalization approach and an independent sampling algorithm tailored for generalized-smooth nonconvex problems, with proven convergence and improved sample complexity.

Findings

01

Adaptive gradient normalization yields convergence guarantees under generalized-smoothness and generalized PŁ conditions.

02

The proposed stochastic algorithm achieves an $\mathcal{O}(\epsilon^{-4})$ sample complexity.

03

Experiments show rapid convergence on large-scale nonconvex problems.

Abstract

Recent studies have shown that many nonconvex machine learning problems satisfy a generalized-smooth condition that extends beyond traditional smooth nonconvex optimization. However, the existing algorithms are not fully adapted to such generalized-smooth nonconvex geometry and encounter significant technical limitations on their convergence analysis. In this work, we first analyze the convergence of adaptively normalized gradient descent under function geometries characterized by generalized-smoothness and a generalized PŁ condition, revealing the advantage of adaptive gradient normalization. Our results provide theoretical insights into adaptive normalization across various scenarios. For stochastic generalized-smooth nonconvex optimization, we propose the Independent-Adaptively Normalized Stochastic Gradient Descent (IAN-SGD) algorithm, which…
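To make the idea of adaptively normalized gradient descent concrete, here is a minimal sketch. It is not taken from the paper: it assumes the update has the form x ← x − lr · g / (‖g‖^beta + eps), where the exponent `beta`, the step size `lr`, the safeguard `eps`, and the toy objective (a sum of fourth powers) are all illustrative assumptions. The stochastic IAN-SGD variant is not sketched, since the abstract above is truncated before describing it.

```python
import math


def f(x):
    """Toy objective (an assumption for illustration): sum of fourth powers."""
    return sum(v ** 4 for v in x)


def grad_f(x):
    """Gradient of the toy objective."""
    return [4.0 * v ** 3 for v in x]


def angd_step(x, lr=0.1, beta=2.0 / 3.0, eps=1e-12):
    """One step of the assumed update: x - lr * g / (||g||**beta + eps)."""
    g = grad_f(x)
    norm = math.sqrt(sum(v * v for v in g))
    # eps guards against division by zero at a stationary point.
    scale = lr / (norm ** beta + eps)
    return [xi - scale * gi for xi, gi in zip(x, g)]


def run_angd(x0, steps=100, **kwargs):
    """Apply the assumed update repeatedly, starting from x0."""
    x = list(x0)
    for _ in range(steps):
        x = angd_step(x, **kwargs)
    return x
```

Under this assumed form, `beta=0` reduces to plain gradient descent (up to the negligible `eps`) and `beta=1` to fully normalized gradient descent; intermediate values scale the step by a fractional power of the gradient norm.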

Taxonomy

Topics: Optimization and Variational Analysis · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques

Methods: Gradient Clipping · Gradient Normalization