UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization

Yiming Jiang; Jinlan Liu; Dongpo Xu; Danilo P. Mandic

arXiv:2305.05675·cs.LG·September 24, 2024·1 cites

Open Access

TL;DR

This paper introduces UAdam, a unified framework for Adam-type algorithms, providing convergence guarantees in non-convex stochastic optimization and encompassing many variants like Adam, NAdam, and AMSGrad.

Contribution

The paper presents a general framework for Adam-type algorithms with convergence analysis, unifying various variants and establishing theoretical guarantees in non-convex stochastic settings.

Findings

01

In the non-convex stochastic setting, UAdam converges to a neighborhood of stationary points at a rate of O(1/T).

02

The neighborhood size decreases as the momentum parameter β increases.

03

According to the analysis, vanilla Adam can converge when its hyperparameters are chosen appropriately.

Abstract

Adam-type algorithms have become a preferred choice for optimisation in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms (called UAdam). It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its variants, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan, as special cases. This is supported by a rigorous convergence analysis of UAdam in the non-convex stochastic setting, showing that UAdam converges to a neighborhood of stationary points at a rate of $O(1/T)$. Furthermore, the size of the neighborhood decreases as $\beta$ increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical…
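
The idea of a "general form of the second-order moment" can be illustrated with a small sketch in which the second-moment rule is a pluggable function, so that Adam and AMSGrad differ only in that one argument. This is not the paper's UAdam update, which is not reproduced on this page. The function names, the scalar test problem, the hyperparameter values, and the omission of bias correction are all assumptions made for illustration.

```python
import math

# Hypothetical second-moment rules. Each takes the previous running average,
# the previous value used in the step, the current gradient, and beta2, and
# returns (new running average, value to use in the step).

def adam_second_moment(v_prev, v_hat_prev, grad, beta2):
    """Adam-style: exponential moving average of squared gradients."""
    v = beta2 * v_prev + (1.0 - beta2) * grad * grad
    return v, v

def amsgrad_second_moment(v_prev, v_hat_prev, grad, beta2):
    """AMSGrad-style: same average, but the value used never decreases."""
    v = beta2 * v_prev + (1.0 - beta2) * grad * grad
    return v, max(v_hat_prev, v)

def adam_type_minimize(grad_fn, x0, second_moment, steps=2000,
                       lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Generic Adam-type loop on a scalar parameter.

    Assumption: bias correction is omitted to keep the sketch short.
    """
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g                 # first-order momentum
        v, v_hat = second_moment(v, v_hat, g, beta2)      # pluggable second moment
        x -= lr * m / (math.sqrt(v_hat) + eps)            # adaptive step
    return x

def quadratic_grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2 (a deterministic stand-in)."""
    return 2.0 * (x - 3.0)

x_adam = adam_type_minimize(quadratic_grad, 0.0, adam_second_moment)
x_amsgrad = adam_type_minimize(quadratic_grad, 0.0, amsgrad_second_moment)
```

Swapping `adam_second_moment` for `amsgrad_second_moment` changes the optimizer without touching the loop, which is the sense in which a single framework can cover several variants. The toy objective is convex and noise-free, so it says nothing about the non-convex stochastic guarantees discussed in the abstract.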

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM

Methods: Adaptive Nesterov Momentum · AdaBound · AMSGrad · Adam