High Probability Convergence of Adam Under Unbounded Gradients and Affine Variance Noise

Yusu Hong; Junhong Lin

arXiv:2311.02000 · math.OC · November 6, 2023 · 1 citation

Open Access

TL;DR

This paper proves that the Adam algorithm converges with high probability in non-convex stochastic optimization under affine variance noise, without requiring bounded gradients, which gives theoretical guarantees that apply to practical settings.
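
For reference, the standard Adam update of Kingma and Ba can be sketched coordinate-wise as below. This is a minimal sketch with conventional default hyperparameters; the function name `adam_step` is hypothetical, and the step-size schedule and hyperparameter conditions analyzed in the paper may differ from these defaults.

```python
import math
from typing import List, Tuple


def adam_step(
    x: List[float],
    grad: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-2,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One Adam iteration (t starts at 1). Returns the updated (x, m, v)."""
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, grad, m, v):
        # Exponential moving averages of the gradient and the squared gradient.
        mi = beta1 * mi + (1 - beta1) * gi
        vi = beta2 * vi + (1 - beta2) * gi * gi
        # Bias corrections compensating for the zero initialization of m and v.
        m_hat = mi / (1 - beta1 ** t)
        v_hat = vi / (1 - beta2 ** t)
        # Coordinate-wise adaptive step.
        new_x.append(xi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


# Usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x, m, v = [3.0, -2.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 2001):
    x, m, v = adam_step(x, x, m, v, t)
print(x)
```

Because of the bias corrections, the first step moves each coordinate by roughly `lr` regardless of the gradient's scale; the normalization by the square root of `v_hat` is what makes the step size adaptive per coordinate.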

Contribution

It establishes high-probability convergence of Adam under affine variance noise without assuming bounded gradients or prior problem-dependent knowledge, extending the theory of Adam toward the settings in which it is actually used.
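
As a point of reference, one common way to write a coordinate-wise affine variance condition is shown below, where $g_t$ is the stochastic gradient at iterate $x_t \in \mathbb{R}^d$ and $\sigma_0, \sigma_1 \ge 0$ are constants. This is an illustrative formulation we assume here; the paper's exact assumption (for example, whether it is stated in expectation, almost surely, or in a sub-Gaussian form) may differ.

```latex
\mathbb{E}\left[\big(g_{t,i} - \partial_i f(x_t)\big)^2 \,\middle|\, x_t\right]
  \le \sigma_0^2 + \sigma_1^2 \big(\partial_i f(x_t)\big)^2,
  \qquad i = 1, \dots, d.
```

Setting $\sigma_1 = 0$ recovers the classical bounded-variance model, while $\sigma_1 > 0$ lets the noise scale with the true gradient. Neither case bounds the stochastic gradients uniformly, since $\nabla f$ itself may be unbounded.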

Findings

01

Adam converges to stationary points with high probability at a rate of O(poly(log T)/√T).

02

Adam confines the magnitudes of its gradients to the order of O(poly(log T)).

03

A simplified variant of Adam also achieves a convergence rate that adapts to the noise level.
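
Stated schematically, a high-probability guarantee of this kind typically has the form below, where $\delta \in (0,1)$ is the failure probability. This is a generic template under our assumptions; the precise quantity bounded and the exact dependence on $\delta$ and the problem constants are as specified in the paper.

```latex
\Pr\left[\frac{1}{T}\sum_{t=1}^{T}\|\nabla f(x_t)\|^2
  \le O\!\left(\frac{\mathrm{poly}\big(\log(T/\delta)\big)}{\sqrt{T}}\right)\right]
  \ge 1 - \delta.
```

Since $\min_{1 \le t \le T}\|\nabla f(x_t)\|^2$ is at most the average, a bound of this form also implies that at least one iterate is an approximate stationary point.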

Abstract

In this paper, we study the convergence of the Adaptive Moment Estimation (Adam) algorithm for unconstrained non-convex smooth stochastic optimization. Despite Adam's widespread use in machine learning, its theoretical properties remain poorly understood. Prior research primarily investigated Adam's convergence in expectation, often requiring strong assumptions such as uniformly bounded stochastic gradients or prior problem-dependent knowledge. As a result, the applicability of these findings to practical, real-world scenarios has been limited. To overcome these limitations, we provide an in-depth analysis and show that Adam converges to a stationary point with high probability at a rate of $O(\mathrm{poly}(\log T)/\sqrt{T})$ under coordinate-wise "affine" variance noise, requiring neither a bounded-gradient assumption nor any problem-dependent…
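
To make the noise model concrete, the sketch below builds a hypothetical stochastic-gradient oracle, `affine_noise_grad`, whose per-coordinate variance equals sigma0² + sigma1² (∂_i f)², and checks that variance empirically on the objective f(x) = ½‖x‖², whose gradient is unbounded. The Gaussian noise, the constants, and both helper names are illustrative assumptions, not the paper's experimental setting.

```python
import math
import random
import statistics
from typing import List


def affine_noise_grad(true_grad: List[float], sigma0: float, sigma1: float,
                      rng: random.Random) -> List[float]:
    """Noisy gradient whose i-th coordinate has mean true_grad[i] and
    variance sigma0**2 + sigma1**2 * true_grad[i]**2 (Gaussian noise assumed)."""
    return [
        gi + math.sqrt(sigma0 ** 2 + sigma1 ** 2 * gi ** 2) * rng.gauss(0.0, 1.0)
        for gi in true_grad
    ]


def empirical_variance(true_grad: List[float], sigma0: float, sigma1: float,
                       n: int = 20000, seed: int = 0) -> List[float]:
    """Estimate the per-coordinate noise variance from n oracle calls."""
    rng = random.Random(seed)
    samples = [affine_noise_grad(true_grad, sigma0, sigma1, rng) for _ in range(n)]
    return [statistics.pvariance([s[i] for s in samples])
            for i in range(len(true_grad))]


# For f(x) = 0.5 * ||x||^2 the true gradient is x itself.
x = [0.0, 1.0, 10.0]
sigma0, sigma1 = 1.0, 0.5
est = empirical_variance(x, sigma0, sigma1)
predicted = [sigma0 ** 2 + sigma1 ** 2 * xi ** 2 for xi in x]
for xi, e, p in zip(x, est, predicted):
    print(f"grad={xi:5.1f}  empirical var={e:7.3f}  predicted={p:7.3f}")
```

The noise variance grows with the squared gradient coordinate, so far from a stationary point the stochastic gradients admit no fixed bound; this is the regime that rules out the bounded-gradient assumptions of earlier analyses.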

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM

Methods: Adam