A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings

Alokendu Mazumder; Rishabh Sabharwal; Manan Tayal; Bhartendu Kumar; Punit Rathore

arXiv:2309.08339·cs.LG·April 5, 2024

Open Access

TL;DR

This paper analyses, theoretically and empirically, the use of a fixed constant step size in the Adam optimizer for non-convex neural network training. It reports guaranteed convergence to a critical point and an improved reduction of the gradient norm.

Contribution

It introduces a theoretically grounded constant step size for Adam, proving convergence guarantees and comparing its performance against adaptive schedulers in practice.

Findings

01

Guaranteed convergence of Adam with constant step size in non-convex settings

02

Constant step size improves gradient norm reduction

03

Empirical results favor fixed step size over adaptive schedules

Abstract

In neural network training, RMSProp and Adam remain widely favoured optimisation algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyse a constant step size version of Adam in the non-convex setting and discuss why it is important for the convergence of Adam to use a fixed step size. This work demonstrates the derivation and effective implementation of a constant step size for Adam, offering insights into its performance and efficiency in non-convex optimisation scenarios. (i) First, we provide proof that these adaptive gradient algorithms are guaranteed to reach criticality for smooth non-convex objectives with constant step size, and we give bounds…
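As a rough illustration of what "Adam with a constant step size" means in practice, the sketch below runs the standard Adam update with a step size that is never rescheduled, and tracks the smallest gradient magnitude seen along the way. The toy objective, the starting point, and the values of `alpha`, `beta1`, `beta2`, and `eps` are assumptions chosen for illustration; in particular, `alpha` is an arbitrary placeholder and is not the "exact" step size derived in the paper.

```python
import math


def grad(w):
    # Gradient of the toy non-convex objective f(w) = w**2 + 3*sin(w)**2.
    # This objective is an illustrative assumption, not taken from the paper.
    return 2.0 * w + 3.0 * math.sin(2.0 * w)


def adam_constant_step(w0, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Standard Adam updates with a step size alpha that stays fixed for all iterations."""
    w, m, v = w0, 0.0, 0.0
    min_grad_norm = float("inf")
    for t in range(1, steps + 1):
        g = grad(w)
        # Track the smallest gradient magnitude seen so far.
        min_grad_norm = min(min_grad_norm, abs(g))
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction for the zero-initialised averages.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # The step size alpha is constant: no decay, warm-up, or scheduler.
        w -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, min_grad_norm


w_final, min_grad_norm = adam_constant_step(w0=2.0)
print(f"final w = {w_final:.4f}, smallest |grad| seen = {min_grad_norm:.2e}")
```

Tracking the minimum gradient magnitude mirrors the usual way convergence to criticality is stated for non-convex objectives; swapping the placeholder `alpha` for a theoretically derived value would leave the rest of the loop unchanged.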

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Machine Learning and Algorithms · Neuroscience and Neural Engineering

Methods: Network On Network · Adam · RMSProp