A Theoretical and Empirical Study on the Convergence of Adam with an "Exact" Constant Step Size in Non-Convex Settings
Alokendu Mazumder, Rishabh Sabharwal, Manan Tayal, Bhartendu Kumar, Punit Rathore

TL;DR
This paper provides a theoretical analysis and empirical validation of using a constant step size in the Adam optimizer for non-convex neural network training, demonstrating guaranteed convergence and a stronger reduction of the gradient norm.
Contribution
It introduces a theoretically grounded constant step size for Adam, proves convergence guarantees for it, and compares its performance in practice against adaptive step-size schedules.
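For reference, the standard Adam recursion (Kingma and Ba) with a constant step size α is sketched below. This is the generic update, not necessarily the exact variant analysed in the paper: details such as bias correction, the placement of ε, and the derived "exact" value of α may differ there. A scheduler would replace the fixed α with a time-varying α_t.

```latex
\begin{aligned}
g_t &= \nabla f(x_t), \\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t \odot g_t, \\
\hat m_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^{t}}, \\
x_{t+1} &= x_t - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}, \qquad \alpha \text{ fixed for all } t.
\end{aligned}
```

In the smooth non-convex setting, "reaching criticality" is usually quantified by how small \(\min_{1 \le t \le T} \lVert \nabla f(x_t) \rVert\) becomes after \(T\) iterations, rather than by convergence to a global minimum.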
Findings
Guaranteed convergence of Adam with constant step size in non-convex settings
Constant step size improves gradient norm reduction
Empirical results favor fixed step size over adaptive schedules
Abstract
In neural network training, RMSProp and Adam remain widely favoured optimisation algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. Additionally, their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyse a constant step size version of Adam in the non-convex setting and discuss why using a fixed step size is important for the convergence of Adam. This work demonstrates the derivation and effective implementation of a constant step size for Adam, offering insights into its performance and efficiency in non-convex optimisation scenarios. (i) First, we prove that these adaptive gradient algorithms are guaranteed to reach criticality for smooth non-convex objectives with constant step size, and we give bounds…
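The idea of running Adam with a constant step size and tracking criticality can be illustrated with a small NumPy sketch. Everything here is an assumption for illustration: the test function f(x) = Σ (x² + 3 sin² x) (smooth and non-convex, since its Hessian diagonal 2 + 6 cos 2x can be negative), the hypothetical helper names `grad_f` and `adam_constant_step`, and the placeholder value `alpha=1e-2`, which is not the "exact" step size derived in the paper.

```python
import numpy as np


def grad_f(x):
    # Gradient of the hypothetical test function f(x) = sum(x**2 + 3*sin(x)**2).
    # d/dx [3*sin(x)**2] = 3*sin(2x); the Hessian diagonal 2 + 6*cos(2x)
    # is negative in places, so f is smooth but non-convex.
    return 2.0 * x + 3.0 * np.sin(2.0 * x)


def adam_constant_step(x0, alpha=1e-2, beta1=0.9, beta2=0.999,
                       eps=1e-8, num_iters=2000):
    """Standard Adam with a fixed step size alpha (no schedule).

    Returns the final iterate and min_t ||grad f(x_t)||, the usual
    criticality measure in non-convex convergence analyses.
    alpha is a placeholder, not the paper's derived constant.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment (momentum) estimate
    v = np.zeros_like(x)  # second-moment estimate
    min_grad_norm = np.inf
    for t in range(1, num_iters + 1):
        g = grad_f(x)
        min_grad_norm = min(min_grad_norm, float(np.linalg.norm(g)))
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        # The step size alpha is the same at every iteration.
        x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x, min_grad_norm
```

For example, `adam_constant_step(np.array([2.5, -1.7, 0.8]))` returns the last iterate together with the smallest gradient norm observed along the trajectory. Swapping the fixed `alpha` for a decaying value such as `alpha / np.sqrt(t)` turns this into a scheduled variant, which is the kind of alternative the paper compares against.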
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Machine Learning and Algorithms · Neuroscience and Neural Engineering
Methods: Network On Network · Adam · RMSProp
