On the Algorithmic Stability and Generalization of Adaptive Optimization Methods
Han Nguyen, Hai Pham, Sashank J. Reddi, Barnabás Póczos

TL;DR
This paper introduces a new theoretical framework for analyzing the stability and generalization of popular adaptive optimization algorithms such as Adam and RMSProp, and shows how these properties depend on the parameter β₂.
Contribution
It provides the first provable guarantees for the stability and generalization of adaptive optimizers, highlighting the role of the parameter β₂.
Findings
Guarantees depend heavily on the parameter β₂
Empirical results support theoretical claims
Insights into stability and generalization of adaptive methods
Abstract
Despite their popularity in deep learning and machine learning in general, the theoretical properties of adaptive optimizers such as Adagrad, RMSProp, Adam or AdamW are not yet fully understood. In this paper, we develop a novel framework to study the stability and generalization of these optimization methods. Based on this framework, we show provable guarantees about such properties that depend heavily on a single parameter β₂. Our empirical experiments support our claims and provide practical insights into the stability and generalization properties of adaptive optimization methods.
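For readers unfamiliar with where β₂ enters these methods, the following is a minimal sketch of the commonly stated Adam update, not the paper's analysis or code. The function name `adam_step`, its argument names, and the default hyperparameter values are assumptions chosen for illustration; β₂ appears as `beta2`, the decay rate of the running average of squared gradients.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update on lists of floats (illustrative sketch).

    theta, grad, m, v are equal-length lists; t is the 1-based step count.
    Returns the updated (theta, m, v) as new lists.
    """
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        # Exponential moving average of the gradient (first moment).
        mi = beta1 * mi + (1.0 - beta1) * g
        # Exponential moving average of the squared gradient (second moment).
        # beta2 controls how quickly past squared gradients are forgotten.
        vi = beta2 * vi + (1.0 - beta2) * g * g
        # Bias correction for the zero initialization of m and v.
        m_hat = mi / (1.0 - beta1 ** t)
        v_hat = vi / (1.0 - beta2 ** t)
        # Per-coordinate step scaled by the root of the second-moment estimate.
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v


# Usage: a few steps on f(x) = x^2, whose gradient is 2x.
x, m, v = [1.0], [0.0], [0.0]
for step in range(1, 4):
    x, m, v = adam_step(x, [2.0 * x[0]], m, v, step, lr=0.1)
```

In this sketch, a `beta2` closer to 1 makes the second-moment estimate change more slowly from step to step, which is the quantity the abstract singles out as governing the guarantees.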
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
Methods: AdamW · RMSProp · Adam
