On the Algorithmic Stability and Generalization of Adaptive Optimization Methods

Han Nguyen; Hai Pham; Sashank J. Reddi; Barnabás Póczos

arXiv:2211.03970 · cs.LG · November 9, 2022

TL;DR

This paper introduces a new theoretical framework for analyzing the stability and generalization of popular adaptive optimization algorithms such as Adam and RMSProp, and shows that the resulting guarantees depend heavily on a single parameter, β₂.

Contribution

It provides the first provable guarantees for the stability and generalization of adaptive optimizers, highlighting the role of the parameter β₂.

Findings

01 Guarantees depend heavily on the parameter β₂
02 Empirical results support theoretical claims
03 Insights into stability and generalization of adaptive methods

Abstract

Despite their popularity in deep learning and machine learning in general, the theoretical properties of adaptive optimizers such as Adagrad, RMSProp, Adam or AdamW are not yet fully understood. In this paper, we develop a novel framework to study the stability and generalization of these optimization methods. Based on this framework, we show provable guarantees about such properties that depend heavily on a single parameter $\beta_2$. Our empirical experiments support our claims and provide practical insights into the stability and generalization properties of adaptive optimization methods.
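For orientation, β₂ in RMSProp and Adam is the exponential decay rate of the running average of squared gradients that scales each coordinate's step. The sketch below is the standard textbook Adam update written in plain Python, not the paper's analysis framework. The function name `adam_step` and the default hyperparameter values are illustrative assumptions.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one textbook Adam update to lists of floats.

    t is the 1-based step count. Returns (new_params, new_m, new_v).
    Illustrative sketch only; names and defaults are assumptions.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # First-moment estimate: running average of gradients (decay beta1).
        m_i = beta1 * m_i + (1.0 - beta1) * g
        # Second-moment estimate: running average of squared gradients.
        # beta2 sets how long past squared gradients are remembered.
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction for the zero initialisation of m and v.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-coordinate step, scaled by the root of the second moment.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

A call such as `adam_step([1.0], [2.0], [0.0], [0.0], t=1, lr=0.1)` moves the parameter by roughly `lr` in the direction opposite the gradient, because on the first step the bias-corrected ratio is approximately the sign of the gradient. A value of `beta2` close to 1 averages squared gradients over a long window, while a smaller value makes the step size track recent gradients more closely.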

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques

Methods: AdamW · RMSProp · Adam