Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

Kwangjun Ahn; Zhiyu Zhang; Yunbum Kook; Yan Dai

arXiv:2402.01567 · cs.LG · May 31, 2024 · 2 cites


Open Access

TL;DR

This paper offers a new theoretical perspective on the Adam optimizer by framing it as an instance of Follow-the-Regularized-Leader (FTRL) within online learning, revealing the importance of its components.

Contribution

The paper shows that Adam can be understood as an FTRL algorithm and uses the online learning framework to explain the role of each of its components.

Findings

01

Adam corresponds to an instance of FTRL in online learning.

02

The online learning perspective clarifies the role of Adam's components.

03

This understanding could guide the design of improved optimizers.
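To make finding 01 concrete, here is a worked derivation under our own simplifying assumptions, not necessarily the paper's exact formulation: a single coordinate, linear losses g_s · Δ on the update Δ, a discount factor β1 on past losses, a quadratic regularizer with an adaptive step size η_t that we choose so the match is exact, and Adam without bias correction and with ε = 0. The paper's precise regularizer, discounting scheme, and treatment of the current gradient may differ.

```latex
% Discounted FTRL on linear losses \ell_s(\Delta) = g_s \Delta (per coordinate):
\Delta_t = \arg\min_{\Delta \in \mathbb{R}}
  \Big( \sum_{s=1}^{t} \beta_1^{\,t-s} g_s \, \Delta + \frac{\Delta^2}{2\eta_t} \Big)
  = -\eta_t \sum_{s=1}^{t} \beta_1^{\,t-s} g_s

% Assumed adaptive step size:
\eta_t = \frac{\alpha (1-\beta_1)}{\sqrt{(1-\beta_2) \sum_{s=1}^{t} \beta_2^{\,t-s} g_s^2}}

% Unrolling m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t and v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2
% from m_0 = v_0 = 0 gives
m_t = (1-\beta_1) \sum_{s=1}^{t} \beta_1^{\,t-s} g_s, \qquad
v_t = (1-\beta_2) \sum_{s=1}^{t} \beta_2^{\,t-s} g_s^2,

% so the FTRL update is exactly the (uncorrected, \epsilon = 0) Adam update:
\Delta_t = -\alpha \, \frac{m_t}{\sqrt{v_t}}, \qquad x_t = x_{t-1} + \Delta_t
```

The regularizer's minimizer follows from setting the derivative to zero: the discounted gradient sum plus Δ/η_t equals 0.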

Abstract

Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components remains limited. In particular, most existing analyses of Adam establish convergence rates that are also achievable by non-adaptive algorithms like SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider a framework called online learning of updates/increments, in which the updates/increments of an optimizer are chosen by an online learner. With this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its…
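The framework described in the abstract can be sketched in a few lines of Python: an online learner receives the gradients observed so far and proposes the next increment, and the optimizer simply adds it to the iterate. This is a minimal illustration, assuming a single coordinate, no bias correction, and ε = 0; the helpers `adam_updates`, `ftrl_updates`, and `run_online_updates` are hypothetical names of ours, and the adaptive step size inside `ftrl_updates` is our own choice made so the two learners coincide, not necessarily the paper's exact construction.

```python
import math
from typing import Callable, List


def adam_updates(grads: List[float], alpha: float, beta1: float, beta2: float) -> List[float]:
    """Adam-style increments for one coordinate (no bias correction, eps = 0)."""
    m, v, updates = 0.0, 0.0, []
    for g in grads:
        m = beta1 * m + (1 - beta1) * g      # EMA of gradients (first moment)
        v = beta2 * v + (1 - beta2) * g * g  # EMA of squared gradients (second moment)
        updates.append(-alpha * m / math.sqrt(v))
    return updates


def ftrl_updates(grads: List[float], alpha: float, beta1: float, beta2: float) -> List[float]:
    """Discounted FTRL on linear losses g_s * delta with regularizer delta**2 / (2 * eta_t).

    Closed-form minimizer: delta_t = -eta_t * sum_s beta1**(t - s) * g_s.
    ASSUMPTION: eta_t is a hypothetical adaptive step size chosen to match Adam.
    Requires at least one nonzero gradient among those seen so far.
    """
    updates = []
    for t in range(1, len(grads) + 1):
        seen = grads[:t]
        # Discounted sum of past linear losses: the "leader" term.
        disc_sum = sum(beta1 ** (t - s) * g for s, g in enumerate(seen, start=1))
        # Discounted sum of squared gradients, used only to set the step size.
        sq_sum = sum(beta2 ** (t - s) * g * g for s, g in enumerate(seen, start=1))
        eta_t = alpha * (1 - beta1) / math.sqrt((1 - beta2) * sq_sum)
        updates.append(-eta_t * disc_sum)
    return updates


Learner = Callable[[List[float], float, float, float], List[float]]


def run_online_updates(
    grad_fn: Callable[[float], float],
    x0: float,
    steps: int,
    learner: Learner,
    alpha: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
) -> float:
    """Online learning of updates: the learner maps past gradients to the next increment."""
    x, grads = x0, []
    for _ in range(steps):
        grads.append(grad_fn(x))                         # gradient at the current iterate
        delta = learner(grads, alpha, beta1, beta2)[-1]  # learner proposes the increment
        x = x + delta                                    # x_t = x_{t-1} + Delta_t
    return x


if __name__ == "__main__":
    grads = [0.5, -1.2, 0.3, 2.0, -0.7]
    print(adam_updates(grads, 1e-3, 0.9, 0.999))
    print(ftrl_updates(grads, 1e-3, 0.9, 0.999))
    # Minimize f(x) = (x - 3)**2 with each learner plugged into the same loop.
    print(run_online_updates(lambda x: 2 * (x - 3), 0.0, 20, adam_updates))
    print(run_online_updates(lambda x: 2 * (x - 3), 0.0, 20, ftrl_updates))
```

Because the optimizer loop only asks the learner for the next increment, swapping one learner for another changes the optimizer without touching the loop; that separation is what lets the design of an optimizer be reduced to the design of an online learner.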


Taxonomy

Topics: Artificial Intelligence in Games

Methods: Adam · Stochastic Gradient Descent