Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook, Yan Dai

TL;DR
This paper offers a new theoretical perspective on the Adam optimizer by framing it as an instance of Follow-the-Regularized-Leader (FTRL) within online learning, revealing the importance of its components.
Contribution
It demonstrates that Adam can be understood as an FTRL algorithm, providing insights into its components through the online learning framework.
Findings
Within the online-learning-of-updates framework, Adam corresponds to Follow-the-Regularized-Leader (FTRL).
The online learning perspective clarifies the role of Adam's components.
This understanding could guide the design of improved optimizers.
Abstract
Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components still remains limited. In particular, most existing analyses of Adam show the convergence rate that can be simply achieved by non-adaptive algorithms like SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider the framework called online learning of updates/increments, where we choose the updates/increments of an optimizer based on an online learner. With this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its…
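The correspondence between Adam and FTRL described in the abstract can be checked numerically on a toy case. The following is a minimal sketch, not the paper's code. It rests on simplifying assumptions made here for illustration: scalar gradients, Adam without bias correction or the epsilon term, and beta2 = beta1**2. The function names adam_updates and discounted_ftrl_updates are hypothetical. Under these assumptions, each Adam increment equals the minimizer of a quadratically regularized sum of exponentially discounted linear losses, with the regularization strength set by the discounted gradient magnitudes.

```python
import math

def adam_updates(grads, lr=0.1, beta1=0.9, beta2=0.81):
    """Increments produced by Adam (assumption: no bias correction, no epsilon)."""
    m, v, out = 0.0, 0.0, []
    for g in grads:
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        out.append(-lr * m / math.sqrt(v))
    return out

def discounted_ftrl_updates(grads, lr=0.1, beta=0.9):
    """Increments chosen by FTRL on discounted linear losses.

    At step t the learner plays
        argmin_x  x**2 / (2 * eta_t) + x * sum_s beta**(t - s) * g_s
    whose closed form is -eta_t * sum_s beta**(t - s) * g_s.
    """
    scale = lr * (1 - beta) / math.sqrt(1 - beta ** 2)
    out = []
    for t in range(len(grads)):
        # Discount older gradients more heavily.
        disc = [beta ** (t - s) * g for s, g in enumerate(grads[: t + 1])]
        # Adaptive regularization strength from the discounted gradient norm.
        eta = scale / math.sqrt(sum(d * d for d in disc))
        out.append(-eta * sum(disc))
    return out

grads = [0.5, -1.2, 0.3, 2.0, -0.7]  # arbitrary nonzero toy gradients
adam = adam_updates(grads, beta1=0.9, beta2=0.9 ** 2)
ftrl = discounted_ftrl_updates(grads, beta=0.9)
match = all(math.isclose(a, f, rel_tol=1e-9, abs_tol=1e-12)
            for a, f in zip(adam, ftrl))
```

Here match records whether the two sequences of increments agree up to floating-point error. The gradients must be nonzero at the first step, since the sketch divides by the square root of the second-moment estimate without an epsilon term.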
Taxonomy
Topics: Artificial Intelligence in Games
Methods: Adam · Stochastic Gradient Descent
