ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate

Shohei Taniguchi; Keno Harada; Gouki Minegishi; Yuta Oshima; Seong Cheol Jeong; Go Nagahara; Tomoshi Iiyama; Masahiro Suzuki; Yusuke Iwasawa; Yutaka Matsuo

arXiv:2411.02853·cs.LG·November 25, 2024

TL;DR

ADOPT is a new adaptive gradient method that guarantees convergence for any choice of $\beta_2$, achieves the optimal rate without the impractical bounded-noise assumption, and outperforms Adam on a range of deep learning tasks.

Contribution

ADOPT introduces a modification to Adam that ensures convergence for any $\beta_2$, removing the need for problem-dependent hyperparameter tuning and for the bounded-noise assumption.

Findings

01

ADOPT converges at the optimal rate $O(1/\sqrt{T})$ in theory.

02

ADOPT outperforms Adam and variants across multiple tasks.

03

ADOPT is robust to the choice of $\beta_2$, simplifying hyperparameter tuning.

Abstract

Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless a hyperparameter, $\beta_2$, is chosen in a problem-dependent manner. There have been many attempts to fix the non-convergence (e.g., AMSGrad), but they require the impractical assumption that the gradient noise is uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of $O(1/\sqrt{T})$ with any choice of $\beta_2$ without depending on the bounded noise assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second moment estimate and changing the order of the momentum update and the normalization by the second moment estimate. We also conduct intensive numerical experiments, and verify that our ADOPT…
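
The two changes named in the abstract can be illustrated with a short NumPy sketch. It is an illustration under stated assumptions, not the authors' reference implementation: the function names `adopt_style_step` and `run_on_quadratic`, the default hyperparameters, the initialization of the second moment from the first gradient, and the toy quadratic objective are all chosen here for demonstration. The gradient is divided by the previous second moment estimate (which does not contain the current gradient), and only then does the normalized gradient enter the momentum update.

```python
import numpy as np


def adopt_style_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-6):
    """One update in the style described in the abstract (illustrative only).

    v is the second moment estimate from the PREVIOUS step, so it does not
    contain the current gradient. All default values are assumptions.
    """
    # Normalize first, using the previous second moment estimate.
    normed = grad / np.maximum(np.sqrt(v), eps)
    # Momentum is applied to the already-normalized gradient.
    m = beta1 * m + (1.0 - beta1) * normed
    theta = theta - lr * m
    # The current gradient enters v only after it has been used.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    return theta, m, v


def run_on_quadratic(steps=500, lr=0.01):
    """Minimize the toy objective f(theta) = ||theta||^2 (hypothetical example)."""
    theta = np.array([3.0, -2.0])
    grad = 2.0 * theta
    m = np.zeros_like(theta)
    v = grad ** 2  # assumption: initialize v from the first gradient
    for _ in range(steps):
        grad = 2.0 * theta
        theta, m, v = adopt_style_step(theta, grad, m, v, lr=lr)
    return theta


if __name__ == "__main__":
    print(run_on_quadratic())
```

For contrast, standard Adam updates the second moment with the current gradient before dividing, and normalizes the momentum rather than the raw gradient.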

Code & Models

Models

🤗 timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
model · 40 downloads

Videos

ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate · SlidesLive

Taxonomy

Topics: Computability, Logic, AI Algorithms · Constraint Satisfaction and Optimization

Methods: ADaptive gradient method with the OPTimal convergence rate (ADOPT) · Adam