AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

Wenjie Li; Zhaoyang Zhang; Xinjiang Wang; Ping Luo

arXiv:2004.09740 · cs.LG · May 6, 2020 · 24 citations

TL;DR

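AdaX is an adaptive gradient descent algorithm that modifies Adam to exponentially accumulate long-term gradient information when tuning the learning rate. The authors prove its convergence in convex and non-convex settings and report that it outperforms Adam on computer vision and natural language processing tasks.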

Contribution

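Using a simple non-convex synthetic problem, the paper shows that Adam's fast convergence can lead it to local minima. It then introduces AdaX, an optimizer that addresses this by accumulating long-term gradient information, and supports it with convergence proofs and experimental comparisons against Adam and SGD.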

Findings

01. AdaX outperforms Adam in vision and NLP tasks.

02. AdaX converges faster and more reliably than Adam.

03. AdaX matches the performance of SGD in various tasks.

Abstract

Although adaptive optimization algorithms such as Adam show fast convergence in many machine learning tasks, this paper identifies a problem with Adam by analyzing its performance on a simple non-convex synthetic problem, showing that Adam's fast convergence may lead the algorithm to local minima. To address this problem, we improve Adam by proposing a novel adaptive gradient descent algorithm named AdaX. Unlike Adam, which ignores past gradients, AdaX exponentially accumulates long-term gradient information during training to adaptively tune the learning rate. We prove the convergence of AdaX in both the convex and non-convex settings. Extensive experiments show that AdaX outperforms Adam in various computer vision and natural language processing tasks and can catch up with Stochastic Gradient Descent.
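
The accumulation described in the abstract can be sketched for a single scalar parameter. The update rule below is reconstructed from a reading of the AdaX paper and is not stated on this page, so the formulas, the function names (`adax_step`, `adam_second_moment`, `minimize_quadratic`), and the default values (`lr=0.1`, `beta1=0.9`, `beta2=1e-4`, `eps=1e-12`) should all be treated as assumptions to check against the paper and the switchablenorms/AdaX repository. The point of the sketch is the contrast in the second-moment term: an Adam-style moving average shrinks the old accumulator at every step, while the AdaX-style rule grows it, so past gradients keep contributing.

```python
import math


def adam_second_moment(v, grad, beta2=0.999):
    # Adam-style moving average: a gradient seen k steps ago is weighted by
    # (1 - beta2) * beta2**k, so its influence fades as training continues.
    return beta2 * v + (1.0 - beta2) * grad * grad


def adax_step(x, grad, m, v, t, lr=0.1, beta1=0.9, beta2=1e-4, eps=1e-12):
    """One scalar AdaX-style update (assumed form; t counts steps from 1)."""
    # First moment: the usual exponential moving average of gradients.
    m = beta1 * m + (1.0 - beta1) * grad
    # Second moment: the old accumulator is scaled UP by (1 + beta2), so a
    # gradient seen k steps ago carries weight beta2 * (1 + beta2)**k.
    v = (1.0 + beta2) * v + beta2 * grad * grad
    # The weights above sum to (1 + beta2)**t - 1, so dividing by that sum
    # turns v into a weighted average of squared gradients.
    v_hat = v / ((1.0 + beta2) ** t - 1.0)
    x = x - lr * m / (math.sqrt(v_hat) + eps)
    return x, m, v


def minimize_quadratic(steps=200, x0=5.0):
    # Toy problem f(x) = x**2 with gradient 2x, used only to exercise the step.
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        x, m, v = adax_step(x, 2.0 * x, m, v, t)
    return x
```

Calling `minimize_quadratic()` runs the sketch on a toy quadratic. It is not the synthetic non-convex problem analyzed in the paper and says nothing about the reported benchmark results.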

Code & Models

Repositories

switchablenorms/AdaX (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques

Methods: Adam