Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate

Haiwen Huang; Chang Wang; Bin Dong

arXiv:1805.07557·cs.LG·December 1, 2020

TL;DR

Nostalgic Adam (NosAdam) introduces a weighting scheme for past gradients to improve the convergence and performance of adaptive optimization algorithms like Adam in deep learning.

Contribution

The paper proposes NosAdam, an adaptive optimizer that incorporates long-term memory of past gradients, with proven convergence guarantees and improved performance over Adam.

Findings

01. NosAdam guarantees convergence at the best known rate.

02. Preliminary experiments show NosAdam outperforms Adam.

03. NosAdam addresses non-convergence issues of Adam.

Abstract

First-order optimization algorithms have proven prominent in deep learning. In particular, algorithms such as RMSProp and Adam are extremely popular. However, recent work has pointed out the lack of "long-term memory" in Adam-like algorithms, which can hamper their performance and lead to divergence. In our study, we observe that there are benefits to placing more weight on past gradients when designing the adaptive learning rate. We therefore propose an algorithm called Nostalgic Adam (NosAdam), with theoretically guaranteed convergence at the best known convergence rate. NosAdam can be regarded as a fix for the non-convergence issue of Adam, as an alternative to the recent work of [Reddi et al., 2018]. Our preliminary numerical experiments show that NosAdam is a promising alternative to Adam. The proofs, code and other supplementary materials can be found in an…
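To make the idea of "weighting more of the past gradients" concrete, here is a minimal sketch in plain Python. It compares an Adam-style exponential moving average of squared gradients with a running weighted average whose weights decay slowly over time. The weights b_k = k^(-gamma), the function names, and the default values are assumptions chosen for illustration; this page does not spell out the exact scheme, so treat this as one possible instance and not as the authors' implementation.

```python
def ema_second_moment(grads, beta2=0.9):
    """Adam-style second moment: exponential moving average of g^2.

    Old gradients are forgotten geometrically fast (no bias correction here).
    """
    v, history = 0.0, []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        history.append(v)
    return history


def nostalgic_second_moment(grads, gamma=0.1):
    """Weighted running average of g^2 with assumed weights b_k = k**(-gamma).

    With gamma >= 0 the weights are non-increasing in k, so early gradients
    keep at least as much weight as later ones. gamma = 0 gives the plain mean.
    """
    v, B, history = 0.0, 0.0, []
    for k, g in enumerate(grads, start=1):
        b_k = k ** (-gamma)          # hypothetical weight for step k
        B_prev, B = B, B + b_k       # running sum of weights
        beta2_k = B_prev / B         # time-varying decay; equals 0 at k = 1
        v = beta2_k * v + (1.0 - beta2_k) * g * g
        history.append(v)
    return history


if __name__ == "__main__":
    # One large early gradient followed by many small ones.
    grads = [10.0] + [0.1] * 50
    print("EMA final v:      ", ema_second_moment(grads)[-1])
    print("Nostalgic final v:", nostalgic_second_moment(grads)[-1])
```

In this toy run the exponential moving average has almost entirely forgotten the large first gradient after 50 steps, while the slowly decaying weights still retain a noticeable share of it. That retained memory is the "long-term memory" the abstract refers to.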

Taxonomy

Methods: RMSProp · Adam