# On the Convergence Proof of AMSGrad and a New Version

**Authors:** Tran Thi Phuong, Le Trieu Phong

arXiv: 1904.03590 · 2019-11-01

## TL;DR

This paper examines the convergence proof of AMSGrad and shows that it mishandles the hyper-parameters, treating them as equal when they are not. It proposes two fixes: a new convergence proof for AMSGrad and a new variant called AdamX. Experiments on a benchmark dataset support the theoretical results.

## Contribution

The paper identifies a flaw in the convergence proof of AMSGrad, gives an explicit counter-example in a simple convex optimization setting, provides a new convergence proof, and introduces the variant AdamX, with supporting experiments.

## Key findings

- The convergence proof of AMSGrad is flawed: it treats the hyper-parameters as equal when they are not. The convergence proof of Adam neglects the same issue.
- A corrected convergence proof for AMSGrad is provided.
- A new variant, AdamX, is proposed as another fix; experiments on a benchmark dataset support the theoretical results.

## Abstract

The adaptive moment estimation algorithm Adam (Kingma and Ba) is a popular optimizer in the training of deep neural networks. However, Reddi et al. have recently shown that the convergence proof of Adam is problematic and proposed a variant of Adam called AMSGrad as a fix. In this paper, we show that the convergence proof of AMSGrad is also problematic. Concretely, the problem in the convergence proof of AMSGrad is in handling the hyper-parameters, treating them as equal while they are not. This is also the neglected issue in the convergence proof of Adam. We provide an explicit counter-example of a simple convex optimization setting to show this neglected issue. Depending on manipulating the hyper-parameters, we present various fixes for this issue. We provide a new convergence proof for AMSGrad as the first fix. We also propose a new version of AMSGrad called AdamX as another fix. Our experiments on the benchmark dataset also support our theoretical results.
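For readers unfamiliar with the algorithm under discussion, here is a minimal sketch of the AMSGrad update as it is commonly stated following Reddi et al., applied to a one-dimensional convex problem. It is not taken from this paper. The function name `amsgrad_minimize`, the test objective, the constant `beta1`, the `alpha / sqrt(t)` step size, and the small `eps` term are all assumptions made for illustration. The AdamX update is not specified in this summary view, so it is not reproduced here.

```python
import math


def amsgrad_minimize(grad, x0, steps=1000, alpha=0.1,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative scalar AMSGrad loop (hypothetical helper, not the authors' code)."""
    x = x0
    m = 0.0      # first-moment (momentum) estimate
    v = 0.0      # second-moment estimate
    v_hat = 0.0  # running maximum of v; this is what distinguishes AMSGrad from Adam
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = max(v_hat, v)           # v_hat never decreases
        step_size = alpha / math.sqrt(t)  # assumed decaying step size
        # eps is an assumption added only to avoid division by zero
        x -= step_size * m / (math.sqrt(v_hat) + eps)
    return x


# Assumed toy objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_final = amsgrad_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_final)
```

The sketch keeps `beta1` constant for brevity. Convergence analyses of this family typically allow the first-moment coefficient to vary with `t`, which is where the handling of hyper-parameters that the abstract refers to becomes relevant.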

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.03590/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1904.03590/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/1904.03590/full.md

---
Source: https://tomesphere.com/paper/1904.03590