An improvement of the convergence proof of the ADAM-Optimizer
Sebastian Bock, Josef Goppold, Martin Weiß

TL;DR
This paper identifies errors in the original convergence proof of the ADAM-Optimizer, a popular adaptive step size method for neural network training, and provides an improved proof.
Contribution
The paper offers a corrected and improved convergence proof for the widely used ADAM-Optimizer, addressing previous inaccuracies.
Findings
Corrected convergence proof for ADAM-Optimizer
Enhanced theoretical understanding of optimizer stability
Supports reliable neural network training
Abstract
Backpropagation is a common way to train neural networks. This algorithm includes a gradient descent method, which needs an adaptive step size. In the area of neural networks, the ADAM-Optimizer is one of the most popular adaptive step size methods. It was introduced by Kingma and Ba in \cite{Kingma.2015}. The number of citations that paper gathered in only three years further shows its importance. We discovered that the convergence proof given there contains some mistakes, which make the proof incorrect. In this paper we give an improvement to the convergence proof of the ADAM-Optimizer.
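For readers unfamiliar with the method, the following is a minimal sketch of the ADAM update rule as commonly stated from Kingma and Ba's paper, applied to a scalar parameter. It is not taken from this paper. The function name `adam_minimize`, the quadratic test objective, and the constant step size `alpha` are assumptions chosen for illustration; the convergence analysis discussed in the abstract concerns the optimizer's theoretical guarantees, not this particular setup.

```python
import math


def adam_minimize(grad, x0, steps=2000, alpha=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run ADAM updates on a scalar parameter and return the final iterate.

    grad: callable returning the gradient at x (hypothetical test objective).
    alpha: constant step size (an assumption made for this sketch).
    """
    x = x0
    m = 0.0  # exponential moving average of the gradients (first moment)
    v = 0.0  # exponential moving average of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for initializing m and v at zero.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # The step is scaled per coordinate by the second-moment estimate.
        x -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Illustrative objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
x_one_step = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0, steps=1)
```

In this sketch the very first update has magnitude close to `alpha` regardless of the gradient's scale, because after bias correction the ratio `m_hat / sqrt(v_hat)` equals the sign of the gradient. This is the sense in which the step size is adaptive.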
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices
