AdamD: Improved bias-correction in Adam

John St John

arXiv:2110.10828 · cs.LG · October 25, 2021 · 1 citation

TL;DR

This paper proposes a modification to the Adam optimizer's bias-correction that yields smaller, more appropriate gradient updates during early training, improving stability and reducing sensitivity to hyperparameters.

Contribution

It introduces a simplified bias-correction scheme that corrects only the second-moment estimate $v_{t}$ and leaves the first-moment estimate $m_{t}$ uncorrected, with the aim of improving Adam's early-training stability.
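
For orientation, the two update rules can be written side by side. This is a sketch based on the abstract's description, not a transcription of the paper's equations. The symbols $g_{t}$ (gradient), $\theta_{t}$ (parameters), $\alpha$ (learning rate), and $\epsilon$ (numerical-stability constant) are assumed from standard Adam notation and do not appear on this page.

$$
m_{t} = \beta_{1} m_{t-1} + (1-\beta_{1})\, g_{t}, \qquad
v_{t} = \beta_{2} v_{t-1} + (1-\beta_{2})\, g_{t}^{2}, \qquad
\hat{v}_{t} = \frac{v_{t}}{1-\beta_{2}^{t}}
$$

$$
\text{Adam:}\quad \theta_{t} = \theta_{t-1} - \alpha\, \frac{m_{t}/(1-\beta_{1}^{t})}{\sqrt{\hat{v}_{t}}+\epsilon}
\qquad\qquad
\text{AdamD (as read here):}\quad \theta_{t} = \theta_{t-1} - \alpha\, \frac{m_{t}}{\sqrt{\hat{v}_{t}}+\epsilon}
$$

Under this reading, the two rules differ only by the factor $1/(1-\beta_{1}^{t})$. It equals $10$ at $t=1$ when $\beta_{1}=0.9$ and decays toward $1$ as $t$ grows, so the difference is confined to the early steps.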

Findings

01

Smaller gradient updates in initial training steps.

02

Reduced sensitivity to hyperparameters.

03

Improved early training stability.

Abstract

Here I present a small update to the bias-correction term in the Adam optimizer that has the advantage of making smaller gradient updates in the first several steps of training. With the default bias-correction, Adam may actually make larger-than-requested gradient updates early in training. By only including the well-justified bias-correction of the second moment gradient estimate, $v_{t}$, and excluding the bias-correction on the first-order estimate, $m_{t}$, we attain these more desirable gradient update properties in the first series of steps. The default implementation of Adam may be as sensitive as it is to the hyperparameters $\beta_{1}, \beta_{2}$ partially due to the originally proposed bias correction procedure, and its behavior in early steps.
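
The effect on early step sizes can be illustrated with a small scalar simulation. This is a minimal sketch under stated assumptions, not the author's implementation. The function `step_sizes`, the flag `correct_first_moment`, the constant-gradient input, and the default values `lr=1e-3`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8` are all hypothetical choices made for illustration. The "AdamD-style" branch follows the abstract's description: bias-correct $v_{t}$, leave $m_{t}$ uncorrected.

```python
import math


def step_sizes(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
               correct_first_moment=True):
    """Return the per-step update magnitude for a single scalar parameter.

    correct_first_moment=True  -> standard Adam (both moments bias-corrected)
    correct_first_moment=False -> AdamD-style reading of the abstract
                                  (only the second moment is bias-corrected)
    """
    m = 0.0  # first-moment (mean) estimate
    v = 0.0  # second-moment (uncentered variance) estimate
    steps = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = v / (1.0 - beta2 ** t)  # always corrected
        m_used = m / (1.0 - beta1 ** t) if correct_first_moment else m
        steps.append(lr * m_used / (math.sqrt(v_hat) + eps))
    return steps


# Assumed toy input: a constant gradient of 1.0 for five steps.
grads = [1.0] * 5
adam = step_sizes(grads, correct_first_moment=True)
adamd = step_sizes(grads, correct_first_moment=False)

for t, (a, d) in enumerate(zip(adam, adamd), start=1):
    print(f"t={t}  Adam step={a:.6f}  AdamD-style step={d:.6f}")
```

With a constant gradient, the standard Adam step equals the learning rate from the first iteration, while the AdamD-style step starts at `lr * (1 - beta1)` and ramps up as `lr * (1 - beta1**t)`. In this toy setting, dropping the first-moment correction therefore behaves like a short built-in warmup.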

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques

Methods: Adam