Adam Simplified: Bias Correction Debunked
Sam Laing, Antonio Orvieto

TL;DR
This paper critically examines the bias correction component of the Adam optimizer. It finds that bias correction often does not improve final test performance and can be detrimental when no appropriate learning rate schedule is used, which challenges the common assumption that the component is necessary.
Contribution
The paper provides a systematic analysis showing that bias correction is often unnecessary and can be harmful, and it reinterprets bias correction as a form of implicit learning rate scheduling.
Findings
Bias correction often does not improve test performance.
Without proper learning rate scheduling, bias correction can harm results.
Bias correction acts as implicit learning rate scheduling depending on hyper-parameters.
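The third finding can be illustrated with the standard Adam update rule. In the usual formulation, the bias-corrected step uses m / (1 - beta1^t) and v / (1 - beta2^t); if the small epsilon term is ignored, this equals the uncorrected step multiplied by sqrt(1 - beta2^t) / (1 - beta1^t). The sketch below computes that multiplier. It is a minimal illustration under these assumptions, not the paper's own derivation or code, and the function name `bias_correction_factor` is hypothetical.

```python
import math


def bias_correction_factor(t, beta1=0.9, beta2=0.999):
    """Multiplier that bias correction applies to the uncorrected Adam step.

    Assumes the standard Adam formulation with epsilon ignored; t is the
    1-based step count. The name of this helper is illustrative only.
    """
    return math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)


# Compare two hyper-parameter settings at a few step counts.
for beta1, beta2 in [(0.9, 0.999), (0.9, 0.9)]:
    factors = [bias_correction_factor(t, beta1, beta2) for t in (1, 10, 100, 1000)]
    print(f"beta1={beta1}, beta2={beta2}:", [round(f, 3) for f in factors])
```

With the common defaults (beta1 = 0.9, beta2 = 0.999) the multiplier starts below 1 and approaches 1 for large t, so it behaves like a warm-up on the step size. With beta1 = beta2 = 0.9 it starts above 1 and decays toward 1, which is one way to see why the implicit schedule depends on the hyper-parameters.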
Abstract
The Adam optimizer is a cornerstone of modern deep learning, yet the empirical necessity of each of its individual components is often taken for granted. This paper presents a focused investigation into the role of bias-correction, a feature whose contribution remains poorly understood. Through a series of systematic ablations on vision and language modelling tasks, we demonstrate that the conventional wisdom surrounding bias correction is misleading. In particular, we demonstrate that in the optimal hyper-parameter configuration, the inclusion of bias correction leads to no improvement in final test performance. Moreover, unless appropriate learning rate scheduling is implemented, the inclusion of bias correction can sometimes be detrimental to performance. We further reinterpret bias correction as a form of implicit learning rate scheduling whose behaviour is strongly dependent on the…
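For readers who want to see what such an ablation toggles, here is a minimal sketch of a single scalar Adam update with an optional bias correction switch. It follows the standard Adam formulation; the function name `adam_step`, the `bias_correction` flag, and the example values are assumptions made for illustration and are not taken from the paper's experimental code.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, bias_correction=True):
    """One scalar Adam update; t is the 1-based step count.

    Returns the updated parameter and the updated moment estimates.
    """
    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    if bias_correction:
        # Rescale the zero-initialised averages (standard Adam).
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
    else:
        # Ablated variant: use the raw moving averages.
        m_hat, v_hat = m, v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero-initialised moments with an example gradient of 2.0.
with_bc, _, _ = adam_step(0.0, 2.0, 0.0, 0.0, t=1, bias_correction=True)
without_bc, _, _ = adam_step(0.0, 2.0, 0.0, 0.0, t=1, bias_correction=False)
print(with_bc, without_bc)
```

In this sketch the first corrected step has magnitude close to `lr`, while the uncorrected step is larger by roughly (1 - beta1) / sqrt(1 - beta2), about 3.16 for the default values. The two variants therefore differ mainly in the effective step size during early training.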
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
