The Effect of Mini-Batch Noise on the Implicit Bias of Adam

Matias D. Cattaneo; Boris Shigida

arXiv:2602.01642·cs.LG·May 11, 2026

TL;DR

This paper presents a theoretical framework for analyzing how mini-batch noise affects the Adam optimizer's implicit bias toward sharper or flatter regions of the loss landscape, a property commonly observed to correlate with the generalization gap in multi-epoch training.

Contribution

It shows how the momentum hyperparameters $β_{1}$ and $β_{2}$ influence Adam's implicit bias differently depending on batch size, which can guide hyperparameter choices in small-batch and large-batch regimes.

Findings

01

Higher $β_{2}$ increases the magnitude of anti-regularization at large batch sizes, hurting generalization.

02

At smaller batch sizes, the dependence of anti-regularization on $β_{2}$ reverses.

03

The default $β_{1} = 0.9$, $β_{2} = 0.999$ is optimal for small batches; larger batches benefit from $β_{1}$ closer to $β_{2}$.

Abstract

With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token prediction, has two momentum hyperparameters $(β_{1}, β_{2})$ controlling memory and one very important hyperparameter, batch size, controlling (in particular) the amount of mini-batch noise. We introduce a theoretical framework to understand how mini-batch noise influences the implicit bias of memory in Adam (depending on $β_{1}$, $β_{2}$) towards sharper or flatter regions of the loss landscape, which is commonly observed to correlate with the generalization gap in multi-epoch training. We find that in the case of large batch sizes, higher $β_{2}$ increases the magnitude of anti-regularization by memory (hurting generalization), but as the batch size becomes…
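
For readers who want to see where $β_{1}$, $β_{2}$, and the batch size enter, here is a minimal sketch of the standard Adam update with bias correction, paired with a mini-batch gradient for a toy scalar least-squares model. It is not the paper's theoretical framework. The function names `adam_step` and `minibatch_grad`, the toy loss, and the default values for `lr` and `eps` are assumptions chosen for illustration only.

```python
import math
import random


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update on lists of floats; t is the 1-based step count.

    beta1 controls the memory of the first-moment average (m),
    beta2 controls the memory of the second-moment average (v).
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-moment EMA
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second-moment EMA
        m_hat = m_i / (1.0 - beta1 ** t)             # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


def minibatch_grad(theta, data, batch_size, rng):
    """Gradient of the mean of 0.5 * (w * x - y)^2 over a random mini-batch.

    Hypothetical toy model with one weight w = theta[0]. Smaller batch_size
    gives a noisier gradient estimate; batch_size == len(data) gives the
    full-batch gradient.
    """
    batch = rng.sample(data, batch_size)
    w = theta[0]
    return [sum((w * x - y) * x for x, y in batch) / batch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]  # true weight is 3
    theta, m, v = [0.0], [0.0], [0.0]
    for t in range(1, 2001):
        g = minibatch_grad(theta, data, batch_size=2, rng=rng)
        theta, m, v = adam_step(theta, g, m, v, t, lr=1e-2)
    print(theta)
```

Changing `batch_size`, `beta1`, and `beta2` in the loop is the simplest way to experiment with the three quantities the abstract discusses, although a one-dimensional convex toy problem has no sharp or flat minima to distinguish and so cannot reproduce the paper's findings.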
