Refresh-Scaling the Memory of Balanced Adam

Alberto Fernández-Hernández; Cristian Pérez-Corral; Jose I. Mestre; Manuel F. Dolz; Enrique S. Quintana-Ortí

arXiv:2605.10119·cs.LG·May 13, 2026

TL;DR

This paper treats the single parameter $β$ of balanced Adam ($β_{1} = β_{2}$) as a memory horizon rather than a dimensionless constant, and tunes it to the training scale for improved robustness across vision and language tasks.

Contribution

It introduces a refresh rule that sets $β$ from the effective learning horizon $T_{ES}$ so that the refresh count $R_{β} = (1 - β) T_{ES}$ stays near 1000, improving robustness over the best fixed-beta baseline.

Findings

1. Choosing the refresh count $R_{β} \approx 1000$ improves robustness.

2. The refresh rule reduces the maximum validation loss gap by 33.4%.

3. All experiments achieve within 1% of their validation oracle.

Abstract

Recent evidence suggests that Adam performs robustly when its momentum parameters are tied, $β_{1} = β_{2}$, reducing the optimizer to a single remaining parameter. However, how this parameter should be set remains poorly understood. We argue that, in balanced Adam, $β$ should not be treated as a dimensionless constant: it defines a statistical memory horizon $H_{β} = (1 - β)^{-1}$. In terms of the effective learning horizon $T_{ES}$, estimated from the validation trajectory, we study the refresh count $R_{β} = (1 - β) T_{ES}$, which measures how many times Adam renews its internal statistics during the useful phase of training. Across 11 vision and language experiments, we find that choosing $β$ so that $R_{β} \approx 1000$ selects different $β$ values depending on the training scale, yet improves robustness over the best fixed-beta baseline. Compared…
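
The two definitions in the abstract can be turned into a small calculation. The sketch below is illustrative only and is not the authors' code: the function names, the default target of 1000, and the error handling are assumptions. It takes $T_{ES}$ as a given number of steps and does not attempt to estimate it from a validation trajectory.

```python
def memory_horizon(beta: float) -> float:
    """Memory horizon H_beta = 1 / (1 - beta), as defined in the abstract."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return 1.0 / (1.0 - beta)


def refresh_count(beta: float, t_es: float) -> float:
    """Refresh count R_beta = (1 - beta) * T_ES, as defined in the abstract."""
    if t_es <= 0:
        raise ValueError("t_es must be positive")
    return (1.0 - beta) * t_es


def beta_from_refresh(t_es: float, target_refresh: float = 1000.0) -> float:
    """Solve R_beta = (1 - beta) * T_ES for beta at a chosen refresh count.

    Rejecting t_es < target_refresh is an assumption of this sketch:
    in that case the formula would yield a negative beta.
    """
    if t_es <= 0 or target_refresh <= 0:
        raise ValueError("t_es and target_refresh must be positive")
    if t_es < target_refresh:
        raise ValueError("t_es is too short for the requested refresh count")
    return 1.0 - target_refresh / t_es


if __name__ == "__main__":
    # Hypothetical training scales, in optimizer steps.
    for t_es in (10_000, 100_000, 1_000_000):
        beta = beta_from_refresh(t_es)
        print(t_es, beta, memory_horizon(beta), refresh_count(beta, t_es))
```

With these hypothetical scales, a horizon of 100,000 steps gives $β$ close to 0.99 and a horizon of 1,000,000 steps gives $β$ close to 0.999, which illustrates how a fixed refresh count selects different $β$ values at different training scales. In a framework such as PyTorch, the resulting value would be passed for both entries of the `betas` pair to obtain the balanced setting.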
