Anon: Extrapolating Adaptivity Beyond SGD and Adam

Yiheng Zhang; Kaiyan Zhao; Shaowu Wu; Yiming Wang; Jiajun Wu; Leong Hou U; Steve Drew; Xiaoguang Niu

arXiv:2605.02317 · cs.AI · May 7, 2026

TL;DR

This paper introduces Anon, an optimizer with continuously tunable adaptivity that can interpolate between SGD-like and Adam-like behavior and extrapolate beyond both. The authors report improved convergence and performance across diverse tasks.

Contribution

Anon is the first optimizer with continuously tunable adaptivity, enabling both interpolation between SGD-like and Adam-like behavior and extrapolation beyond them, with theoretical guarantees.
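This summary does not give Anon's actual update rule. To make "continuously tunable adaptivity" concrete, here is a minimal sketch built on an assumption: a momentum step whose denominator is the bias-corrected second-moment estimate raised to a hypothetical exponent `p`. This is a common way to parameterize adaptivity, not Anon's published method. The function name `tunable_adaptive_step` and the parameter `p` are illustrative. At `p = 0` the denominator is exactly 1, which gives SGD with bias-corrected EMA momentum. At `p = 0.5` it is `sqrt(v_hat + eps)`, an Adam-style update (with `eps` inside the root). Values outside `[0, 0.5]` "extrapolate" beyond both.

```python
import math
from typing import List, Tuple


def tunable_adaptive_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
    lr: float = 1e-3,
    p: float = 0.5,      # hypothetical adaptivity exponent (not from the paper)
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """One illustrative step preconditioned by (v_hat + eps) ** p.

    p = 0            -> denominator is exactly 1: SGD with bias-corrected momentum
    p = 0.5          -> sqrt(v_hat + eps): an Adam-style update
    p < 0 or p > 0.5 -> adaptivity extrapolated beyond both
    t is the 1-based step count used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for x, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g        # first-moment EMA
        vi = beta2 * vi + (1.0 - beta2) * g * g    # second-moment EMA
        m_hat = mi / (1.0 - beta1 ** t)            # bias corrections
        v_hat = vi / (1.0 - beta2 ** t)
        denom = (v_hat + eps) ** p                 # tunable preconditioner
        new_params.append(x - lr * m_hat / denom)
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v


if __name__ == "__main__":
    # Same first step (gradient 2.0, lr 0.1) at several adaptivity levels.
    for p in (-0.5, 0.0, 0.5, 1.0):
        x, _, _ = tunable_adaptive_step([1.0], [2.0], [0.0], [0.0], t=1, lr=0.1, p=p)
        print(f"p={p:+.1f}  x={x[0]:.6f}")
```

For this single-coordinate first step, the parameter moves by about 0.2 at `p = 0` (lr times the raw gradient), about 0.1 at `p = 0.5` (the gradient is normalized away), about 0.05 at `p = 1`, and about 0.4 at `p = -0.5`. Raising `p` therefore shifts the update from gradient-scaled toward increasingly normalized steps.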

Findings

1. Anon outperforms state-of-the-art optimizers on image, diffusion, and language tasks.
2. The incremental delay update (IDU) enhances robustness and convergence.
3. Anon effectively bridges classical and modern optimizer properties.

Abstract

Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity in ℝ, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness…
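The abstract contrasts IDU with AMSGrad's "hard max-tracking." IDU's own rule is not given in this summary, so the sketch below reproduces only the AMSGrad baseline. It follows the original AMSGrad formulation without bias correction. The helper name `denominators` is hypothetical. The sketch tracks, for one scalar parameter, Adam's denominator `sqrt(v) + eps` and AMSGrad's `sqrt(v_max) + eps`, where `v_max = max(v_max, v)` never decreases. This illustrates the rigidity that a more flexible mechanism would aim to relax.

```python
import math
from typing import List, Tuple


def denominators(
    grads: List[float], beta2: float = 0.999, eps: float = 1e-8
) -> Tuple[List[float], List[float]]:
    """Per-step Adam vs. AMSGrad step-size denominators for one scalar parameter.

    v is the EMA of squared gradients (no bias correction, as in the original
    AMSGrad algorithm). AMSGrad keeps v_max = max(v_max, v), so its denominator
    is non-decreasing; Adam's sqrt(v) + eps can shrink again after a spike.
    """
    v = 0.0
    v_max = 0.0
    adam, ams = [], []
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
        v_max = max(v_max, v)              # hard max-tracking
        adam.append(math.sqrt(v) + eps)
        ams.append(math.sqrt(v_max) + eps)
    return adam, ams


if __name__ == "__main__":
    # One large gradient spike followed by many small gradients.
    adam, ams = denominators([10.0] + [0.1] * 50, beta2=0.9)
    print(f"after spike: adam={adam[0]:.3f}  amsgrad={ams[0]:.3f}")
    print(f"50 steps on: adam={adam[-1]:.3f}  amsgrad={ams[-1]:.3f}")
```

After a single spike, Adam's denominator decays back toward the scale of the small gradients. AMSGrad's stays pinned at the spike's level for the rest of training, which keeps its effective step size small from then on.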
