Anon: Extrapolating Adaptivity Beyond SGD and Adam
Yiheng Zhang, Kaiyan Zhao, Shaowu Wu, Yiming Wang, Jiajun Wu, Leong Hou U, Steve Drew, Xiaoguang Niu

TL;DR
This paper introduces Anon, a novel optimizer with tunable adaptivity that bridges the gap between SGD and Adam, providing improved convergence and performance across diverse tasks.
Contribution
Anon is the first optimizer with continuously tunable adaptivity, enabling interpolation and extrapolation between SGD-like and Adam-like behaviors with theoretical guarantees.
Findings
Anon outperforms state-of-the-art optimizers on image, diffusion, and language tasks.
The incremental delay update (IDU) enhances robustness and convergence.
Anon effectively bridges classical and modern optimizer properties.
Abstract
Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity in ℝ, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness…
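To make the idea of tunable adaptivity concrete, here is a minimal sketch of one generic way to interpolate between an SGD-like and an Adam-like step: raise the second-moment estimate to an exponent `gamma`. This parameterization is an assumption for illustration only (it resembles partially adaptive methods such as Padam) and is not claimed to be Anon's actual update rule. The `amsgrad` flag shows the hard max-tracking that the abstract contrasts with IDU; IDU itself is not specified in the text above and is not implemented here. The names `new_state`, `tunable_step`, and `gamma` are hypothetical.

```python
def new_state():
    """Per-parameter optimizer state (scalar case, for illustration)."""
    return {"t": 0, "m": 0.0, "v": 0.0, "v_max": 0.0}


def tunable_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
                 gamma=0.5, eps=1e-8, amsgrad=False):
    """One update of a scalar parameter with a tunable adaptivity exponent.

    gamma is a hypothetical knob, not taken from the paper:
      gamma = 0.0 -> preconditioner is 1, an SGD-with-momentum-like step
      gamma = 0.5 -> divide by sqrt(v), an Adam-like step
      other values extrapolate beyond either end.
    """
    state["t"] += 1
    t = state["t"]

    # Exponential moving averages of the gradient and its square.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad

    v = state["v"]
    if amsgrad:
        # AMSGrad-style hard max-tracking: the second-moment estimate
        # used in the step is never allowed to decrease.
        state["v_max"] = max(state["v_max"], v)
        v = state["v_max"]

    # Bias correction, as in Adam.
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    precond = (v_hat + eps) ** gamma
    return param - lr * m_hat / precond


# Usage: the same gradient under three adaptivity settings.
for g in (0.0, 0.5, 1.0):
    print(g, tunable_step(1.0, 2.0, new_state(), gamma=g))
```

With `gamma=0.0` the step depends on the gradient's magnitude, while with `gamma=0.5` the first step is normalized to roughly `lr` regardless of gradient scale, which is the contrast between SGD-like and Adam-like behavior that the abstract refers to.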
