HVAdam: A Full-Dimension Adaptive Optimizer
Yiheng Zhang, Shaowu Wu, Yuanzhuo Xu, Jiajun Wu, Shang Xu, Steve Drew, Xiaoguang Niu

TL;DR
This paper introduces Anon, a novel adaptive optimizer with tunable adaptivity and a new convergence mechanism, which bridges the gap between SGD and Adam, improving performance across diverse large-scale tasks.
Contribution
The paper proposes Anon, a fully tunable adaptive optimizer with a new convergence technique, addressing limitations of existing optimizers and unifying classical and modern optimization strategies.
Findings
Anon outperforms state-of-the-art optimizers on various tasks.
The incremental delay update (IDU) enhances robustness to gradient noise.
Theoretical convergence guarantees are established for both convex and non-convex cases.
Abstract
Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to…
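The abstract is truncated here and does not give Anon's update rule or the IDU mechanism, so neither is reproduced below. As a rough illustration of what "tunable adaptivity" and "hard max-tracking" can mean in general, the following is a minimal sketch of a partially adaptive update for a single scalar parameter. It is not the paper's algorithm: the exponent p, the helper names new_state and partially_adaptive_step, and all default values are assumptions made for this example. With p = 0 the preconditioner is 1 and the step reduces to SGD with bias-corrected momentum; with p = 0.5 it is Adam-like; setting hard_max=True keeps a running maximum of the second-moment estimate in the style of AMSGrad.

```python
def new_state():
    # Hypothetical optimizer state for one scalar parameter.
    return {"t": 0, "m": 0.0, "v": 0.0, "v_max": 0.0}


def partially_adaptive_step(param, grad, state, lr=0.1, beta1=0.9,
                            beta2=0.999, p=0.5, eps=1e-8, hard_max=False):
    """One update step with an assumed adaptivity exponent p.

    This is an illustrative sketch, NOT Anon's update rule or IDU.
    p = 0   -> denominator is 1 (SGD with momentum)
    p = 0.5 -> denominator is sqrt(v_hat + eps) (Adam-like)
    """
    state["t"] += 1
    # Exponential moving averages of the gradient and squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad

    if hard_max:
        # AMSGrad-style hard max-tracking: the estimate never decreases.
        state["v_max"] = max(state["v_max"], state["v"])
        v = state["v_max"]
    else:
        v = state["v"]

    # Bias correction for the zero-initialized moving averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = v / (1 - beta2 ** state["t"])

    denom = (v_hat + eps) ** p
    return param - lr * m_hat / denom


# Usage: same gradient, two settings of the assumed exponent p.
sgd_like = partially_adaptive_step(1.0, 2.0, new_state(), p=0.0)
adam_like = partially_adaptive_step(1.0, 2.0, new_state(), p=0.5)
```

On the first step with gradient 2 and learning rate 0.1, the SGD-like setting moves the parameter by about 0.2 (proportional to the gradient), while the Adam-like setting moves it by about 0.1 (the gradient's magnitude is normalized away). Values of p between or outside these two settings give intermediate or more extreme scaling, which is the general idea behind a continuously tunable adaptivity parameter.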
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
