HVAdam: A Full-Dimension Adaptive Optimizer

Yiheng Zhang; Shaowu Wu; Yuanzhuo Xu; Jiajun Wu; Shang Xu; Steve Drew; Xiaoguang Niu

arXiv:2511.20277·cs.LG·December 23, 2025

Open Access

TL;DR

This paper introduces Anon, a novel adaptive optimizer with tunable adaptivity and a new convergence mechanism, which bridges the gap between SGD and Adam, improving performance across diverse large-scale tasks.

Contribution

The paper proposes Anon, a fully tunable adaptive optimizer with a new convergence technique, addressing limitations of existing optimizers and unifying classical and modern optimization strategies.

Findings

01 Anon outperforms state-of-the-art optimizers on various tasks.

02 The incremental delay update (IDU) enhances robustness to gradient noise.

03 Theoretical convergence guarantees are established for both convex and non-convex cases.

Abstract

Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to…
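To make the idea of tunable adaptivity concrete, here is a minimal sketch of one well-known way to interpolate between SGD-like and Adam-like updates: raising the second-moment estimate to an exponent `p` (the partially adaptive scheme used by Padam). This is not Anon's update rule, which the truncated abstract does not specify, and it does not implement IDU. The function names, the exponent `p`, and the state layout are assumptions made for illustration. The optional `amsgrad` flag shows the hard max-tracking strategy the abstract contrasts with IDU.

```python
def init_state(n):
    """Hypothetical optimizer state for n scalar parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "vmax": [0.0] * n}


def partially_adaptive_step(params, grads, state, lr=0.1, beta1=0.9,
                            beta2=0.999, p=0.5, eps=1e-8, amsgrad=False):
    """One update with adaptivity exponent p (illustrative, not Anon).

    p = 0   -> denominator is ~1, i.e. SGD with bias-corrected EMA momentum
    p = 0.5 -> denominator is sqrt(v_hat), i.e. an Adam-like step
    """
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (x, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        m = beta1 * state["m"][i] + (1 - beta1) * g
        v = beta2 * state["v"][i] + (1 - beta2) * g * g
        state["m"][i], state["v"][i] = m, v

        if amsgrad:
            # AMSGrad-style hard max-tracking: never let v decrease.
            state["vmax"][i] = max(state["vmax"][i], v)
            v = state["vmax"][i]

        # Bias correction, as in Adam.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)

        new_params.append(x - lr * m_hat / (v_hat ** p + eps))
    return new_params


# Usage: minimize f(x) = x^2 from x = 3 with an intermediate exponent.
x = [3.0]
state = init_state(1)
for _ in range(200):
    grad = [2.0 * x[0]]
    x = partially_adaptive_step(x, grad, state, lr=0.05, p=0.25)
```

Sweeping `p` from 0 to 0.5 moves the step from a pure momentum update toward a fully normalized one, which is the kind of SGD-to-Adam spectrum the abstract refers to. Values outside that range are numerically valid in this sketch but carry no convergence guarantee here.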

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning