AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate
Meng Zhu, Quan Xiao, Weidong Min

TL;DR
AdamNX introduces a novel exponential decay mechanism for second-order moment estimation, improving training stability and potentially enhancing generalization in large-scale models compared to traditional Adam.
Contribution
The paper proposes AdamNX, a new optimization algorithm with a unique exponential decay rate for second-order moments, addressing Adam's tendency to converge to non-flat minima.
Findings
AdamNX is reported to outperform Adam and its variants in training stability and performance.
The proposed exponential decay rate helps the optimizer converge to flatter minima.
An open-source implementation is available on GitHub.
Abstract
Since the start of the 21st century, artificial intelligence has been driving a new industrial revolution. Within the training framework, the optimization algorithm aims to converge stably on high-dimensional optimization problems to local or even global minima. In the era of large language models, Adam remains the mainstream optimization algorithm even as the scale of model parameters and data has grown. However, compared with optimization algorithms based on stochastic gradient descent (SGD), Adam is more likely to converge to non-flat minima. To address this issue, the AdamNX algorithm is proposed. Its core innovation is a novel exponential decay rate for the second-order moment estimate, which gradually weakens the learning-step correction as training progresses and degrades to momentum SGD in the stable training period, thereby improving the stability…
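As a rough illustration of the mechanism the abstract describes, the following Python sketch runs an Adam-style update whose second-moment decay rate rises toward 1 over time. The schedule `beta2_schedule`, its time constant `tau`, the helper names, and the toy objective are all assumptions made for illustration. They are not the formula from the paper, which this page does not reproduce.

```python
import math


def beta2_schedule(t, beta2_base=0.999, tau=50.0):
    """Hypothetical decay-rate schedule (an assumption, not the paper's formula).

    Starts at beta2_base for t = 0 and approaches 1 exponentially, so the
    second-moment estimate is updated less and less as training goes on.
    """
    return 1.0 - (1.0 - beta2_base) * math.exp(-t / tau)


def step(x, grad, state, lr=0.05, beta1=0.9, eps=1e-8):
    """One Adam-style update on a scalar parameter with a time-varying beta2."""
    state["t"] += 1
    t = state["t"]
    beta2_t = beta2_schedule(t)

    # First moment: the usual exponential moving average of gradients.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    # Second moment: same form as Adam, but the decay rate changes with t.
    state["v"] = beta2_t * state["v"] + (1.0 - beta2_t) * grad * grad
    # Running product of decay rates, used for bias correction of v.
    state["beta2_prod"] *= beta2_t

    m_hat = state["m"] / (1.0 - beta1 ** t)
    v_hat = state["v"] / (1.0 - state["beta2_prod"])
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize_quadratic(x0=5.0, steps=500):
    """Minimize the toy objective f(x) = x**2 and return the final x."""
    x = x0
    state = {"t": 0, "m": 0.0, "v": 0.0, "beta2_prod": 1.0}
    for _ in range(steps):
        grad = 2.0 * x  # gradient of x**2
        x = step(x, grad, state)
    return x
```

In this sketch, once the decay rate is close to 1 the second-moment estimate effectively stops changing, so the denominator becomes a fixed scale and the update behaves like momentum SGD with a constant per-parameter step size. This mirrors the qualitative behavior described in the abstract, not the authors' exact schedule.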
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning in Healthcare · Neural Networks and Applications
