ACMo: Angle-Calibrated Moment Methods for Stochastic Optimization
Xunpeng Huang, Runxin Xu, Hao Zhou, Zhe Wang, Zhengyang Liu, Lei Li

TL;DR
This paper introduces ACMo, a novel first-moment optimizer that combines the strengths of SGD and adaptive methods, achieving convergence comparable to Adam-type optimizers and better generalization on machine learning tasks.
Contribution
We propose ACMo, a new angle-calibrated moment method that builds on a fresh perspective on second moments in adaptive gradient methods and attaches their properties to the first-moment iteration.
Findings
ACMo achieves convergence rates similar to Adam-type optimizers.
ACMo demonstrates better generalization in CV and NLP tasks.
Experimental results validate ACMo's effectiveness across multiple benchmarks.
Abstract
Due to its simplicity and outstanding ability to generalize, stochastic gradient descent (SGD) is still the most widely used optimization method despite its slow convergence. Meanwhile, adaptive methods have attracted growing attention from the optimization and machine learning communities, both for their use of life-long information and for their profound and fundamental mathematical theory. Taking the best of both worlds is the most exciting and challenging question in the field of optimization for machine learning. Along this line, we revisit existing adaptive gradient methods from a novel perspective, refreshing the understanding of second moments. Our new perspective empowers us to attach the properties of second moments to the first moment iteration, and to propose a novel first moment optimizer, the Angle-Calibrated Moment method (ACMo). Our theoretical results show that ACMo is…
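For readers unfamiliar with the terms "first moment" and "second moment" used in the abstract, the following is a minimal sketch of the two standard baselines it contrasts: SGD with momentum (first moment only) and Adam (first and second moments). This is not ACMo, whose update rule is not given in this summary. The function names, the toy quadratic objective, and the hyperparameter defaults are all assumptions chosen for illustration.

```python
import math

def sgd_momentum_step(w, g, m, lr=0.1, beta=0.9):
    """One first-moment step: keep only a running average of gradients."""
    m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, g)]
    w = [wi - lr * mi for wi, mi in zip(w, m)]
    return w, m

def adam_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: first moment m plus a per-coordinate second moment v."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias correction for the zero-initialized running averages (t starts at 1).
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Each coordinate is rescaled by the square root of its second moment.
    w = [wi - lr * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v

def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 100 * w[1]**2)."""
    return [w[0], 100.0 * w[1]]

# One step of each method from the same starting point.
w0 = [1.0, 1.0]
w_sgd, m_sgd = sgd_momentum_step(w0, grad(w0), [0.0, 0.0])
w_adam, m_adam, v_adam = adam_step(w0, grad(w0), [0.0, 0.0], [0.0, 0.0], t=1)
```

On this badly scaled toy problem, the momentum step moves each coordinate in proportion to its gradient, while the Adam step moves both coordinates by roughly the learning rate, because the second-moment rescaling removes the gradient magnitude. That per-coordinate rescaling is the property of second moments that the abstract refers to.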
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Privacy-Preserving Technologies in Data
