Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Juntang Zhuang, Yifan Ding, Tommy Tang, Nicha Dvornek, Sekhar Tatikonda, James S. Duncan

TL;DR
ACProp is an adaptive optimizer that combines centering of the second momentum with asynchronous updates. It has a weaker convergence condition than synchronous optimizers such as Adam and RMSProp, and strong empirical performance across deep learning tasks.
Contribution
The paper introduces ACProp, an optimizer with a weaker convergence condition than synchronous adaptive optimizers and a convergence rate that matches the oracle rate, validated through experiments on CNNs, GANs, reinforcement learning, and transformers.
Findings
ACProp has a convergence rate of O(1/√T), matching the oracle rate.
ACProp outperforms SGD and other adaptive optimizers in image classification.
ACProp demonstrates superior training stability and generalization in diverse models.
Abstract
We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer which combines centering of second momentum and asynchronous update (e.g. for t-th update, denominator uses information up to step t-1, while numerator uses gradient at t-th step). ACProp has both strong theoretical properties and empirical performance. With the example by Reddi et al. (2018), we show that asynchronous optimizers (e.g. AdaShift, ACProp) have weaker convergence condition than synchronous optimizers (e.g. Adam, RMSProp, AdaBelief); within asynchronous optimizers, we show that centering of second momentum further weakens the convergence condition. We demonstrate that ACProp has a convergence rate of O(1/√T) for the stochastic non-convex case, which matches the oracle rate and outperforms the rate of RMSProp and Adam. We validate ACProp in extensive…
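The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' reference implementation. The abstract only states that the denominator uses information up to step t-1, that the numerator uses the gradient at step t, and that the second momentum is centered. Everything else here is an assumption made for illustration: the function name `acprop_step`, the exponential-moving-average form of `m` and `s`, the default hyperparameters, and the choice to skip the parameter update on the first step (when no past statistics exist yet).

```python
import numpy as np


def acprop_step(theta, grad, m, s, step,
                lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical ACProp-style update (illustrative sketch only).

    theta, grad, m, s are arrays of the same shape; step counts from 1.
    """
    # Asynchronous update: the denominator is built from s as it stood
    # BEFORE seeing the current gradient (information up to step t-1),
    # while the numerator is the current gradient (step t).
    if step > 1:  # assumption: no update until past statistics exist
        theta = theta - lr * grad / (np.sqrt(s) + eps)

    # Running mean of gradients (first moment).
    m = beta1 * m + (1.0 - beta1) * grad

    # Centered second momentum: track (grad - m)^2 rather than grad^2.
    s = beta2 * s + (1.0 - beta2) * (grad - m) ** 2
    return theta, m, s


# Usage on f(x) = ||x||^2, whose gradient is 2x.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
s = np.zeros_like(theta)
for t in (1, 2):
    theta, m, s = acprop_step(theta, 2.0 * theta, m, s, step=t)
```

Setting `step > 1` aside, the only difference from a synchronous update in this sketch is ordering: a synchronous optimizer would refresh `s` with the current gradient before dividing, whereas here the division happens first.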
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent · Adam · AdaShift · RMSProp
