A New Adaptive Gradient Method with Gradient Decomposition
Zhou Shao, Tong Lin

TL;DR
DecGD is a novel adaptive gradient method that decomposes gradients to achieve rapid convergence and good generalization, outperforming existing methods like Adam and SGDM in various tasks.
Contribution
Introduces DecGD, an adaptive gradient method that adjusts learning rates based on a loss-based vector, combining fast convergence with improved generalization.
Findings
DecGD converges rapidly, at a pace comparable to Adam-type methods.
DecGD generalizes well, on par with SGDM.
Empirical results validate DecGD's superior performance.
Abstract
Adaptive gradient methods, especially Adam-type methods (such as Adam, AMSGrad, and AdaBound), have been proposed to speed up the training process with an element-wise scaling term on learning rates. However, they often generalize poorly compared with stochastic gradient descent (SGD) and its accelerated schemes such as SGD with momentum (SGDM). In this paper, we propose a new adaptive method called DecGD, which simultaneously achieves good generalization like SGDM and obtains rapid convergence like Adam-type methods. In particular, DecGD decomposes the current gradient into the product of two terms: a surrogate gradient and a loss-based vector. Our method adjusts the learning rates adaptively according to the current loss-based vector instead of the squared gradients used in Adam-type methods. The intuition for adaptive learning rates of DecGD is that a good optimizer, in…
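The decomposition described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the constant `C`, the toy quadratic `loss`, and the choice of surrogate objective g(x) = sqrt(f(x) + C), for which the chain rule gives grad f = 2 g * grad g. Here the loss-based term is a scalar broadcast over coordinates; the vector form and the actual DecGD update rule are not reproduced.

```python
import numpy as np

C = 1.0  # assumed constant that keeps loss(x) + C strictly positive


def loss(x):
    # Toy quadratic loss, chosen only for illustration.
    return 0.5 * float(np.dot(x, x))


def grad(x):
    # Exact gradient of the toy loss.
    return x


def decompose(x, c=C):
    """Split grad(x) into a surrogate gradient and a loss-based term.

    Assumes the surrogate objective g(x) = sqrt(loss(x) + c), so that
    grad(x) = (2 * g) * grad_g by the chain rule.
    """
    g = np.sqrt(loss(x) + c)
    surrogate = grad(x) / (2.0 * g)  # gradient of g
    loss_term = 2.0 * g              # scalar factor that depends on the loss
    return surrogate, loss_term


x = np.array([3.0, -4.0])
surrogate, loss_term = decompose(x)
reconstructed = loss_term * surrogate

# The product of the two factors recovers the original gradient.
print(np.allclose(reconstructed, grad(x)))
```

Under this assumed form, the loss-based factor grows and shrinks with the loss value, which is one way to read the abstract's statement that step sizes adapt to the loss rather than to squared gradients.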
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: Adam · AMSGrad · SGD with Momentum · Stochastic Gradient Descent
