A New Adaptive Gradient Method with Gradient Decomposition

Zhou Shao; Tong Lin

arXiv:2107.08377·cs.LG·July 20, 2021

Open Access

TL;DR

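DecGD is an adaptive gradient method that decomposes the current gradient into a surrogate gradient and a loss-based vector, aiming to combine the rapid convergence of Adam-type methods with the good generalization of SGDM. It is reported to outperform methods such as Adam and SGDM on various tasks.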

Contribution

Introduces DecGD, an adaptive gradient method that adjusts learning rates according to a loss-based vector rather than the squared gradients used in Adam-type methods, combining fast convergence with improved generalization.

Findings

01

DecGD converges rapidly, comparable to Adam-type methods.

02

DecGD generalizes well, comparable to SGDM.

03

Empirical results validate DecGD's superior performance.

Abstract

Adaptive gradient methods, especially Adam-type methods (such as Adam, AMSGrad, and AdaBound), have been proposed to speed up the training process with an element-wise scaling term on learning rates. However, they often generalize poorly compared with stochastic gradient descent (SGD) and its accelerated schemes such as SGD with momentum (SGDM). In this paper, we propose a new adaptive method called DecGD, which simultaneously achieves good generalization like SGDM and obtains rapid convergence like Adam-type methods. In particular, DecGD decomposes the current gradient into the product of two terms: a surrogate gradient and a loss-based vector. Our method adjusts the learning rates adaptively according to the current loss-based vector instead of the squared gradients used in Adam-type methods. The intuition for adaptive learning rates of DecGD is that a good optimizer, in…
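The abstract is truncated before it defines the decomposition, so the following is only a minimal sketch of one way a gradient can be written as the product of a loss-dependent factor and a surrogate gradient. It assumes a surrogate function g(x) = sqrt(f(x) + c) with a constant c > 0, for which the chain rule gives ∇f = 2g · ∇g. The toy loss, the constant `c`, the scalar (rather than vector) factor, and all function names are hypothetical choices made for illustration. This is not the paper's algorithm or update rule.

```python
import math

def loss(x):
    # Hypothetical toy loss f(x) = 0.5 * sum(x_i^2), used only for illustration.
    return 0.5 * sum(v * v for v in x)

def grad_loss(x):
    # Analytic gradient of the toy loss: grad f(x) = x.
    return list(x)

def decompose(x, c=1.0):
    """Split grad f into (scale, surrogate) such that grad f = scale * surrogate.

    Assumption (not taken from the source): g(x) = sqrt(f(x) + c), so
    grad g = grad f / (2 g) and the loss-dependent factor is 2 g.
    """
    g = math.sqrt(loss(x) + c)                      # requires f(x) + c > 0
    scale = 2.0 * g                                 # loss-dependent factor
    surrogate = [d / scale for d in grad_loss(x)]   # gradient of g
    return scale, surrogate

x = [3.0, -4.0]
scale, surrogate = decompose(x)

# Multiplying the two factors recovers the original gradient.
reconstructed = [scale * s for s in surrogate]
print(reconstructed, grad_loss(x))
```

In this sketch the loss-dependent factor shrinks as the loss falls, which is the kind of quantity a method could use to adapt step sizes in place of squared gradients. How DecGD actually does this is specified in the paper itself, not here.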

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Adam · AMSGrad · SGD with Momentum · Stochastic Gradient Descent