AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

S.K. Roy; M.E. Paoletti; J.M. Haut; S.R. Dubey; P. Kar; A. Plaza; B.B. Chaudhuri

arXiv:2105.10190·cs.LG·September 12, 2023·21 cites

TL;DR

AngularGrad is a novel optimizer for CNNs that leverages gradient angular information to adapt step sizes, resulting in smoother optimization and improved performance over existing methods.

Contribution

This paper introduces AngularGrad, the first optimizer to exploit gradient angular information for improved convergence in CNN training.

Findings

1. AngularGrad outperforms state-of-the-art optimizers on benchmark datasets.

2. Theoretical analysis shows AngularGrad has the same regret bound as Adam.

3. Two variants of AngularGrad, using the tangent and cosine functions, are proposed.

Abstract

Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular…
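The idea of scaling the step size by a score derived from the angle between consecutive gradients can be sketched roughly as follows. This is a minimal illustration, not the paper's exact update rule: the function names (`angular_score`, `init_state`, `angular_adam_step`), the weights `lam1` and `lam2`, the tanh squashing, and the per-coordinate angle formula (the angle between two lines with slopes equal to the previous and current gradient) are all assumptions made for the example. The surrounding update is a standard Adam step with bias correction.

```python
import math


def angular_score(g_prev, g_curr, lam1=0.5, lam2=0.5, use_tan=False):
    """Hypothetical score in [lam2, lam1 + lam2] built from the angle
    between two consecutive scalar gradients (illustrative assumption)."""
    denom = 1.0 + g_prev * g_curr
    if denom == 0.0:
        angle = math.pi / 2  # treat the two directions as perpendicular
    else:
        # Angle between two lines whose slopes are g_prev and g_curr.
        angle = math.atan(abs((g_curr - g_prev) / denom))
    # Cosine or tangent variant, chosen by the (hypothetical) use_tan flag.
    trig = math.tan(angle) if use_tan else math.cos(angle)
    # tanh squashes |trig| into [0, 1], so the score stays bounded.
    return lam1 * math.tanh(abs(trig)) + lam2


def init_state(n):
    """Per-parameter optimizer state: moments, previous gradient, step count."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "g_prev": [0.0] * n}


def angular_adam_step(params, grads, state, lr=1e-3,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update whose step is scaled by the angular score."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Standard Adam first and second moment estimates.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Assumed angular scaling, using the gradient from the previous step.
        score = angular_score(state["g_prev"][i], g)
        new_params.append(p - lr * score * m_hat / (math.sqrt(v_hat) + eps))
        state["g_prev"][i] = g
    return new_params


# Usage: minimize f(x) = x^2 starting from x = 1.0.
x = [1.0]
state = init_state(len(x))
for _ in range(100):
    grads = [2.0 * xi for xi in x]  # gradient of f(x) = x^2
    x = angular_adam_step(x, grads, state, lr=0.05)
```

In this sketch the score only rescales the Adam step within a bounded range, so it never changes the direction of the update. How the angle is computed and smoothed across iterations is a design choice left to the paper itself.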

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM

Methods: Adam · Stochastic Gradient Descent