AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks
S.K. Roy, M.E. Paoletti, J.M. Haut, S.R. Dubey, P. Kar, A. Plaza, B.B. Chaudhuri

TL;DR
AngularGrad is a novel optimizer for CNNs that leverages gradient angular information to adapt step sizes, resulting in smoother optimization and improved performance over existing methods.
Contribution
This paper introduces AngularGrad, the first optimizer to exploit gradient angular information for improved convergence in CNN training.
Findings
AngularGrad outperforms state-of-the-art optimizers on benchmark datasets.
Theoretical analysis shows AngularGrad has the same regret bound as Adam.
Two variants of AngularGrad using Tangent and Cosine functions are proposed.
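For context on the regret-bound finding above: in the online convex optimization setting used to analyze Adam, regret is commonly defined as the cumulative gap between the losses incurred and those of the best fixed parameter in hindsight. The notation below (f_t, θ_t, θ*) is assumed here for illustration and is not taken from this summary. The O(√T) rate is the bound claimed in the original Adam analysis, so "the same regret bound" would imply average regret vanishing as T grows.

```latex
R(T) = \sum_{t=1}^{T} \left[ f_t(\theta_t) - f_t(\theta^{*}) \right],
\qquad
\theta^{*} = \arg\min_{\theta} \sum_{t=1}^{T} f_t(\theta),
\qquad
R(T) = O(\sqrt{T}) \;\Rightarrow\; \frac{R(T)}{T} \to 0 .
```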
Abstract
Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular…
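The idea of scaling the step by a score derived from the angle between consecutive gradients can be sketched in a few lines. This is a minimal, scalar illustration and not the authors' implementation. The score function (a tanh of the smaller of the two most recent angles, with weights lam1 and lam2), the helper names, and the hyperparameter values are all assumptions made for the example. Each gradient value is treated as the slope of a line, so the angle between consecutive gradients can be computed with either a tangent or a cosine formula. In this sketch the two formulas give the same acute angle, and the paper's two variants may differ in detail.

```python
import math


def line_angle_tan(g, g_prev):
    """Acute angle between lines of slope g and g_prev (tangent formula)."""
    # tan(angle) = |g - g_prev| / |1 + g * g_prev|; atan2 avoids dividing by zero.
    return math.atan2(abs(g - g_prev), abs(1.0 + g * g_prev))


def line_angle_cos(g, g_prev):
    """The same acute angle, computed from the cosine formula."""
    cos_a = abs(1.0 + g * g_prev) / math.sqrt((1.0 + g * g) * (1.0 + g_prev * g_prev))
    return math.acos(min(1.0, cos_a))  # clamp guards against rounding above 1


def angular_score(angle, angle_prev, lam1=0.5, lam2=0.5):
    """Hypothetical step-size score; the form and weights are assumptions."""
    return lam1 * math.tanh(min(angle, angle_prev)) + lam2


def new_state():
    """Per-parameter optimizer state for the scalar sketch."""
    return {"t": 0, "m": 0.0, "v": 0.0, "g_prev": 0.0, "angle_prev": 0.0}


def angular_adam_step(theta, g, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style update whose step is scaled by the angular score."""
    state["t"] += 1
    t = state["t"]

    # Angle between the current and previous gradient, then the score.
    angle = line_angle_tan(g, state["g_prev"])
    phi = angular_score(angle, state["angle_prev"])

    # Standard Adam moment estimates with bias correction.
    state["m"] = b1 * state["m"] + (1.0 - b1) * g
    state["v"] = b2 * state["v"] + (1.0 - b2) * g * g
    m_hat = state["m"] / (1.0 - b1 ** t)
    v_hat = state["v"] / (1.0 - b2 ** t)

    # Remember the gradient and angle for the next iteration.
    state["g_prev"] = g
    state["angle_prev"] = angle

    return theta - lr * phi * m_hat / (math.sqrt(v_hat) + eps)


def minimize_quadratic(x0=3.0, steps=2000):
    """Toy usage: minimize f(x) = x^2, whose gradient is 2x."""
    x, state = x0, new_state()
    for _ in range(steps):
        x = angular_adam_step(x, 2.0 * x, state)
    return x
```

Calling minimize_quadratic() drives x from 3.0 toward the minimizer at 0. With lam1 = lam2 = 0.5 the score stays between 0.5 and 1, so it only damps the Adam step and never enlarges it.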
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
Methods: Adam · Stochastic Gradient Descent
