diffGrad: An Optimization Method for Convolutional Neural Networks
Shiv Ram Dubey, Soumendu Chakraborty, Swalpa Kumar Roy, Snehasis Mukherjee, Satish Kumar Singh, Bidyut Baran Chaudhuri

TL;DR
diffGrad is a new optimization algorithm for CNNs that adapts each parameter's step size based on the change between successive gradients. It outperforms existing optimizers on standard datasets and is backed by a theoretical convergence analysis.
Contribution
This paper introduces diffGrad, a novel optimizer that adjusts learning rates based on local gradient changes, improving convergence over traditional methods.
Findings
diffGrad outperforms SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam.
It demonstrates consistent performance across different activation functions.
Theoretical convergence analysis supports its effectiveness.
Abstract
Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it updates all parameters with equal-sized steps, irrespective of gradient behavior. Hence, an efficient way to optimize deep networks is to use an adaptive step size for each parameter. Recently, several attempts have been made to improve gradient descent, such as AdaGrad, AdaDelta, RMSProp and Adam. These methods rely on the square roots of exponential moving averages of squared past gradients. Thus, these methods do not take advantage of local change in gradients. In this paper, a novel optimizer is proposed based on the difference between the present and the immediate past gradient (i.e., diffGrad). In the proposed diffGrad…
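The idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the abstract is truncated before it states the update rule, so the sketch assumes an Adam-style step (exponential moving averages of the gradient and squared gradient) multiplied by a per-parameter factor computed as the sigmoid of the absolute difference between the previous and current gradient. The function names, default hyperparameters, and the toy quadratic objective are all hypothetical choices made for the example.

```python
import numpy as np


def diffgrad_step(theta, grad, prev_grad, m, v, t,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One diffGrad-style update (assumed form, see the lead-in).

    theta, grad, prev_grad, m, v are arrays of the same shape;
    t is the 1-based step count used for bias correction.
    """
    # Adam-style exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2

    # Bias correction for the zero-initialised moving averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Assumed scaling factor: sigmoid of the absolute gradient change.
    # It lies in [0.5, 1): about 0.5 when the gradient barely changes,
    # close to 1 when it changes a lot.
    xi = 1.0 / (1.0 + np.exp(-np.abs(prev_grad - grad)))

    theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=3000, lr=0.01):
    """Toy usage: minimise f(theta) = sum(theta**2) from a fixed start."""
    theta = np.array([3.0, -2.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    prev_grad = np.zeros_like(theta)
    for t in range(1, steps + 1):
        grad = 2.0 * theta  # analytic gradient of sum(theta**2)
        theta, m, v = diffgrad_step(theta, grad, prev_grad, m, v, t, lr=lr)
        prev_grad = grad
    return theta
```

Under this assumed form, a parameter whose gradient is nearly unchanged between steps moves at roughly half the Adam step, while a parameter whose gradient is changing quickly moves at close to the full Adam step. That is one way to use the "local change in gradients" that the abstract says earlier adaptive methods ignore.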
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Brain Tumor Detection and Classification
Methods: Batch Normalization · Average Pooling · 1x1 Convolution · Residual Connection · Max Pooling · Global Average Pooling · Bottleneck Residual Block · Residual Block · Kaiming Initialization
