diffGrad: An Optimization Method for Convolutional Neural Networks

Shiv Ram Dubey; Soumendu Chakraborty; Swalpa Kumar Roy; Snehasis Mukherjee; Satish Kumar Singh; Bidyut Baran Chaudhuri

arXiv:1909.11015·cs.LG·November 30, 2021

TL;DR

diffGrad is an optimization algorithm for CNNs that adapts per-parameter step sizes based on the change between consecutive gradients. It is reported to outperform existing optimizers on standard image classification datasets and on synthetic non-convex test functions.

Contribution

This paper introduces diffGrad, a novel optimizer that adjusts learning rates based on local gradient changes, improving convergence over traditional methods.

Findings

01

diffGrad outperforms SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam.

02

It demonstrates consistent performance across different activation functions.

03

Theoretical convergence analysis supports its effectiveness.

Abstract

Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it takes equal-sized steps for all parameters, irrespective of gradient behavior. Hence, an efficient way of optimizing deep networks is to use adaptive step sizes for each parameter. Recently, several attempts have been made to improve gradient descent methods, such as AdaGrad, AdaDelta, RMSProp and Adam. These methods rely on the square roots of exponential moving averages of squared past gradients. Thus, these methods do not take advantage of local change in gradients. In this paper, a novel optimizer is proposed based on the difference between the present and the immediate past gradient (i.e., diffGrad). In the proposed diffGrad…
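
The abstract above is cut off before it states the update rule, so the following is only a minimal sketch of the idea it describes: an Adam-style step (moving averages of the gradient and squared gradient) scaled by a coefficient computed from the difference between the present and the immediate past gradient. The sigmoid-of-absolute-difference form of that coefficient, the helper names `init_state` and `diffgrad_step`, and the default hyperparameters are assumptions made for illustration. They are not taken from the text on this page and may differ from the official repository.

```python
import math


def init_state(n):
    """Per-parameter optimizer state for n scalar parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "prev_g": [0.0] * n}


def diffgrad_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update scaled by a gradient-difference coefficient (sketch)."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient, as in Adam.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Assumed coefficient: sigmoid of |previous gradient - current gradient|.
        # It lies in [0.5, 1): near 0.5 when the gradient barely changes,
        # close to 1 when it changes a lot.
        xi = 1.0 / (1.0 + math.exp(-abs(state["prev_g"][i] - g)))
        new_params.append(p - lr * xi * m_hat / (math.sqrt(v_hat) + eps))
        state["prev_g"][i] = g
    return new_params


if __name__ == "__main__":
    # Toy usage: minimize f(x) = x^2, whose gradient is 2x.
    x = [1.0]
    state = init_state(len(x))
    for _ in range(200):
        x = diffgrad_step(x, [2.0 * x[0]], state, lr=0.1)
    print(x[0])
```

Under these assumptions the coefficient shrinks the step toward half of the Adam step where consecutive gradients are nearly equal, and leaves it close to the full Adam step where the gradient is changing quickly.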

Code & Models

Repositories

shivram1987/diffGrad (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Brain Tumor Detection and Classification

Methods: Batch Normalization · Average Pooling · 1x1 Convolution · Residual Connection · Max Pooling · Global Average Pooling · Bottleneck Residual Block · Residual Block · Kaiming Initialization