A Variant of Gradient Descent Algorithm Based on Gradient Averaging
Saugata Purkayastha, Sukannya Purkayastha

TL;DR
This paper introduces Grad-Avg, a gradient averaging optimizer, proves that its iterates converge to a minimizer under a boundedness assumption, and reports behaviour close to SGD on regression tasks and faster convergence than other optimizers on two benchmark classification datasets.
Contribution
The paper presents a gradient averaging optimizer, Grad-Avg, with a convergence proof and experiments in which it converges faster than other state-of-the-art optimizers on two benchmark classification datasets.
Findings
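The iterates of Grad-Avg are proved to converge to a minimizer under a boundedness assumption.
In regression, Grad-Avg behaves almost identically to SGD, and the paper gives a mathematical justification for this.
In classification, suitably scaling the parameters improves Grad-Avg's performance.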
Abstract
In this work, we study an optimizer, Grad-Avg, for optimizing error functions. We establish mathematically that the sequence of iterates of Grad-Avg converges to a minimizer (under a boundedness assumption). We apply Grad-Avg, along with some popular optimizers, to regression as well as classification tasks. In regression tasks, the behaviour of Grad-Avg is observed to be almost identical to that of Stochastic Gradient Descent (SGD), and we present a mathematical justification of this fact. In classification tasks, the performance of Grad-Avg can be enhanced by suitably scaling the parameters. Experimental results demonstrate that Grad-Avg converges faster than the other state-of-the-art optimizers for the classification task on two benchmark datasets.
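The abstract does not spell out the update rule, so the following Python sketch is only one plausible reading of "gradient averaging": it assumes each step averages the gradient at the current iterate with the gradient at a plain gradient-descent look-ahead point. The function names (`grad_avg_step`, `gd_step`, `quad_grad`, `run`), the toy quadratic objective, and the learning rate are all illustrative assumptions, not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def gd_step(x: Vector, grad: GradFn, lr: float) -> Vector:
    """Plain gradient-descent step, used here as the baseline."""
    g = grad(x)
    return [xi - lr * gi for xi, gi in zip(x, g)]


def grad_avg_step(x: Vector, grad: GradFn, lr: float) -> Vector:
    """One step of an ASSUMED gradient-averaging rule (not confirmed by the source).

    1. Evaluate the gradient at the current iterate.
    2. Take a provisional gradient-descent step to a look-ahead point.
    3. Evaluate the gradient there and step from x along the average of the two.
    """
    g_now = grad(x)
    x_look = [xi - lr * gi for xi, gi in zip(x, g_now)]
    g_look = grad(x_look)
    return [xi - lr * 0.5 * (a + b) for xi, a, b in zip(x, g_now, g_look)]


def quad_grad(x: Vector) -> Vector:
    """Gradient of the hypothetical toy objective f(x) = x0^2 + 5 * x1^2."""
    return [2.0 * x[0], 10.0 * x[1]]


def run(step: Callable[[Vector, GradFn, float], Vector],
        x0: Vector, lr: float, n_steps: int) -> Vector:
    """Apply `step` repeatedly, starting from x0."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, quad_grad, lr)
    return x


if __name__ == "__main__":
    start = [3.0, -2.0]
    print("grad-avg:", run(grad_avg_step, start, lr=0.05, n_steps=200))
    print("gd      :", run(gd_step, start, lr=0.05, n_steps=200))
```

Under this assumed rule, the two updates differ only by a term of order lr^2 on the toy quadratic, which is at least consistent with the reported near-identical behaviour to SGD on regression. It is not a reproduction of the paper's argument or experiments.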
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
