Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective
Kushal Chakrabarti, Nikhil Chopra

TL;DR
This paper introduces G-AdaGrad, a new accelerated optimizer for potentially non-convex machine learning problems, and analyzes the convergence of G-AdaGrad and Adam through state-space models governed by ordinary differential equations. Experiments on MNIST are reported in support of the analysis.
Contribution
It proposes G-AdaGrad, a novel optimizer, and applies a state-space perspective to analyze the convergence of G-AdaGrad and Adam in non-convex settings.
Findings
G-AdaGrad accelerates convergence compared to AdaGrad.
State-space models governed by ordinary differential equations are used to analyze the convergence of G-AdaGrad and Adam.
Empirical results on MNIST support theoretical claims.
Abstract
Accelerated gradient-based methods are being extensively used for solving non-convex machine learning problems, especially when the data points are abundant or the available data is distributed across several agents. Two of the prominent accelerated gradient algorithms are AdaGrad and Adam. AdaGrad is the simplest accelerated gradient method, which is particularly effective for sparse data. Adam has been shown to perform favorably in deep learning problems compared to other methods. In this paper, we propose a new fast optimizer, Generalized AdaGrad (G-AdaGrad), for accelerating the solution of potentially non-convex machine learning problems. Specifically, we adopt a state-space perspective for analyzing the convergence of gradient acceleration algorithms, namely G-AdaGrad and Adam, in machine learning. Our proposed state-space models are governed by ordinary differential equations. We…
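For readers who want a concrete picture of the two baseline algorithms named in the abstract, here is a minimal sketch of the standard textbook AdaGrad and Adam updates in discrete time. It is not the paper's G-AdaGrad (its definition is not given in this excerpt) and it is not the continuous-time state-space model the authors analyze. The toy objective `grad_f`, the helper names, and all hyperparameter values are assumptions chosen for illustration. The accumulators `s`, `m`, and `v` are passed in and returned explicitly so that each update reads as a map from the current state to the next one.

```python
import numpy as np


def adagrad_step(x, s, grad, lr=0.1, eps=1e-8):
    """One standard AdaGrad update; s accumulates squared gradients."""
    s = s + grad ** 2
    x = x - lr * grad / (np.sqrt(s) + eps)
    return x, s


def adam_step(x, m, v, t, grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with bias correction; t counts from 1."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v


def grad_f(x):
    # Assumed toy objective f(x) = 0.5 * ||x||^2, whose gradient is x.
    return x


def run_adagrad(x0, steps=200):
    x, s = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        x, s = adagrad_step(x, s, grad_f(x))
    return x


def run_adam(x0, steps=200):
    x, m, v = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for t in range(1, steps + 1):
        x, m, v = adam_step(x, m, v, t, grad_f(x))
    return x
```

Calling `run_adagrad(np.array([1.0, -2.0]))` or `run_adam(np.array([1.0, -2.0]))` moves the iterate toward the minimizer at the origin of the toy quadratic. In both methods the step is rescaled per coordinate by a running statistic of past squared gradients, and it is this coupled evolution of iterate and accumulator that lends itself to a state-space description.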
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: AdaGrad · Adam
