Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective

Kushal Chakrabarti; Nikhil Chopra

arXiv:2106.00092·cs.LG·October 4, 2021

Open Access

TL;DR

This paper introduces G-AdaGrad, a new accelerated optimizer for non-convex machine learning, analyzed via a state-space approach, with empirical validation on MNIST showing improved convergence and performance.

Contribution

The paper proposes G-AdaGrad, a new optimizer, and uses a state-space perspective, with models governed by ordinary differential equations, to analyze the convergence of G-AdaGrad and Adam on potentially non-convex problems.

Findings

01 G-AdaGrad accelerates convergence compared to AdaGrad.

02 State-space models provide clear convergence analysis.

03 Empirical results on MNIST support theoretical claims.

Abstract

Accelerated gradient-based methods are being extensively used for solving non-convex machine learning problems, especially when the data points are abundant or the available data is distributed across several agents. Two of the prominent accelerated gradient algorithms are AdaGrad and Adam. AdaGrad is the simplest accelerated gradient method, which is particularly effective for sparse data. Adam has been shown to perform favorably in deep learning problems compared to other methods. In this paper, we propose a new fast optimizer, Generalized AdaGrad (G-AdaGrad), for accelerating the solution of potentially non-convex machine learning problems. Specifically, we adopt a state-space perspective for analyzing the convergence of gradient acceleration algorithms, namely G-AdaGrad and Adam, in machine learning. Our proposed state-space models are governed by ordinary differential equations. We…
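For readers unfamiliar with the two baseline methods named in the abstract, the following is a minimal sketch of the textbook discrete-time AdaGrad and Adam update rules. It is not G-AdaGrad and not the paper's ODE-based state-space models, whose details are not given on this page. The function names, the hyperparameter values, and the toy quadratic objective are assumptions chosen only for illustration.

```python
import math

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=500):
    """Textbook AdaGrad: each coordinate's step is scaled by the
    square root of its accumulated squared gradients."""
    x = list(x0)
    acc = [0.0] * len(x)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            acc[i] += g[i] ** 2
            x[i] -= lr * g[i] / (math.sqrt(acc[i]) + eps)
    return x

def adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Textbook Adam: exponential moving averages of the gradient (m)
    and squared gradient (v), with bias correction."""
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)  # bias-corrected second moment
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def quad_grad(x):
    # Gradient of the illustrative objective f(x) = x[0]**2 + 10 * x[1]**2
    # (a hypothetical test function, not one used in the paper).
    return [2.0 * x[0], 20.0 * x[1]]

x_adagrad = adagrad(quad_grad, [3.0, -2.0])
x_adam = adam(quad_grad, [3.0, -2.0])
print("AdaGrad:", x_adagrad)
print("Adam:   ", x_adam)
```

Both routines should drive the iterate toward the minimizer at the origin on this toy problem. The step sizes were picked by hand for the example and would need tuning on any real objective.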

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: AdaGrad · Adam