Convergence Analysis of Gradient Descent Algorithms with Proportional Updates

Igor Gitman; Deepak Dilipkumar; Ben Parr

arXiv:1801.03137·cs.LG·January 12, 2018

Open Access

TL;DR

This paper provides a theoretical convergence analysis of gradient descent algorithms with proportional updates, such as LARS and PercentDelta, which are used to improve training stability in deep learning.

Contribution

It offers the first rigorous convergence analysis of proportional update algorithms and explores their potential extensions beyond neural networks.

Findings

1. Proved convergence properties of proportional update algorithms.

2. Validated theoretical results with empirical experiments.

3. Identified potential improvements using different norms or learning rate schedules.

Abstract

The rise of deep learning in recent years has brought with it increasingly clever optimization methods to deal with complex, non-linear loss functions. These methods are often designed with convex optimization in mind, but have been shown to work well in practice even for the highly non-convex optimization associated with neural networks. However, one significant drawback of these methods when they are applied to deep learning is that the magnitude of the update step is sometimes disproportionate to the magnitude of the weights (much smaller or larger), leading to training instabilities such as vanishing and exploding gradients. One idea to combat this issue is gradient descent with proportional updates, which was introduced in 2017. It was independently developed by You et al. (the Layer-wise Adaptive Rate Scaling (LARS) algorithm) and by Abu-El-Haija…
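
The proportional-update idea described in the abstract can be illustrated with a small sketch. This is a simplified illustration, not the exact LARS or PercentDelta rule: it assumes a single layer, the Euclidean norm, and no momentum or weight decay. The names `proportional_step`, `rate`, and `eps` are hypothetical and do not come from the paper.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def proportional_step(weights, grads, rate=0.01, eps=1e-12):
    """One simplified proportional update for a single layer.

    The gradient is rescaled so that the step length is roughly
    rate * ||weights||, independent of the gradient's own magnitude.
    `rate` and `eps` are illustrative assumptions, not the paper's notation.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    # eps guards against division by zero when the gradient vanishes.
    scale = rate * w_norm / (g_norm + eps)
    return [w - scale * g for w, g in zip(weights, grads)]


# Same gradient direction, magnitudes differing by a factor of 1e6.
w = [3.0, 4.0]  # ||w|| = 5
small = proportional_step(w, [1e-3, 0.0])
large = proportional_step(w, [1e3, 0.0])
```

In this sketch both calls move the weights by approximately `rate * ||w|| = 0.05` along the first coordinate, so neither a tiny nor a huge gradient produces a step that is out of proportion to the weights. Note that, under this simplified rule, a layer whose weights are exactly zero would never move.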

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research

Methods: LARS