Convergence Analysis of Gradient Descent Algorithms with Proportional Updates
Igor Gitman, Deepak Dilipkumar, Ben Parr

TL;DR
This paper provides a theoretical convergence analysis of gradient descent algorithms with proportional updates, such as LARS and PercentDelta, which are used to improve training stability in deep learning.
Contribution
It offers the first rigorous convergence analysis of proportional update algorithms and explores their potential extensions beyond neural networks.
Findings
Proved convergence properties of proportional update algorithms.
Validated theoretical results with empirical experiments.
Identified potential improvements using different norms or learning rate schedules.
Abstract
The rise of deep learning in recent years has brought with it increasingly clever optimization methods to deal with complex, non-linear loss functions. These methods are often designed with convex optimization in mind, but have been shown to work well in practice even for the highly non-convex optimization associated with neural networks. However, one significant drawback of these methods when applied to deep learning is that the magnitude of the update step is sometimes disproportionate to the magnitude of the weights (much smaller or much larger), leading to training instabilities such as vanishing and exploding gradients. One idea to combat this issue is gradient descent with proportional updates, introduced in 2017. It was independently developed by You et al. (the Layer-wise Adaptive Rate Scaling (LARS) algorithm) and by Abu-El-Haija…
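To make the idea of a proportional update concrete, here is a minimal sketch of one LARS-style step for a single layer, in which the gradient is rescaled so that the step length is a fixed fraction of the weight norm. It is a simplified illustration rather than the exact algorithm analyzed in the paper: momentum and weight decay are omitted, and the names `proportional_update`, `trust`, and `eps`, as well as the choice to skip the update when a norm is zero, are assumptions made for this example.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def proportional_update(weights, grad, trust=0.01, eps=1e-12):
    """One simplified LARS-style step for a single layer.

    The gradient is rescaled so that the step length is approximately
    trust * ||weights||, whatever the gradient's own magnitude.
    Momentum and weight decay are deliberately left out (assumption).
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grad)
    if w_norm == 0.0 or g_norm == 0.0:
        # Assumption for this sketch: leave the layer unchanged
        # when either norm is zero.
        return list(weights)
    local_lr = trust * w_norm / (g_norm + eps)
    return [w - local_lr * g for w, g in zip(weights, grad)]


def step_length(old, new):
    """Euclidean length of the move from old to new."""
    return l2_norm([a - b for a, b in zip(old, new)])


w = [3.0, 4.0]  # ||w|| = 5
step_small = step_length(w, proportional_update(w, [1e-6, 0.0]))
step_large = step_length(w, proportional_update(w, [1e6, 0.0]))
print(step_small, step_large)
```

Both the tiny and the huge gradient produce a step of roughly the same length (about 1% of the weight norm here), which is the property that is meant to guard against vanishing and exploding updates.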
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research
Methods: LARS
