Adaptive Proximal Gradient Method for Convex Optimization
Yura Malitsky, Konstantin Mishchenko

TL;DR
This paper introduces adaptive versions of gradient descent and proximal gradient methods for convex optimization that utilize local curvature information, enabling larger stepsizes and convergence guarantees under minimal assumptions.
Contribution
The paper presents adaptive variants of GD and ProxGD that add no computational cost and are proven to converge assuming only local Lipschitzness of the gradient.
Findings
Adaptive GD and ProxGD set their stepsizes from observed gradient differences, at no added computational cost.
Convergence is guaranteed under local Lipschitzness of the gradient.
The proposed methods allow larger stepsizes than those initially suggested in [MM20].
Abstract
In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions. We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs. Moreover, we prove convergence of our methods assuming only local Lipschitzness of the gradient. In addition, the proposed versions allow for even larger stepsizes than those initially suggested in [MM20].
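The idea of choosing stepsizes from observed gradient differences can be illustrated with a minimal sketch. The abstract does not state the stepsize rule, so the code below assumes the earlier rule associated with [MM20] (the stepsize may grow by a bounded factor, and is capped by a local inverse-curvature estimate); it is not the refined rule of this paper. The function name `adaptive_gd` and the parameters `step0`, `iters`, and `tol` are hypothetical choices made for illustration.

```python
import numpy as np

def adaptive_gd(grad, x0, step0=1e-6, iters=2000, tol=1e-10):
    """Gradient descent with a stepsize estimated from gradient differences.

    Assumption: this uses the rule attributed to [MM20],
        step_k = min( sqrt(1 + theta_{k-1}) * step_{k-1},
                      ||x_k - x_{k-1}|| / (2 * ||g_k - g_{k-1}||) ),
    with theta_k = step_k / step_{k-1}. It is a sketch, not the
    refined rule proposed in the paper summarized here.
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    step_prev = step0
    theta = np.inf                    # no growth cap on the first adaptive step
    x = x_prev - step_prev * g_prev   # one tiny step to obtain a second point

    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:  # stop once the gradient is negligible
            break
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        # Bound 1: limit how fast the stepsize may grow.
        growth = np.sqrt(1.0 + theta) * step_prev
        # Bound 2: half the inverse of the local Lipschitz estimate dg / dx.
        curvature = dx / (2.0 * dg) if dg > 0 else np.inf
        step = min(growth, curvature)
        if not np.isfinite(step):     # both bounds uninformative: reuse old step
            step = step_prev
        x_prev, g_prev = x, g
        x = x - step * g
        theta, step_prev = step / step_prev, step
    return x
```

Only gradients that are computed anyway are reused, which is why such a rule adds no extra function or gradient evaluations per iteration. As a usage example, calling `adaptive_gd(lambda z: A @ z - b, np.zeros(2))` with a positive definite matrix `A` and a vector `b` minimizes the quadratic whose gradient is `A @ z - b`.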
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Optimization and Variational Analysis
