Parameter-free projected gradient descent
Evgenii Chzhen (LMO, CELESTE), Christophe Giraud (LMO, CELESTE), Gilles Stoltz (LMO, CELESTE)

TL;DR
This paper introduces a fully parameter-free adaptive projected gradient descent algorithm that achieves optimal convergence rates up to logarithmic factors, handles projection steps, and extends to the stochastic setting.
Contribution
It presents a fully parameter-free version of AdaGrad for convex optimization with projections, improving adaptivity and simplicity over existing methods.
Findings
Achieves optimal rates of convergence for cumulative regret, up to logarithmic factors.
Handles projection steps without restarts, reweighting along the trajectory, or additional gradient evaluations.
Extends to stochastic optimization with supporting experiments.
Abstract
We consider the problem of minimizing a convex function over a closed convex set, with Projected Gradient Descent (PGD). We propose a fully parameter-free version of AdaGrad, which is adaptive to the distance between the initialization and the optimum, and to the sum of the squared norms of the subgradients. Our algorithm is able to handle projection steps and does not involve restarts, reweighting along the trajectory, or additional gradient evaluations compared to classical PGD. It also achieves optimal rates of convergence for cumulative regret up to logarithmic factors. We provide an extension of our approach to stochastic optimization and conduct numerical experiments supporting the developed theory.
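For orientation, here is a minimal Python sketch of the classical baseline the abstract starts from: projected subgradient descent with an AdaGrad-style step size built from the running sum of squared subgradient norms. It is not the paper's algorithm. The function names, the unit-ball constraint, the quadratic objective, and above all the argument D (a user-supplied guess of the distance from the initialization to the optimum) are assumptions made for illustration. Having to supply D is the kind of tuning that a parameter-free method aims to remove.

```python
import numpy as np


def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius centred at 0."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def adagrad_norm_pgd(subgrad, project, x0, D, n_steps):
    """Projected subgradient descent with AdaGrad-norm step sizes.

    D is a hypothetical, user-supplied guess of ||x0 - x*||.
    This is the tuned baseline, not the parameter-free method of the paper.
    Returns the average of the iterates at which subgradients were evaluated.
    """
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    sq_sum = 0.0  # running sum of squared subgradient norms
    for t in range(1, n_steps + 1):
        avg += (x - avg) / t  # running average of x_1, ..., x_t
        g = subgrad(x)
        sq_sum += float(g @ g)
        if sq_sum > 0.0:
            eta = D / np.sqrt(sq_sum)  # step size adapts to observed gradients
            x = project(x - eta * g)  # gradient step, then projection
    return avg


# Illustrative problem (an assumption, not from the paper):
# minimise 0.5 * ||x - c||^2 over the unit ball, whose solution is c / ||c||.
c = np.array([2.0, 1.0])
x_star = c / np.linalg.norm(c)
x_bar = adagrad_norm_pgd(lambda x: x - c, project_ball, np.zeros(2), D=1.0, n_steps=200)
print(np.linalg.norm(x_bar - x_star))
```

In this sketch the step size shrinks automatically as subgradients accumulate, which is the adaptivity to the sum of squared subgradient norms mentioned in the abstract. The adaptivity to the unknown distance, which the paper adds, is not reproduced here.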
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: AdaGrad
