MetaGrad: Adaptation using Multiple Learning Rates in Online Learning
Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven

TL;DR
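MetaGrad is an adaptive online convex optimization algorithm that considers multiple learning rates simultaneously and weights them by their empirical performance. It stays robust on general convex losses, achieves faster rates for exp-concave, strongly convex, and certain curvature-free functions, and outperforms online gradient descent and AdaGrad on benchmark tasks.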
Contribution
Introduces MetaGrad, an adaptive method that uses a new meta-algorithm to combine multiple learning rates, weighting each by its empirical performance, for improved online learning across diverse convex functions.
Findings
MetaGrad outperforms online gradient descent and AdaGrad on benchmark tasks.
It adapts automatically to the curvature of the loss functions and to the size of the gradients.
Three versions offer a trade-off between computational efficiency and performance.
Abstract
We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses but achieves faster rates for a broad class of special functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. We prove this by drawing a connection to the Bernstein condition, which is known to imply fast rates in offline statistical learning. MetaGrad further adapts automatically to the size of the gradients. Its main feature is that it simultaneously considers multiple learning rates, which are weighted directly proportional to their empirical performance on the data using a new meta-algorithm. We provide three versions of MetaGrad. The full matrix version maintains a full covariance matrix and is applicable to learning tasks for which we can afford update time quadratic in…
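The idea of considering several learning rates at once and weighting them by their empirical performance can be illustrated with a minimal Python sketch. It is a simplified one-dimensional illustration under stated assumptions, not the authors' implementation. The function name `multi_rate_sketch`, the geometric grid of learning rates, the prior weights, the quadratic surrogate loss, the Newton-style candidate update, and the parameters `D` (domain radius) and `G` (gradient bound) are all choices made here for illustration and are not taken from the abstract.

```python
import math


def multi_rate_sketch(grad_fn, T, D=1.0, G=1.0):
    """Toy 1-D learner that mixes several learning rates (illustrative only).

    grad_fn(w, t) returns the loss gradient at point w in round t.
    D is the assumed domain radius, G an assumed bound on |gradient|.
    Returns the list of points played by the combined learner.
    """
    # Assumption: a geometric grid of candidate learning rates.
    n = math.ceil(0.5 * math.log2(T)) + 1
    etas = [2.0 ** (-i) / (5.0 * D * G) for i in range(n)]

    # Assumption: a slowly decaying prior over the grid, normalized to sum to 1.
    weights = [1.0 / ((i + 1) * (i + 2)) for i in range(n)]
    total = sum(weights)
    weights = [p / total for p in weights]

    points = [0.0] * n   # one candidate point per learning rate
    sq_grad_sum = 0.0    # running sum of squared gradients
    played = []

    for t in range(T):
        # Combined prediction: candidates averaged with weight * eta.
        norm = sum(p * eta for p, eta in zip(weights, etas))
        w = sum(p * eta * x for p, eta, x in zip(weights, etas, points)) / norm
        played.append(w)

        g = grad_fn(w, t)
        sq_grad_sum += g * g

        for i, eta in enumerate(etas):
            # Linearized regret of the combined point relative to candidate i.
            r = (w - points[i]) * g
            # Assumed quadratic surrogate loss; a low value means this rate did well.
            surrogate = -eta * r + (eta * r) ** 2
            weights[i] *= math.exp(-surrogate)

            # Newton-style step on the surrogate, then clip back into [-D, D].
            scale = 1.0 / (1.0 / D ** 2 + 2.0 * eta ** 2 * sq_grad_sum)
            step = scale * eta * g * (1.0 - 2.0 * eta * r)
            points[i] = max(-D, min(D, points[i] - step))

        # Renormalize so the weights remain a probability distribution.
        total = sum(weights)
        weights = [p / total for p in weights]

    return played


if __name__ == "__main__":
    # Hypothetical example: every round uses the loss (w - 0.5)^2.
    run = multi_rate_sketch(lambda w, t: 2.0 * (w - 0.5), T=200, D=1.0, G=3.0)
    print(run[-1])
```

In this sketch, learning rates whose candidate points would have reduced the linearized loss gain weight over time, so the combined prediction leans toward whichever rate suits the data. No learning rate has to be fixed in advance.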
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
Methods: AdaGrad
