Boosting for Online Convex Optimization
Elad Hazan, Karan Singh

TL;DR
This paper introduces an efficient boosting algorithm for online convex optimization with a very large number of experts, a setting common in contextual and reinforcement learning, and proves near-optimal regret guarantees under both full-information and bandit feedback.
Contribution
The paper generalizes online boosting and gradient boosting to online convex optimization and bandit linear optimization, yielding new algorithms with regret guarantees against very large expert classes.
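The regret guarantees above are measured against the convex hull of the base class, as the abstract states. One standard way to write that benchmark is shown below; the notation ($f_t$ for the convex loss at round $t$, $x_t$ for the context, $\mathcal{A}$ for the learner, $\mathcal{H}$ for the base class) is assumed for illustration rather than taken from the paper.

$$
\mathrm{Regret}_T(\mathcal{A}) = \sum_{t=1}^{T} f_t\big(\mathcal{A}(x_t)\big) - \min_{h \in \mathrm{CH}(\mathcal{H})} \sum_{t=1}^{T} f_t\big(h(x_t)\big),
\qquad
\mathrm{CH}(\mathcal{H}) = \Big\{ \textstyle\sum_{i=1}^{k} \alpha_i h_i : k \in \mathbb{N},\ h_i \in \mathcal{H},\ \alpha_i \ge 0,\ \sum_{i=1}^{k} \alpha_i = 1 \Big\}.
$$

A bound of this kind is meaningful when it grows sublinearly in $T$, so that the average excess loss per round vanishes.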
Findings
Provides an efficient boosting algorithm with near-optimal regret guarantees.
Extends boosting techniques to online convex optimization and bandit settings.
Achieves theoretical guarantees in both full-information and partial (bandit) feedback models.
Abstract
We consider the decision-making framework of online convex optimization with a very large number of experts. This setting is ubiquitous in contextual and reinforcement learning problems, where the size of the policy class renders enumeration and search within the policy class infeasible. Instead, we consider generalizing the methodology of online boosting. We define a weak learning algorithm as a mechanism that guarantees multiplicatively approximate regret against a base class of experts. In this access model, we give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class. We consider both full and partial (a.k.a. bandit) information feedback models. We also give an analogous efficient boosting algorithm for the i.i.d. statistical setting. Our results simultaneously generalize online boosting and gradient boosting guarantees to…
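The abstract describes combining weak learners into a learner that competes with the convex hull of the base class. The Python sketch below illustrates that idea under heavy simplifying assumptions: full-information feedback, a one-dimensional decision set [-1, 1], and a generic Frank-Wolfe-style combination with step sizes 2/(i+1), in the spirit of online gradient boosting. It is not the paper's algorithm, and it ignores both the weak learner's multiplicative approximation factor and the bandit setting. The names `HedgeWeakLearner`, `FrankWolfeBooster`, and `run_demo`, the use of Hedge as a stand-in weak learner, and the squared-loss demo are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A base hypothesis maps a context x to a point in the decision set [-1, 1].
Hypothesis = Callable[[float], float]


class HedgeWeakLearner:
    """Hypothetical stand-in weak learner: Hedge (multiplicative weights)
    over a finite base class, updated with linear losses y -> grad * y."""

    def __init__(self, base: Sequence[Hypothesis], lr: float = 0.5) -> None:
        self.base = list(base)
        self.lr = lr
        self.log_w = [0.0] * len(self.base)

    def _probs(self) -> List[float]:
        m = max(self.log_w)  # shift by the max for numerical stability
        w = [math.exp(v - m) for v in self.log_w]
        s = sum(w)
        return [v / s for v in w]

    def predict(self, x: float) -> float:
        # Weighted average of base predictions: a point in their convex hull.
        return sum(p * h(x) for p, h in zip(self._probs(), self.base))

    def update(self, x: float, grad: float) -> None:
        # Base hypothesis h suffers the linear loss grad * h(x).
        for j, h in enumerate(self.base):
            self.log_w[j] -= self.lr * grad * h(x)


class FrankWolfeBooster:
    """Hypothetical Frank-Wolfe-style online booster (not the paper's method)."""

    def __init__(self, base: Sequence[Hypothesis], n_learners: int = 10,
                 lr: float = 0.5, y0: float = 0.0) -> None:
        self.learners = [HedgeWeakLearner(base, lr) for _ in range(n_learners)]
        self.y0 = y0  # starting iterate inside the decision set

    def predict(self, x: float) -> Tuple[float, List[float]]:
        # Builds iterates y^0, ..., y^N; y^N is the booster's prediction.
        ys = [self.y0]
        for i, learner in enumerate(self.learners, start=1):
            eta = 2.0 / (i + 1)  # classic Frank-Wolfe step size
            ys.append((1.0 - eta) * ys[-1] + eta * learner.predict(x))
        return ys[-1], ys

    def update(self, x: float, ys: List[float],
               grad_f: Callable[[float], float]) -> None:
        # Learner i (1-indexed) is charged the linear loss y -> grad_f(y^{i-1}) * y.
        for i, learner in enumerate(self.learners):
            learner.update(x, grad_f(ys[i]))


def run_demo(T: int = 2000, target: float = 0.3) -> Tuple[float, float]:
    """Average booster loss vs. average loss of the best single base hypothesis."""
    base: List[Hypothesis] = [lambda x: -1.0, lambda x: 1.0]
    booster = FrankWolfeBooster(base)

    def loss(y: float) -> float:
        return (y - target) ** 2

    def grad(y: float) -> float:
        return 2.0 * (y - target)

    total = 0.0
    for _ in range(T):
        x = 0.0  # the constant base class ignores the context
        y, ys = booster.predict(x)
        total += loss(y)
        booster.update(x, ys, grad)
    best_base = min(loss(h(0.0)) for h in base)
    return total / T, best_base
```

In the demo the base class is just the two constants -1 and +1. The better of them alone has average loss 0.49 against the target 0.3, while 0.3 lies in their convex hull, so the combined predictor can do strictly better even though each stand-in weak learner only ever sees a linear loss (the gradient of the squared loss at the previous iterate).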
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
