Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits
Haipeng Luo, Mengxiao Zhang, Peng Zhao, Zhi-Hua Zhou

TL;DR
This paper introduces a new method to combine many bandit algorithms with a regret overhead that grows logarithmically with the number of algorithms, enabling optimal switching regret in adversarial linear bandits.
Contribution
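The authors propose a new recipe for aggregating a large set of bandit algorithms whose regret overhead depends only logarithmically on the number of algorithms, provided certain conditions hold. This keeps the approach usable when the number of base algorithms is polynomial in the time horizon or larger.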
Findings
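Achieves optimal switching regret of $\widetilde{O}(\sqrt{dST})$ in adversarial linear bandits, where $d$ is the dimension, $S$ the number of switches, and $T$ the time horizon.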
Extends the results to linear bandits over smooth and strongly convex domains, and to unconstrained linear bandits.
Demonstrates effectiveness of the method in high-dimensional and large algorithm set scenarios.
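For readers unfamiliar with the term, switching regret can be written out as follows. This is one common formalization rather than a quotation from the paper: the symbols $x_t$ (the learner's action), $\ell_t$ (the adversary's loss vector), and $u_t$ (the comparator at round $t$) are assumptions introduced here, the regret is typically taken in expectation over the learner's randomness, and the paper's exact convention for counting switches may differ.

```latex
\mathrm{Reg}_S(u_1,\dots,u_T)
  \;=\; \sum_{t=1}^{T} \langle x_t, \ell_t \rangle
  \;-\; \sum_{t=1}^{T} \langle u_t, \ell_t \rangle,
\qquad \text{subject to } \;
  1 + \sum_{t=2}^{T} \mathbb{1}\{u_t \neq u_{t-1}\} \;\le\; S .
```

With $S = 1$ the comparator is fixed and this reduces to standard (static) regret.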
Abstract
We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al., 2020a) achieve this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$ where $M$ is the number of base algorithms and $T$ is the time horizon. The polynomial dependence on $M$, however, prevents one from applying these algorithms to many applications where $M$ is poly$(T)$ or even larger. Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long as some conditions are satisfied. As the main example, we apply our recipe to the problem of adversarial linear bandits over a $d$-dimensional $\ell_p$ unit-ball for $p \in (1,2]$. By…
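To see why a polynomial dependence on the number of base algorithms becomes a problem, the following minimal sketch evaluates a square-root overhead of the form sqrt(M * T) against the trivial regret bound T. It is an illustration only: constants and logarithmic factors are dropped, per-round losses are assumed to lie in [0, 1] (so regret can never exceed T), and the function names are hypothetical rather than taken from the paper.

```python
import math


def corral_style_overhead(num_base: int, horizon: int) -> float:
    """Return sqrt(M * T), ignoring constants and logarithmic factors."""
    return math.sqrt(num_base * horizon)


def is_vacuous(num_base: int, horizon: int) -> bool:
    """True when the overhead is no better than the trivial bound T.

    Assumes per-round losses in [0, 1], so regret is at most T.
    """
    return corral_style_overhead(num_base, horizon) >= horizon


if __name__ == "__main__":
    T = 10_000
    for M in (10, 100, T, T**2):
        print(
            f"M={M:>12,d}  sqrt(M*T)={corral_style_overhead(M, T):>14,.0f}  "
            f"vacuous={is_vacuous(M, T)}  log(M)={math.log(M):.1f}"
        )
```

Once M reaches T, the square-root overhead is at least T and the guarantee says nothing, whereas log(M) stays small even for M = T^2. This is the regime the abstract refers to when it mentions M being polynomial in T or larger.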
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
Methods: Balanced Selection
