Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits

Haipeng Luo; Mengxiao Zhang; Peng Zhao; Zhi-Hua Zhou

arXiv:2202.06151 · cs.LG · February 15, 2022

TL;DR

This paper introduces a new method to combine many bandit algorithms with a regret overhead that grows logarithmically with the number of algorithms, enabling optimal switching regret in adversarial linear bandits.

Contribution

The authors propose a new recipe for aggregating a large set of bandit algorithms whose regret overhead depends only logarithmically on the number of algorithms, rather than polynomially as in CORRAL, so that very large sets of base algorithms can be combined.

Findings

01

Achieves optimal switching regret of $\widetilde{O}(\sqrt{dST})$ in adversarial linear bandits.

02

Extends results to linear bandits over smooth, strongly convex, and unconstrained domains.

03

Demonstrates that the method remains effective in high-dimensional settings and with large sets of base algorithms.
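
For reference, switching regret is commonly defined as below. This is a sketch of the standard definition, not a quotation from the paper: the symbols $x_t$ (the learner's action), $\ell_t$ (the loss vector), $u_t$ (the comparator at round $t$), and $\Omega$ (the action set) are assumed notation, and conventions differ on whether $S$ counts switches or segments.

```latex
% Switching regret against comparator sequences with at most S switches.
% Notation (x_t, \ell_t, u_t, \Omega) is assumed for illustration.
\[
  \mathrm{Reg}_S
  \;=\;
  \max_{\substack{u_1,\dots,u_T \in \Omega \\ \sum_{t=2}^{T} \mathbb{1}[u_t \neq u_{t-1}] \le S}}
  \;\mathbb{E}\!\left[\sum_{t=1}^{T} \langle x_t - u_t,\, \ell_t \rangle\right]
\]
% Static regret is the special case in which the comparator never
% changes, i.e. u_1 = u_2 = \dots = u_T.
```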

Abstract

We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al., 2020a) achieve this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon. The polynomial dependence on $M$, however, prevents one from applying these algorithms to many applications where $M$ is $\mathrm{poly}(T)$ or even larger. Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$ as long as some conditions are satisfied. As the main example, we apply our recipe to the problem of adversarial linear bandits over a $d$-dimensional $\ell_p$ unit-ball for $p \in (1, 2]$. By…
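
A short worked calculation may help show why polynomial dependence on $M$ is limiting. It takes the CORRAL overhead to be of order $\sqrt{MT}$ up to logarithmic factors and sets $M = T$ as one instance of $M = \mathrm{poly}(T)$. The form $\sqrt{T \log M}$ in the second line is purely illustrative: the abstract states only that the new dependence on $M$ is logarithmic, not the exact shape of the bound.

```latex
% Polynomial dependence on M: with M = T the overhead is linear in T,
% which is no better than the trivial regret bound.
\[
  \sqrt{MT}\,\Big|_{M = T} \;=\; \sqrt{T \cdot T} \;=\; T
\]
% Hypothetical logarithmic dependence (illustrative form only):
% the overhead remains sublinear in T even when M = T.
\[
  \sqrt{T \log M}\,\Big|_{M = T} \;=\; \sqrt{T \log T} \;=\; o(T)
\]
```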

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning

Methods: Balanced Selection