Batched Multi-armed Bandits Problem

Zijun Gao; Yanjun Han; Zhimei Ren; Zhengqing Zhou

arXiv:1904.01763 · stat.ML · October 29, 2019 · 42 cites

TL;DR

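This paper introduces the BaSE (batched successive elimination) policy for batched multi-armed bandits. The policy achieves rate-optimal regret up to logarithmic factors, and matching lower bounds hold even when batch sizes are chosen adaptively. The paper also clarifies how the number of arms affects the regret.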

Contribution

It proposes a policy that attains rate-optimal regret bounds (within logarithmic factors) for batched multi-armed bandits. This addresses two open questions: how the number of arms affects the regret, and whether adaptively chosen batch sizes can reduce it.

Findings

01

The BaSE policy achieves rate-optimal regret bounds, up to logarithmic factors.

02

Matching lower bounds are established, even when batch sizes are chosen adaptively.

03

Analysis of the effect of the number of arms on regret.

Abstract

In this paper, we study the multi-armed bandit problem in the batched setting, where the employed policy must split the data into a small number of batches. While the minimax regret for two-armed stochastic bandits has been completely characterized by Perchet et al. (2016), the effect of the number of arms on the regret in the multi-armed case is still open. Moreover, the question of whether adaptively chosen batch sizes help to reduce the regret also remains underexplored. We propose the BaSE (batched successive elimination) policy, which achieves rate-optimal regret (within logarithmic factors) for batched multi-armed bandits, and we prove matching lower bounds that hold even if the batch sizes are determined adaptively.
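To make the batched successive elimination idea concrete, here is a minimal Python sketch. It rests on several assumptions. Rewards are unit-variance Gaussian. Batch end points follow a geometric grid t_m = floor(b^m) with b = T^(1/M). At the end of each of the first M - 1 batches, any active arm whose empirical mean trails the leader by at least sqrt(gamma * log(T * K) / tau) is dropped. In the last batch, the policy commits to the empirically best surviving arm. The function names `geometric_grid` and `base_policy`, the round-robin schedule inside a batch, and the default `gamma = 2.0` are hypothetical illustrative choices. They are not taken from the paper or from the Mathegineer/batched-bandit repository, and the paper's exact grids and constants may differ.

```python
import math
import random


def geometric_grid(T, M):
    """Hypothetical grid: batch end points t_m = floor(b**m), b = T**(1/M), with t_M = T."""
    b = T ** (1.0 / M)
    grid = [min(T, math.floor(b ** m)) for m in range(1, M + 1)]
    grid[-1] = T  # guard against floating-point round-off at the horizon
    return grid


def base_policy(means, T, M, gamma=2.0, seed=0):
    """Simulate a batched successive-elimination policy on unit-variance Gaussian arms.

    Returns (pseudo_regret, pull_counts). `means` is used only to draw rewards and
    to score regret; the policy itself sees only the sampled rewards.
    """
    rng = random.Random(seed)
    K = len(means)
    best_mean = max(means)
    sums = [0.0] * K
    counts = [0] * K
    regret = 0.0
    active = list(range(K))
    grid = geometric_grid(T, M)

    prev = 0
    for end in grid[:-1]:
        # Within a batch the schedule is fixed: cycle through the active arms.
        for step in range(end - prev):
            arm = active[step % len(active)]
            sums[arm] += rng.gauss(means[arm], 1.0)
            counts[arm] += 1
            regret += best_mean - means[arm]
        prev = end
        # Batch boundary: drop arms trailing the empirical leader by the confidence width.
        tau = min(counts[j] for j in active)
        if tau == 0:
            continue
        emp = {j: sums[j] / counts[j] for j in active}
        leader = max(emp.values())
        width = math.sqrt(gamma * math.log(T * K) / tau)
        active = [j for j in active if leader - emp[j] < width or emp[j] == leader]

    # Last batch: commit to the empirically best surviving arm for the remaining rounds.
    def emp_mean(j):
        return sums[j] / counts[j] if counts[j] else float("-inf")

    final_arm = max(active, key=emp_mean)
    remaining = T - prev
    counts[final_arm] += remaining
    regret += remaining * (best_mean - means[final_arm])
    return regret, counts


if __name__ == "__main__":
    regret, counts = base_policy([1.0, 0.0, -1.0], T=10_000, M=3)
    print(counts, round(regret, 1))
```

The policy looks at feedback only at batch boundaries, which is what makes it batched. Within a batch, the pull schedule is decided in advance, so the number of batches M directly limits how often the policy can adapt.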

Code & Models

Repositories

Mathegineer/batched-bandit (Official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Age of Information Optimization