Batched bandit problems
Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg

TL;DR
This paper studies the regret achievable by stochastic bandit policies under batch constraints. It proposes a simple policy that achieves near-minimax regret with only a few batches and, as a byproduct, derives policies with low switching cost.
Contribution
It introduces a policy that achieves near-optimal regret using only a small number of batches, along with low-switching-cost policies, addressing practical constraints in applications such as clinical trials.
Findings
Near-minimax regret with few batches
Optimal policies with low switching costs
Effective strategies for batch-constrained bandit problems
Abstract
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
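The abstract does not spell out the policy. As a rough illustration of what splitting trials into a small number of batches means for a two-armed Bernoulli bandit, the sketch below fixes the allocation inside each batch in advance (alternating between active arms without looking at outcomes) and only acts on feedback at batch boundaries, where it eliminates an arm whose empirical mean trails by more than a confidence width. The batch end-points follow a grid of the form t_1 = a, t_k = floor(a * sqrt(t_{k-1})) with a = T^(1/(2 - 2^(1-M))), so that t_M = T. This grid, the threshold sqrt(2 log T / n), and the names `minimax_grid` and `batched_elimination` are assumptions made for illustration, not the authors' exact policy or constants.

```python
import math
import random
from typing import List, Sequence, Tuple


def minimax_grid(T: int, M: int) -> List[int]:
    """Hypothetical batch end-points t_1 < ... < t_M = T.

    Assumes the recursion t_1 = a, t_k = floor(a * sqrt(t_{k-1})) with
    a = T ** (1 / (2 - 2 ** (1 - M))), and T large relative to M.
    """
    if M == 1:
        return [T]  # a single batch: no feedback until the horizon
    a = T ** (1.0 / (2.0 - 2.0 ** (1 - M)))
    grid = [int(a)]
    for _ in range(M - 2):
        grid.append(int(a * math.sqrt(grid[-1])))
    grid.append(T)  # the last batch always ends at the horizon
    return grid


def batched_elimination(means: Sequence[float], T: int, M: int,
                        seed: int = 0) -> Tuple[List[int], float]:
    """Simulate a two-armed batched elimination policy (illustrative only).

    Returns the pull counts per arm and the pseudo-regret.
    """
    rng = random.Random(seed)
    active = [0, 1]
    pulls = [0, 0]
    sums = [0.0, 0.0]
    t = 0
    for end in minimax_grid(T, M):
        # Inside a batch the schedule is fixed: alternate over active arms.
        while t < end:
            arm = active[t % len(active)]
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
            t += 1
        # Outcomes are only used at the batch boundary.
        if len(active) == 2 and min(pulls) > 0:
            mu = [sums[i] / pulls[i] for i in range(2)]
            width = math.sqrt(2 * math.log(T) / min(pulls))  # assumed threshold
            if abs(mu[0] - mu[1]) > width:
                active = [0 if mu[0] > mu[1] else 1]
    best = max(means)
    regret = sum((best - means[i]) * pulls[i] for i in range(2))
    return pulls, regret


if __name__ == "__main__":
    print(minimax_grid(10000, 3))
    print(batched_elimination([0.9, 0.1], T=10000, M=3, seed=1))
```

With a large gap between the arms, as in the example above, the worse arm is typically eliminated at the end of the first batch, so its pull count stays below the first grid point; with a small gap, elimination may be postponed to a later batch or never happen.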
