Batched bandit problems

Vianney Perchet; Philippe Rigollet; Sylvain Chassang; Erik Snowberg

arXiv:1505.00369 · math.ST · March 30, 2016 · COLT

TL;DR

This paper studies the regret achievable by stochastic bandit policies that must split their trials into a small number of batches. It proposes a simple policy whose regret is close to minimax optimal with very few batches and, as a byproduct, derives optimal policies with low switching cost.

Contribution

The paper introduces a simple batched policy that achieves near-minimax regret with a very small number of batches, and obtains optimal low-switching-cost policies as a byproduct. Both results address practical constraints in applications such as clinical trials.

Findings

01. Near-minimax regret with few batches

02. Optimal policies with low switching costs

03. Effective strategies for batch-constrained bandit problems

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
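To make the batch constraint concrete, here is a minimal Python sketch of a two-armed, batched explore-then-commit policy. It is an illustration under stated assumptions, not the paper's exact policy: the function names, the Bernoulli rewards, the grid of batch end points, and the test threshold are all hypothetical choices made for this example. The one property it is meant to show is that the arm to pull may change only between batches, using data observed in earlier batches.

```python
import math
import random


def batch_grid(horizon, num_batches):
    """Hypothetical grid of batch end points that grow quickly toward the horizon.

    Assumption for this sketch: t_k = horizon ** ((1 - 2**-k) / (1 - 2**-M)).
    """
    ends = []
    for k in range(1, num_batches + 1):
        exponent = (1 - 2.0 ** -k) / (1 - 2.0 ** -num_batches)
        ends.append(min(horizon, max(2, math.ceil(horizon ** exponent))))
    ends[-1] = horizon  # the last batch always ends at the horizon
    return sorted(set(ends))  # drop duplicates that appear for tiny horizons


def run_batched_policy(means, horizon, num_batches, seed=0):
    """Simulate a two-armed batched explore-then-commit policy (illustrative)."""
    rng = random.Random(seed)
    ends = batch_grid(horizon, num_batches)
    sums, counts = [0.0, 0.0], [0, 0]
    committed = None  # arm chosen for all remaining pulls, once decided
    history = []
    start = 0
    for index, end in enumerate(ends):
        is_last = index == len(ends) - 1
        # Before the final batch, commit to the empirically better arm.
        if committed is None and is_last and min(counts) > 0:
            better = sums[0] / counts[0] >= sums[1] / counts[1]
            committed = 0 if better else 1
        # Inside a batch the plan is fixed: no feedback is used until it ends.
        for t in range(start, end):
            arm = committed if committed is not None else t % 2
            reward = 1.0 if rng.random() < means[arm] else 0.0
            sums[arm] += reward
            counts[arm] += 1
            history.append(arm)
        # Between batches, test whether the observed gap is significant.
        if committed is None and min(counts) > 0:
            gap = sums[0] / counts[0] - sums[1] / counts[1]
            threshold = math.sqrt(2 * math.log(horizon) / min(counts))
            if abs(gap) > threshold:
                committed = 0 if gap > 0 else 1
        start = end
    return history, ends


history, ends = run_batched_policy([0.9, 0.1], horizon=10_000, num_batches=3)
print("batch end points:", ends)
print("pulls of arm 0:", history.count(0), "of", len(history))
```

With a large gap between the arms, the first batch is usually enough to commit, so almost all pulls go to the better arm even though the policy receives feedback only a handful of times.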
