Fixed-Confidence Guarantees for Bayesian Best-Arm Identification
Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

TL;DR
This paper analyzes the Top-Two Thompson Sampling algorithm for fixed-confidence best-arm identification in bandit problems, introducing a new variant T3C that reduces computational complexity and providing the first sample complexity analysis for these methods.
Contribution
It offers the first sample complexity analysis of TTTS and T3C with a Bayesian stopping rule for Gaussian bandits, addressing an open question from Russo (2016).
Findings
TTTS and T3C are justified for fixed-confidence best-arm identification.
T3C reduces computational burden compared to TTTS.
New posterior convergence results for TTTS under Gaussian and Bernoulli bandits with conjugate priors.
Abstract
We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards, solving one of the open questions raised by Russo (2016). We also provide new posterior convergence results for TTTS under two models that are commonly used in practice: bandits with Gaussian and Bernoulli rewards and conjugate priors.
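To make the difference between the two sampling rules concrete, here is a minimal Python sketch of one arm-selection step for each. It is an illustration under stated assumptions, not the authors' implementation: rewards are Gaussian with a known standard deviation `sigma`, the prior is flat (so the posterior of arm i is N(mean_i, sigma^2 / N_i)), every arm has been pulled at least once, and the function names, the `beta = 0.5` default, and the `max_resamples` cap are hypothetical choices made for this example. TTTS finds its challenger by re-sampling the posterior until a different arm comes out on top, which can take many draws once the posterior concentrates; T3C instead picks the arm with the smallest Gaussian transportation cost from the leader.

```python
import numpy as np

def posterior_sample(means, counts, sigma, rng):
    # One draw per arm from N(mean_i, sigma^2 / N_i) (flat-prior assumption).
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return rng.normal(means, sigma / np.sqrt(counts))

def transportation_cost(means, counts, i, j, sigma=1.0):
    # Gaussian transportation cost between arms i and j; zero when arm i
    # is not empirically better than arm j.
    if means[i] <= means[j]:
        return 0.0
    gap = means[i] - means[j]
    return gap ** 2 / (2.0 * sigma ** 2 * (1.0 / counts[i] + 1.0 / counts[j]))

def ttts_select(means, counts, sigma=1.0, beta=0.5, rng=None, max_resamples=10_000):
    rng = np.random.default_rng() if rng is None else rng
    leader = int(np.argmax(posterior_sample(means, counts, sigma, rng)))
    if rng.random() < beta:
        return leader
    # Re-sample until the best arm of the draw differs from the leader.
    for _ in range(max_resamples):
        challenger = int(np.argmax(posterior_sample(means, counts, sigma, rng)))
        if challenger != leader:
            return challenger
    # Fallback for this sketch only (assumption): give up and play the leader.
    return leader

def t3c_select(means, counts, sigma=1.0, beta=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    leader = int(np.argmax(posterior_sample(means, counts, sigma, rng)))
    if rng.random() < beta:
        return leader
    # Challenger: the arm with the lowest transportation cost from the leader.
    costs = [
        np.inf if j == leader else transportation_cost(means, counts, leader, j, sigma)
        for j in range(len(means))
    ]
    return int(np.argmin(costs))
```

In this sketch the T3C challenger costs one pass over the arms, whereas the TTTS loop has no fixed bound on the number of posterior draws, which is the computational burden the abstract refers to.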
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Auction Theory and Applications
