Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandits: Simple Sequential Elimination Algorithms
MohammadJavad Azizi, Sheldon M Ross, Zhengyu Zhang

TL;DR
This paper introduces simple sequential elimination algorithms for fixed-confidence best arm identification in multi-armed bandits, demonstrating their optimality and efficiency through Bayesian analysis and numerical comparisons.
Contribution
It proposes and analyzes the "vector at a time" (VT) rule and a variant of the "play the winner" (PW) algorithm for Bayesian fixed-confidence BAI, showing their optimality and practical performance.
Findings
The proposed algorithms guarantee an optimal strategy under the prior.
Early elimination improves the classical "vector at a time" rule.
Numerical results favor the proposed algorithms over existing methods.
Abstract
We consider the problem of finding, through adaptive sampling, which of n options (arms) has the largest mean. Our objective is to determine a rule which identifies the best arm with a fixed minimum confidence using as few observations as possible, i.e., this is a fixed-confidence (FC) best arm identification (BAI) problem in multi-armed bandits. We study such problems under the Bayesian setting with both Bernoulli and Gaussian arms. We propose to use the classical "vector at a time" (VT) rule, which samples each remaining arm once in each round. We show how VT can be implemented and analyzed in our Bayesian setting and be improved by early elimination. Our analysis shows that these algorithms guarantee an optimal strategy under the prior. We also propose and analyze a variant of the classical "play the winner" (PW) algorithm. Numerical results show that these rules compare favorably with…
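The round structure of the VT rule described in the abstract (sample each remaining arm once per round) can be illustrated with a minimal sketch for Bernoulli arms. This is not the authors' Bayesian procedure: the elimination criterion used here (drop any arm whose success count trails the current leader by at least `k`) is an assumed, simplified stand-in, and the names `vector_at_a_time`, `k`, and `max_rounds` are hypothetical. The paper's method for choosing thresholds to meet a target confidence under the prior is not reproduced.

```python
import random


def vector_at_a_time(means, k, rng=None, max_rounds=100_000):
    """Illustrative vector-at-a-time elimination for Bernoulli arms.

    means: true success probabilities (used only to simulate pulls).
    k: assumed elimination gap (hypothetical parameter, k >= 1).
    Returns (surviving arm indices, total number of pulls).
    """
    rng = rng or random.Random(0)
    alive = list(range(len(means)))
    successes = [0] * len(means)
    pulls = 0
    for _ in range(max_rounds):
        if len(alive) == 1:
            break
        # One round: sample every remaining arm exactly once.
        for arm in alive:
            successes[arm] += int(rng.random() < means[arm])
            pulls += 1
        # Drop arms trailing the current leader by k or more successes.
        leader = max(successes[a] for a in alive)
        alive = [a for a in alive if leader - successes[a] < k]
    return alive, pulls


if __name__ == "__main__":
    survivors, n_pulls = vector_at_a_time([0.3, 0.5, 0.7], k=5)
    print("surviving arms:", survivors, "total pulls:", n_pulls)
```

A larger `k` makes a wrong elimination less likely at the cost of more observations, which is the trade-off a fixed-confidence rule has to balance.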
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
