Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandits: Simple Sequential Elimination Algorithms

MohammadJavad Azizi; Sheldon M Ross; Zhengyu Zhang

arXiv:2106.06848·cs.LG·March 17, 2022

Open Access

TL;DR

This paper introduces simple sequential elimination algorithms for fixed-confidence best arm identification in multi-armed bandits, demonstrating their optimality and efficiency through Bayesian analysis and numerical comparisons.

Contribution

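The paper proposes and analyzes the "vector at a time" (VT) rule and a variant of the "play the winner" (PW) algorithm for Bayesian fixed-confidence best arm identification (BAI), reporting that they guarantee an optimal strategy under the prior and perform well in numerical comparisons.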

Findings

01

The proposed algorithms guarantee an optimal strategy under the prior.

02

Early elimination improves the classical "vector at a time" (VT) rule.

03

Numerical results favor the proposed algorithms over existing methods.

Abstract

We consider the problem of finding, through adaptive sampling, which of $n$ options (arms) has the largest mean. Our objective is to determine a rule which identifies the best arm with a fixed minimum confidence using as few observations as possible, i.e., this is a fixed-confidence (FC) best arm identification (BAI) problem in multi-armed bandits. We study such problems under the Bayesian setting with both Bernoulli and Gaussian arms. We propose to use the classical "vector at a time" (VT) rule, which samples each remaining arm once in each round. We show how VT can be implemented and analyzed in our Bayesian setting and be improved by early elimination. Our analysis shows that these algorithms guarantee an optimal strategy under the prior. We also propose and analyze a variant of the classical "play the winner" (PW) algorithm. Numerical results show that these rules compare favorably with…
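The round structure described in the abstract (sample each remaining arm once per round, eliminate weak arms, stop at a fixed confidence) can be illustrated with a rough sketch for Bernoulli arms. This is not the paper's rule: the uniform Beta(1, 1) priors, the Monte Carlo estimate of the posterior probability of being best, the stopping threshold `1 - delta`, the elimination threshold `delta / n`, and the names `prob_best` and `vt_sketch` are all assumptions made for illustration. Note also that after an arm is dropped, the posterior is computed over the remaining arms only, which is a simplification.

```python
import random


def prob_best(successes, failures, alive, n_draws, rng):
    """Monte Carlo estimate of P(arm i has the largest mean | data) for each
    arm in `alive`, assuming independent Beta(1, 1) priors (an assumption)."""
    wins = {i: 0 for i in alive}
    for _ in range(n_draws):
        # Draw one plausible mean per surviving arm from its Beta posterior.
        draws = {i: rng.betavariate(1 + successes[i], 1 + failures[i])
                 for i in alive}
        wins[max(draws, key=draws.get)] += 1
    return {i: wins[i] / n_draws for i in alive}


def vt_sketch(true_means, delta=0.05, n_draws=2000, max_rounds=10000, seed=0):
    """Vector-at-a-time style loop: one sample per surviving arm per round.

    Returns (chosen_arm, total_samples). Thresholds are illustrative only.
    """
    rng = random.Random(seed)
    n = len(true_means)
    successes = [0] * n
    failures = [0] * n
    alive = list(range(n))
    samples = 0
    leader = alive[0]
    for _ in range(max_rounds):
        # One Bernoulli observation from every arm still in play.
        for i in alive:
            if rng.random() < true_means[i]:
                successes[i] += 1
            else:
                failures[i] += 1
            samples += 1
        post = prob_best(successes, failures, alive, n_draws, rng)
        leader = max(post, key=post.get)
        # Stop once the leader's estimated posterior probability is high enough.
        if post[leader] >= 1 - delta:
            return leader, samples
        # Early elimination (assumed threshold): drop arms unlikely to be best.
        alive = [i for i in alive if i == leader or post[i] >= delta / n]
    return leader, samples
```

A call such as `vt_sketch([0.3, 0.5, 0.7], delta=0.05)` returns the selected arm index and the number of observations used. Because the posterior is estimated by simulation, the confidence achieved here is only approximate, unlike the guarantee claimed for the paper's algorithms.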

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems