On the Optimal Sample Complexity for Best Arm Identification
Lijie Chen, Jian Li

TL;DR
This paper advances understanding of the sample complexity in best arm identification for stochastic bandits, introducing improved algorithms and lower bounds, especially for the two-arm case, and proposing a conjecture on optimality.
Contribution
It presents a new algorithm for BEST-1-ARM that improves upon several prior upper bounds, and establishes a new lower bound for the two-arm special case (the sign problem), simplifying and extending Farrell's classical 1964 result and linking it to BEST-1-ARM through a reduction.
Findings
A new algorithm for BEST-1-ARM with an improved upper bound
A simplified, extended lower bound for the sign problem
A reduction from the sign problem to BEST-1-ARM that yields a new lower bound
Abstract
We study the best arm identification (BEST-1-ARM) problem, which is defined as follows. We are given $n$ stochastic bandit arms. The $i$-th arm has a reward distribution $D_i$ with an unknown mean $\mu_i$. Upon each play of the $i$-th arm, we can get a reward, sampled i.i.d. from $D_i$. We would like to identify the arm with the largest mean with probability at least $1-\delta$, using as few samples as possible. We provide a nontrivial algorithm for BEST-1-ARM, which improves upon several prior upper bounds on the same problem. We also study an important special case where there are only two arms, which we call the sign problem. We provide a new lower bound of sign, simplifying and significantly extending a classical result by Farrell in 1964, with a completely new proof. Using the new lower bound for sign, we obtain the first lower bound for BEST-1-ARM that goes beyond the classic…
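To make the problem setup concrete, here is a minimal sketch of the sampling model together with a naive uniform-sampling baseline. It is not the algorithm from the paper. It assumes Bernoulli rewards in [0, 1] and, more strongly, that a lower bound `gap` on the difference between the best and second-best means is known in advance. The names `pull`, `uniform_sampling_best_arm`, and `gap` are hypothetical and chosen only for illustration. The number of pulls per arm follows from Hoeffding's inequality and a union bound over the $n$ arms.

```python
import math
import random


def pull(mean, rng):
    """One play of a Bernoulli arm with the given mean (reward is 0 or 1)."""
    return 1.0 if rng.random() < mean else 0.0


def uniform_sampling_best_arm(means, delta, gap, rng):
    """Naive baseline: pull every arm equally often, return the empirical best.

    Assumes rewards lie in [0, 1] and that `gap` is a known lower bound on
    the difference between the best and second-best means (an assumption
    made for this sketch; it is not part of the problem statement).
    """
    n = len(means)
    # Hoeffding + union bound: with this many pulls per arm, every empirical
    # mean is within gap/2 of its true mean, except with probability <= delta.
    pulls_per_arm = math.ceil((2.0 / gap**2) * math.log(2.0 * n / delta))
    estimates = []
    for mu in means:
        total = sum(pull(mu, rng) for _ in range(pulls_per_arm))
        estimates.append(total / pulls_per_arm)
    # Index of the arm with the largest empirical mean.
    best = max(range(n), key=lambda i: estimates[i])
    return best, n * pulls_per_arm


# Illustrative instance: the means are unknown to the algorithm in the real
# problem; here they only drive the simulated rewards.
rng = random.Random(0)
means = [0.9, 0.5, 0.4, 0.3]
best, samples_used = uniform_sampling_best_arm(means, delta=0.05, gap=0.4, rng=rng)
```

This baseline spends the same number of samples on every arm, so its cost scales with the smallest gap across all $n$ arms. Adaptive algorithms for BEST-1-ARM aim to do better by spending fewer samples on arms that are clearly suboptimal.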
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
