On the Optimal Sample Complexity for Best Arm Identification

Lijie Chen; Jian Li

arXiv:1511.03774 · cs.LG · August 24, 2016 · 35 cites

Open Access

TL;DR

This paper advances the understanding of the sample complexity of best arm identification in stochastic bandits. It introduces an improved algorithm and new lower bounds, notably for the two-arm case, and proposes a conjecture on the optimal sample complexity.

Contribution

It presents a new algorithm for BEST-1-ARM that improves upon several prior upper bounds, establishes a new lower bound for the two-arm sign problem that extends a classical 1964 result by Farrell, and links the two problems through a reduction that yields lower bounds for BEST-1-ARM.
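The two-arm sign problem amounts to deciding which of two arms has the larger mean, i.e. the sign of the difference of their means. The following is a minimal sketch of one sequential test for it, not the method analysed in the paper. It assumes rewards in [0, 1] and unequal means; the function name `sign_test`, the round cap `max_rounds`, and the per-round error budget δ/(2t²) are illustrative assumptions. It pulls both arms once per round and stops once a Hoeffding confidence interval around the empirical mean difference excludes zero. This simple union-bound schedule is not claimed to reach the optimal sample complexity that the paper studies.

```python
import math
import random
from typing import Callable, Tuple


def sign_test(
    sample_a: Callable[[], float],
    sample_b: Callable[[], float],
    delta: float,
    max_rounds: int = 10**6,
) -> Tuple[str, int]:
    """Illustrative sequential test for the two-arm (sign) problem.

    Assumes rewards in [0, 1] and that the two means differ.
    Returns ("a" or "b", number of rounds), naming the arm believed
    to have the larger mean.
    """
    total = 0.0  # running sum of (reward_a - reward_b)
    for t in range(1, max_rounds + 1):
        total += sample_a() - sample_b()
        diff = total / t
        # Each difference lies in [-1, 1], so Hoeffding gives
        # P(|diff - true_diff| >= r) <= 2 exp(-t r^2 / 2).
        # Spending delta / (2 t^2) at round t keeps the total error below delta.
        radius = math.sqrt(2.0 * math.log(4.0 * t * t / delta) / t)
        if abs(diff) > radius:
            return ("a" if diff > 0 else "b"), t
    raise RuntimeError("no decision within max_rounds")


if __name__ == "__main__":
    rng = random.Random(0)
    winner, rounds = sign_test(
        lambda: 1.0 if rng.random() < 0.8 else 0.0,  # Bernoulli(0.8) arm
        lambda: 1.0 if rng.random() < 0.2 else 0.0,  # Bernoulli(0.2) arm
        delta=0.01,
    )
    print(winner, rounds)
```

In the seeded run, the test should name arm "a"; the exact number of rounds depends on the random draws, and it grows as the two means move closer together.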

Findings

01 New upper-bound algorithm for BEST-1-ARM
02 A simplified, extended lower bound for the sign problem
03 Reduction from the sign problem to BEST-1-ARM for lower bounds

Abstract

We study the best arm identification (BEST-1-ARM) problem, which is defined as follows. We are given $n$ stochastic bandit arms. The $i$th arm has a reward distribution $D_i$ with an unknown mean $\mu_i$. Upon each play of the $i$th arm, we get a reward sampled i.i.d. from $D_i$. We would like to identify the arm with the largest mean with probability at least $1 - \delta$, using as few samples as possible. We provide a nontrivial algorithm for BEST-1-ARM, which improves upon several prior upper bounds on the same problem. We also study an important special case where there are only two arms, which we call the sign problem. We provide a new lower bound for sign, simplifying and significantly extending a classical result by Farrell in 1964, with a completely new proof. Using the new lower bound for sign, we obtain the first lower bound for BEST-1-ARM that goes beyond the classic…
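The BEST-1-ARM setting defined in the abstract can be simulated directly. The following is a minimal sketch of successive elimination, a standard baseline for this problem and not the algorithm proposed in the paper. It assumes rewards in [0, 1]; the name `successive_elimination`, the Bernoulli test arms, and the per-arm, per-round error budget δ/(2nt²) are illustrative assumptions. In each round every surviving arm is sampled once, and an arm is discarded as soon as its upper confidence bound falls below the largest lower confidence bound among the survivors.

```python
import math
import random
from typing import Callable, List, Sequence


def successive_elimination(
    arms: Sequence[Callable[[], float]],
    delta: float,
    max_rounds: int = 10**6,
) -> int:
    """Return the index of the arm believed to have the largest mean.

    Assumes every arm returns rewards in [0, 1]. On the event that all
    confidence intervals hold (probability >= 1 - delta by Hoeffding and
    a union bound), a best arm is never eliminated, so any returned
    index is a best arm. Ties for the best mean prevent termination.
    """
    n = len(arms)
    active: List[int] = list(range(n))
    sums = [0.0] * n  # reward totals; every active arm has exactly t samples
    for t in range(1, max_rounds + 1):
        for i in active:
            sums[i] += arms[i]()
        # Hoeffding radius for t samples in [0, 1], with failure budget
        # delta / (2 n t^2) per arm per round (the total stays below delta).
        radius = math.sqrt(math.log(4.0 * n * t * t / delta) / (2.0 * t))
        means = {i: sums[i] / t for i in active}
        best_lower = max(means.values()) - radius
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = [i for i in active if means[i] + radius >= best_lower]
        if len(active) == 1:
            return active[0]
    raise RuntimeError("no decision within max_rounds")


if __name__ == "__main__":
    rng = random.Random(1)
    true_means = (0.3, 0.9, 0.5)  # hypothetical Bernoulli arms
    arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in true_means]
    print(successive_elimination(arms, delta=0.01))
```

Because the confidence radius shrinks roughly like sqrt(log t / t), arms far below the best are dropped early while arms close to it keep being sampled, which is why sample complexity bounds for this kind of algorithm are stated in terms of the gaps between the largest mean and the others.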

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics