Variance-Dependent Best Arm Identification

Pinyan Lu; Chao Tao; Xiaojin Zhang

arXiv:2106.10417 · cs.LG · May 30, 2023 · 1 citation

TL;DR

This paper introduces a variance-dependent algorithm for best arm identification in stochastic bandits, improving sample efficiency by adaptively exploiting reward variances and gaps, and achieving near-optimal theoretical guarantees.

Contribution

The paper presents an adaptive algorithm, built on a new technique called grouped median elimination, that exploits the variances of the rewards as well as the gaps between arms. It removes an extra log n factor from the sample complexity and achieves a near-optimal bound.

Findings

01. Achieves sample complexity close to the theoretical lower bound.

02. Removes the extra log n factor in variance-independent algorithms.

03. Provides the first variance-dependent analysis with optimality guarantees.

Abstract

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0, 1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \dots \geq \theta_n$. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms and makes future decisions based on the gathered information using a novel approach called grouped median elimination. The proposed algorithm guarantees to output the best arm with probability $(1 - \delta)$ and uses at most $O\left(\sum_{i=1}^{n} \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right) \left(\ln \delta^{-1} + \ln \ln \Delta_i^{-1}\right)\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm and we…
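
To get a feel for the bound in the abstract, the sketch below evaluates the expression inside the $O(\cdot)$ for a concrete instance and compares it with a variance-independent analogue in which each arm contributes $1/\Delta_i^2$. It is only an illustration, not the paper's algorithm: constants are dropped, the function names are hypothetical, the gap of the best arm is assumed to equal the gap of the runner-up (the excerpt is cut off before it defines $\Delta_1$), and the inner logarithm is clamped at 1 so the doubly logarithmic term is never negative.

```python
import math


def _log_factor(gap, delta):
    # ln(1/delta) + ln ln(1/gap). The inner log is clamped at 1 so the
    # double logarithm stays defined and non-negative (a guard added here,
    # not something stated in the abstract).
    return math.log(1.0 / delta) + math.log(max(math.log(1.0 / gap), 1.0))


def gaps_from_means(means):
    """Return the gap of every arm to the best arm, in the input order."""
    ranked = sorted(means, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if best <= runner_up:
        raise ValueError("the best arm must be unique")
    # ASSUMPTION: the best arm is assigned the gap of the runner-up,
    # because the excerpt does not define the best arm's gap.
    return [best - m if m < best else best - runner_up for m in means]


def variance_dependent_proxy(means, variances, delta):
    """Sum of (var/gap^2 + 1/gap) * log factor over all arms, constants dropped."""
    gaps = gaps_from_means(means)
    return sum(
        (v / g ** 2 + 1.0 / g) * _log_factor(g, delta)
        for v, g in zip(variances, gaps)
    )


def variance_independent_proxy(means, delta):
    """Sum of (1/gap^2) * log factor over all arms, for comparison only."""
    gaps = gaps_from_means(means)
    return sum(_log_factor(g, delta) / g ** 2 for g in gaps)


if __name__ == "__main__":
    means = [0.9, 0.8, 0.5]
    variances = [0.01, 0.01, 0.01]  # low-variance arms
    delta = 0.05
    print("variance-dependent proxy:  ", variance_dependent_proxy(means, variances, delta))
    print("variance-independent proxy:", variance_independent_proxy(means, delta))
```

When the variances are small relative to the gaps, the $1/\Delta_i$ term dominates each summand and the variance-dependent proxy is far below the $1/\Delta_i^2$ one. Because rewards lie in $[0, 1]$, variances are at most $1/4$ and gaps at most $1$, so the variance-dependent proxy never exceeds $1.25$ times the variance-independent one under these assumptions.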

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications