Variance-Dependent Best Arm Identification
Pinyan Lu, Chao Tao, Xiaojin Zhang

TL;DR
This paper introduces a variance-dependent algorithm for best arm identification in stochastic bandits, improving sample efficiency by adaptively exploiting reward variances and gaps, and achieving near-optimal theoretical guarantees.
Contribution
The paper presents a novel adaptive algorithm, built on grouped median elimination, that exploits reward variances as well as gaps. It removes the extra log n factor found in variance-independent algorithms and achieves near-optimal sample complexity.
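To see why variance information can save samples, the sketch below compares two standard confidence radii for rewards in [0, 1]: a Hoeffding radius, which ignores variance, and an empirical Bernstein radius in the Maurer and Pontil form, which shrinks when the sample variance is small. This is a generic illustration, not the paper's grouped median elimination procedure. The function names, sample size, and variance values are assumptions chosen for the example.

```python
import math


def hoeffding_radius(n, delta):
    """One-sided Hoeffding radius for n i.i.d. samples in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def empirical_bernstein_radius(n, sample_var, delta):
    """One-sided empirical Bernstein radius for n i.i.d. samples in [0, 1].

    sample_var is the (unbiased) sample variance of the observed rewards.
    """
    log_term = math.log(2.0 / delta)
    variance_part = math.sqrt(2.0 * sample_var * log_term / n)
    range_part = 7.0 * log_term / (3.0 * (n - 1))
    return variance_part + range_part


# Hypothetical setting: 1000 pulls of one arm, failure probability 0.05.
n, delta = 1000, 0.05
hoeff = hoeffding_radius(n, delta)
low_var = empirical_bernstein_radius(n, 0.01, delta)   # nearly deterministic arm
high_var = empirical_bernstein_radius(n, 0.25, delta)  # worst-case variance on [0, 1]

print(f"Hoeffding:                 {hoeff:.4f}")
print(f"Bernstein (variance 0.01): {low_var:.4f}")
print(f"Bernstein (variance 0.25): {high_var:.4f}")
```

With a low-variance arm the Bernstein radius is tighter than the Hoeffding radius at the same sample size, so the arm can be separated from its competitors with fewer pulls. At the worst-case variance the advantage disappears.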
Findings
Achieves sample complexity close to the theoretical lower bound.
Removes the extra log n factor in variance-independent algorithms.
Provides the first variance-dependent analysis with optimality guarantees.
Abstract
We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms and makes future decisions based on the gathered information using a novel approach called \textit{grouped median elimination}. The proposed algorithm guarantees to output the best arm with probability $(1-\delta)$ and uses at most $O\left(\sum_{i=1}^n \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)(\ln \delta^{-1} + \ln \ln \Delta_i^{-1})\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm and we…
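As a rough numerical illustration, the sketch below evaluates the quantity inside the sample bound, taking it to have the form sum over arms of (sigma_i^2 / Delta_i^2 + 1 / Delta_i)(ln(1/delta) + ln ln(1/Delta_i)), and compares it with a gap-only quantity in which each arm contributes 1 / Delta_i^2 times the same log factor. Constants hidden by the O-notation are ignored. The convention that the best arm takes the gap of the runner-up, the clamping of the double logarithm, and all names and numbers are assumptions made for this example, not details confirmed by the text above.

```python
import math


def log_factor(gap, delta):
    """ln(1/delta) + ln ln(1/gap).

    The inner log is clamped to at least 1 so the double log is defined
    and non-negative (an illustrative choice, not part of the bound).
    """
    return math.log(1.0 / delta) + math.log(max(math.log(1.0 / gap), 1.0))


def arm_gaps(means):
    """Gap of each arm to the best arm; the best arm gets the runner-up's gap."""
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    best, runner_up = means[order[0]], means[order[1]]
    if best == runner_up:
        raise ValueError("the best arm must be unique")
    gaps = [best - m for m in means]
    gaps[order[0]] = best - runner_up
    return gaps


def variance_dependent_quantity(means, variances, delta):
    """Sum of (var / gap^2 + 1 / gap) * log factor over all arms."""
    return sum(
        (v / g ** 2 + 1.0 / g) * log_factor(g, delta)
        for g, v in zip(arm_gaps(means), variances)
    )


def gap_only_quantity(means, delta):
    """Sum of (1 / gap^2) * log factor over all arms."""
    return sum(log_factor(g, delta) / g ** 2 for g in arm_gaps(means))


# Hypothetical instance: four low-variance arms with rewards in [0, 1].
means = [0.9, 0.8, 0.7, 0.5]
variances = [0.01, 0.01, 0.01, 0.01]
delta = 0.05

vd = variance_dependent_quantity(means, variances, delta)
go = gap_only_quantity(means, delta)
print(f"variance-dependent quantity: {vd:.1f}")
print(f"gap-only quantity:           {go:.1f}")
```

When the variances are small relative to the gaps, the 1 / Delta_i term dominates and the variance-dependent quantity is much smaller than the gap-only one. When variances approach their maximum on [0, 1], the two quantities are of the same order.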
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications
