On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits
Shahin Shahrampour, Mohammad Noshad, Vahid Tarokh

TL;DR
This paper introduces a unified framework for sequential elimination algorithms in multi-armed bandits, proposing a nonlinear budget allocation strategy that improves best-arm identification performance and extends to side-observation models.
Contribution
The paper develops a general analysis of sequential elimination algorithms, proposes a nonlinear budget division method, and validates it theoretically and empirically across several problem environments and models.
Findings
The nonlinear algorithm outperforms state-of-the-art methods in experiments.
The framework yields a performance measure expressed in terms of the sampling mechanism and the number of arms eliminated at each round.
Enhanced guarantees are established for side-observation models.
Abstract
We consider the best-arm identification problem in multi-armed bandits, which focuses purely on exploration. A player is given a fixed budget to explore a finite set of arms, and the rewards of each arm are drawn independently from a fixed, unknown distribution. The player aims to identify the arm with the largest expected reward. We propose a general framework to unify sequential elimination algorithms, where the arms are dismissed iteratively until a unique arm is left. Our analysis reveals a novel performance measure expressed in terms of the sampling mechanism and number of eliminated arms at each round. Based on this result, we develop an algorithm that divides the budget according to a nonlinear function of remaining arms at each round. We provide theoretical guarantees for the algorithm, characterizing the suitable nonlinearity for different problem environments described by the…
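The elimination scheme described in the abstract can be illustrated with a minimal sketch. The abstract only says the budget is divided by a nonlinear function of the remaining arms, so the power-law form below (per-arm sample target proportional to the number of remaining arms raised to -p) and the normalizing constant c_p are assumptions chosen for illustration, as are the names nonlinear_elimination, pull, and p. With p = 1 the schedule reduces to a linear division in the style of Successive Rejects; one arm is dropped per round.

```python
import math
import random


def nonlinear_elimination(pull, num_arms, budget, p=1.5):
    """Sketch of fixed-budget sequential elimination (one arm dropped per round).

    pull(i) is a hypothetical callback returning one reward sample from arm i.
    The exponent p and the power-law schedule are illustrative assumptions.
    Returns (index of the surviving arm, total number of pulls used).
    """
    if num_arms < 2 or budget <= num_arms:
        raise ValueError("need at least 2 arms and budget > num_arms")

    # Assumed normalizing constant: chosen so the total number of pulls
    # over all rounds stays within the budget.
    c_p = 2.0 ** (-p) + sum(r ** (-p) for r in range(2, num_arms + 1))

    active = list(range(num_arms))
    sums = [0.0] * num_arms    # running reward totals per arm
    counts = [0] * num_arms    # number of pulls per arm

    for rnd in range(1, num_arms):
        remaining = num_arms - rnd + 1
        # Cumulative per-arm sample target for this round: a nonlinear
        # (power-law) function of the number of remaining arms.
        target = math.ceil((budget - num_arms) / (c_p * remaining ** p))
        for arm in active:
            while counts[arm] < target:
                sums[arm] += pull(arm)
                counts[arm] += 1
        # Dismiss the arm with the lowest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)

    return active[0], sum(counts)


if __name__ == "__main__":
    # Hypothetical Bernoulli arms; the means are made up for the demo.
    rng = random.Random(0)
    means = [0.3, 0.4, 0.5, 0.7]
    best, used = nonlinear_elimination(
        lambda i: 1.0 if rng.random() < means[i] else 0.0,
        num_arms=len(means),
        budget=400,
    )
    print("surviving arm:", best, "pulls used:", used)
```

Larger values of p shift more of the budget toward later rounds, where fewer arms remain; which exponent suits a given environment is the question the paper's analysis addresses, and this sketch makes no claim about the right choice.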
