Nonlinear Sequential Accepts and Rejects for Identification of Top Arms in Stochastic Bandits

Shahin Shahrampour; Vahid Tarokh

arXiv:1707.02649·stat.ML·July 11, 2017

TL;DR

This paper introduces a sequential algorithm for identifying the top M arms in stochastic bandits. It divides the exploration budget across rounds by a nonlinear function of the number of remaining arms, and it is reported to outperform existing methods in various environments.

Contribution

The paper proposes a novel nonlinear budget allocation algorithm for M-best-arm identification, with theoretical analysis and empirical validation showing improved performance over prior approaches.

Findings

01

The paper characterizes how quickly the misidentification probability decays under the proposed algorithm.

02

Nonlinear budget allocation proves useful across different problem environments, which are described by the number of competitive arms.

03

Numerical experiments demonstrate superior performance compared to state-of-the-art methods.

Abstract

We address the M-best-arm identification problem in multi-armed bandits. A player has a limited budget to explore K arms (M<K), and once pulled, each arm yields a reward drawn (independently) from a fixed, unknown distribution. The goal is to find the top M arms in the sense of expected reward. We develop an algorithm which proceeds in rounds to deactivate arms iteratively. At each round, the budget is divided by a nonlinear function of remaining arms, and the arms are pulled correspondingly. Based on a decision rule, the deactivated arm at each round may be accepted or rejected. The algorithm outputs the accepted arms that should ideally be the top M arms. We characterize the decay rate of the misidentification probability and establish that the nonlinear budget allocation proves to be useful for different problem environments (described by the number of competitive arms). We provide…
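
The round structure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the allocation function or the decision rule, so the sketch assumes a power-law allocation with exponent `p` (with `p = 1` giving a linear split), a gap-based accept/reject rule in the style of successive accepts and rejects, and Bernoulli rewards. The function name `top_m_sequential` and all of its parameters are hypothetical.

```python
import math
import random


def top_m_sequential(means, m, budget, p=1.0, seed=0):
    """Sketch of a round-based accept/reject scheme for top-m identification.

    Assumptions (not taken from the abstract): Bernoulli arms with the given
    means, a power-law budget split with exponent p, and a gap-based rule.
    Returns (sorted accepted arm indices, total number of pulls used).
    """
    k = len(means)
    if not 0 < m < k or budget <= k:
        raise ValueError("need 0 < m < k and budget > k")
    rng = random.Random(seed)
    # Assumed normalizer that keeps the total number of pulls within budget.
    c_p = 2.0 ** (-p) + sum(j ** (-p) for j in range(2, k + 1))
    active = list(range(k))
    pulls = [0] * k
    totals = [0.0] * k
    accepted = []
    slots = m  # top arms still to be accepted
    for r in range(1, k):
        if slots == 0 or slots == len(active):
            break  # the fate of every remaining arm is already decided
        # Cumulative pulls per active arm by the end of round r.
        n_r = math.ceil((budget - k) / (c_p * (k - r + 1) ** p))
        for arm in active:
            while pulls[arm] < n_r:
                totals[arm] += 1.0 if rng.random() < means[arm] else 0.0
                pulls[arm] += 1
        est = {a: totals[a] / pulls[a] for a in active}
        ranked = sorted(active, key=lambda a: est[a], reverse=True)
        # Gap of the empirical best arm to the first arm outside the top slots,
        # and gap of the empirical worst arm to the last arm inside them.
        top_gap = est[ranked[0]] - est[ranked[slots]]
        bottom_gap = est[ranked[slots - 1]] - est[ranked[-1]]
        if top_gap > bottom_gap:
            chosen = ranked[0]  # deactivate and accept
            accepted.append(chosen)
            slots -= 1
        else:
            chosen = ranked[-1]  # deactivate and reject
        active.remove(chosen)
    if slots == len(active):
        accepted.extend(active)  # remaining arms fill the remaining slots
    return sorted(accepted), sum(pulls)


if __name__ == "__main__":
    arms, used = top_m_sequential([0.9, 0.8, 0.2, 0.1, 0.15], m=2, budget=2000)
    print(arms, used)
```

In this sketch, raising `p` above 1 shifts pulls from the early rounds, when many arms are still active, toward the later rounds. Which exponent works best is the kind of environment-dependent question the abstract refers to.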
