An Asymptotically Optimal Strategy for Constrained Multi-armed Bandit Problems

Hyeong Soo Chang

arXiv:1805.01237 · math.OC · May 4, 2018

TL;DR

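This paper introduces an asymptotically optimal strategy for constrained stochastic multi-armed bandit (MAB) problems. The strategy extends the epsilon-greedy approach, and the paper gives a finite-time lower bound on the probability of selecting an optimal near-feasible arm; under some conditions, this bound approaches one as time grows.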

Contribution

It presents a simple extension of the epsilon-greedy strategy that achieves asymptotic optimality in constrained MAB problems, together with a finite-time lower bound on the probability of correct selection.
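To make "extended epsilon-greedy" concrete, here is a minimal sketch of one selection step. It is not the paper's algorithm. It assumes a hypothetical constrained model in which each pull returns a reward and a cost, the goal is the arm with the highest mean reward among arms whose mean cost is at most a `threshold`, and "near-feasible" means the estimated mean cost is at most `threshold + tol`. The function names `near_feasible_arms`, `recommend`, and `select_arm` and the parameters `threshold` and `tol` are illustrative assumptions.

```python
import random
from typing import List, Optional

def near_feasible_arms(cost_means: List[float], threshold: float, tol: float) -> List[int]:
    # Assumed definition (not from the paper): an arm is "near-feasible" if its
    # estimated mean cost is at most threshold + tol.
    return [i for i, c in enumerate(cost_means) if c <= threshold + tol]

def recommend(reward_means: List[float], cost_means: List[float],
              threshold: float, tol: float) -> Optional[int]:
    # Greedy choice: highest estimated reward among near-feasible arms;
    # None when no arm currently passes the cost test.
    candidates = near_feasible_arms(cost_means, threshold, tol)
    if not candidates:
        return None
    return max(candidates, key=lambda i: reward_means[i])

def select_arm(eps_t: float, reward_means: List[float], cost_means: List[float],
               threshold: float, tol: float, rng: random.Random) -> int:
    # One step of a hypothetical constrained eps_t-greedy rule: explore uniformly
    # with probability eps_t, otherwise exploit the greedy recommendation.
    # If no arm looks near-feasible yet, fall back to exploration.
    k = len(reward_means)
    best = recommend(reward_means, cost_means, threshold, tol)
    if best is None or rng.random() < eps_t:
        return rng.randrange(k)
    return best

rng = random.Random(0)
# Arm 0 violates the cost threshold (0.8 > 0.55), so the greedy choice is arm 1.
print(select_arm(0.0, [0.9, 0.7, 0.5], [0.8, 0.4, 0.1], 0.5, 0.05, rng))  # 1
```

With `eps_t = 0` the rule is purely greedy; with `eps_t = 1` it is uniform exploration. A decaying sequence of `eps_t` values moves between the two over time.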

Findings

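1. A finite-time lower bound on the probability of correctly selecting an optimal near-feasible arm, valid for all time steps.

2. Under some conditions, this bound approaches one as time t increases.

3. An example sequence {ε_t} whose asymptotic convergence rate is of order (1 - 1/t)^4, holding from a sufficiently large t.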
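Assume the third finding means the lower bound on the correct-selection probability behaves like (1 - 1/t)^4 for large t (this is an interpretation, not a statement from the paper). A binomial expansion then shows how quickly the gap to one closes:

$$
\left(1 - \frac{1}{t}\right)^{4} = 1 - \frac{4}{t} + \frac{6}{t^{2}} - \frac{4}{t^{3}} + \frac{1}{t^{4}},
\qquad
1 - \left(1 - \frac{1}{t}\right)^{4} = \frac{4}{t} - \frac{6}{t^{2}} + \frac{4}{t^{3}} - \frac{1}{t^{4}} = O\!\left(\frac{1}{t}\right).
$$

For example, at t = 100 the value is 0.99^4 = 0.96059601, so the gap is about 0.039, close to 4/t = 0.04.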

Abstract

For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_{t}$-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as time $t$ goes to infinity. A particular example sequence $\{\epsilon_{t}\}$ having an asymptotic convergence rate of order $(1 - \frac{1}{t})^{4}$, holding from a sufficiently large $t$, is also discussed.
