An Asymptotically Optimal Strategy for Constrained Multi-armed Bandit Problems
Hyeong Soo Chang

TL;DR
This paper shows that a simple strategy extended from epsilon-greedy is asymptotically optimal for constrained stochastic multi-armed bandit problems, and it gives a finite-time lower bound on the probability of correct selection that, under some conditions, approaches one as time goes to infinity.
Contribution
It presents a simple, extended epsilon-greedy strategy achieving asymptotic optimality in constrained MAB problems with finite-time performance guarantees.
Findings
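A finite-time lower bound on the probability of correctly selecting an optimal near-feasible arm, valid at every time step.
Under some conditions, the bound approaches one as time goes to infinity.
An example epsilon sequence whose asymptotic convergence rate is on the order of (1-1/t)^4 for sufficiently large t.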
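To get a feel for the (1-1/t)^4 rate named in the findings, the short sketch below tabulates it for a few values of t next to the simpler quantity 1 - 4/t, which is a lower bound by Bernoulli's inequality. The function name `rate` and the chosen values of t are illustrative assumptions, and the numbers only describe the order of the rate, not the paper's actual bound or its constants.

```python
def rate(t):
    """Illustrative rate (1 - 1/t)^4; constants in the paper's bound are not modeled."""
    return (1.0 - 1.0 / t) ** 4


if __name__ == "__main__":
    # Compare the rate with the Bernoulli lower bound 1 - 4/t.
    for t in (10, 100, 1000, 10000):
        print(t, rate(t), 1.0 - 4.0 / t)
```

Both columns tend to one, and the gap between the rate and one shrinks roughly like 4/t.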
Abstract
For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that an asymptotic optimality is achievable by a simple strategy extended from the ε_t-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as time t goes to infinity. A particular example sequence of {ε_t} having the asymptotic convergence rate in the order of (1-1/t)^4 that holds from a sufficiently large t is also discussed.
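The abstract does not spell out the strategy, so the following is only a minimal sketch of what an epsilon-greedy rule extended to a constrained bandit might look like, not the author's algorithm. Everything concrete here is an assumption made for illustration: each arm yields a Bernoulli reward and a Bernoulli cost, an arm counts as feasible when its mean cost is at most `threshold`, the exploration rate is `min(1, c/t)`, and the cheapest-looking arm is used as a fallback when no arm appears feasible. The paper's notion of near-feasibility and its example epsilon sequence are not modeled.

```python
import random


def epsilon_schedule(t, c=20.0):
    """Illustrative exploration rate (an assumption, not the paper's sequence)."""
    return min(1.0, c / t)


def greedy_feasible_arm(reward_est, cost_est, threshold):
    """Best estimated reward among arms whose estimated cost meets the threshold."""
    feasible = [a for a in range(len(reward_est)) if cost_est[a] <= threshold]
    if not feasible:
        # Fallback (assumption): no arm looks feasible, so take the cheapest one.
        return min(range(len(cost_est)), key=lambda a: cost_est[a])
    return max(feasible, key=lambda a: reward_est[a])


def run_constrained_eps_greedy(reward_means, cost_means, threshold, horizon, seed=0):
    """Simulate the sketch; return the finally recommended arm and the pull counts."""
    rng = random.Random(seed)
    k = len(reward_means)
    pulls = [0] * k
    reward_est = [0.0] * k
    cost_est = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so all estimates are defined
        elif rng.random() < epsilon_schedule(t):
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            arm = greedy_feasible_arm(reward_est, cost_est, threshold)
        # Hypothetical environment: Bernoulli reward and Bernoulli cost per pull.
        reward = 1.0 if rng.random() < reward_means[arm] else 0.0
        cost = 1.0 if rng.random() < cost_means[arm] else 0.0
        pulls[arm] += 1
        # Incremental sample-mean updates.
        reward_est[arm] += (reward - reward_est[arm]) / pulls[arm]
        cost_est[arm] += (cost - cost_est[arm]) / pulls[arm]
    return greedy_feasible_arm(reward_est, cost_est, threshold), pulls


if __name__ == "__main__":
    # Arm 0 pays best but violates the cost threshold; arm 1 is the best feasible arm.
    best, pulls = run_constrained_eps_greedy(
        [0.9, 0.6, 0.2], [0.9, 0.2, 0.1], threshold=0.5, horizon=5000, seed=1
    )
    print(best, pulls)
```

In this toy instance the unconstrained best arm is infeasible, which is the situation where a constrained rule and classical epsilon-greedy would disagree.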
