Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint
Apostolos N. Burnetas, Odysseas Kanavetas, Michael N. Katehakis

TL;DR
This paper introduces asymptotically optimal policies for multi-armed bandit problems under a cost constraint: the policies maximize expected total reward while keeping sampling costs within the constraint, and explicit forms are given for normal distributions.
Contribution
It establishes a lower bound on regret for cost-constrained bandits and constructs policies that achieve this bound asymptotically, including explicit forms for normal distributions.
Findings
Derived a necessary asymptotic lower bound for regret.
Constructed policies that are asymptotically optimal within the class of feasible policies.
Provided explicit policies for normal distributions with unknown means.
Abstract
We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit-dependent cost. Successive samples from each population are iid random variables with unknown distribution. The objective is to design a feasible policy for deciding which population to sample from, so as to maximize the expected sum of outcomes of the total samples or, equivalently, to minimize the regret due to lack of information on sample distributions. For this problem we consider the class of feasible uniformly fast (f-UF) convergent policies that satisfy the cost constraint sample-path wise. We first establish a necessary asymptotic lower bound for the rate of increase of the regret function of f-UF policies. Then we construct a class of f-UF…
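As a rough illustration of what "satisfying the cost constraint sample-path wise" and "regret" can mean, the Python sketch below checks a sequence of pulls against a budget and computes a pseudo-regret. It is not the paper's policy. The form of the constraint (cumulative cost after n samples at most n times a per-sample budget), the names `is_feasible`, `pseudo_regret`, `budget_per_sample`, and `benchmark_mean`, and all numbers are assumptions made for illustration; the abstract does not spell out these details.

```python
from itertools import accumulate


def is_feasible(pulls, costs, budget_per_sample):
    """Check an ASSUMED sample-path constraint: after every n samples,
    the cumulative cost must not exceed n * budget_per_sample."""
    running = accumulate(costs[j] for j in pulls)
    # Small tolerance guards against floating-point round-off.
    return all(
        total <= n * budget_per_sample + 1e-12
        for n, total in enumerate(running, start=1)
    )


def pseudo_regret(pulls, means, benchmark_mean):
    """Gap between n times a benchmark per-sample reward and the sum of
    the true means of the arms actually pulled."""
    return len(pulls) * benchmark_mean - sum(means[j] for j in pulls)


# Hypothetical two-armed example: arm 0 is cheap and poor, arm 1 is costly and good.
costs = [1.0, 3.0]
means = [0.2, 0.8]
budget = 2.0

alternating = [0, 1, 0, 1]   # cumulative costs 1, 4, 5, 8 against budgets 2, 4, 6, 8
greedy_start = [1, 0]        # first pull costs 3, which exceeds the budget of 2

print(is_feasible(alternating, costs, budget))
print(is_feasible(greedy_start, costs, budget))
```

In this toy setting, the best arm cannot be sampled every time without exceeding the budget, so the natural benchmark is a mixture of arms rather than the single highest-mean arm; here an equal mix of the two arms gives a per-sample benchmark of 0.5 under the assumed constraint.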
