Complexity Analysis of a Countable-armed Bandit Problem
Anand Kalvit, Assaf Zeevi

TL;DR
This paper analyzes a stochastic multi-armed bandit problem with a large (countable) action space and an unknown population-level distribution of arm types. It proposes algorithms that achieve rate-optimal regret bounds and highlights key differences from classical bandit problems.
Contribution
It introduces algorithms for a countable-armed bandit problem in which arm types are unobservable and their distribution is unknown. These algorithms achieve rate-optimal instance-dependent regret, and the analysis reveals aspects of complexity that are distinct from classical models.
Findings
Instance-dependent regret is O(log n)
Instance-independent (minimax) regret is Õ(√n) for K = 2
Although the order of regret matches the classical MAB problem, the properties of the performance bounds and the algorithm design differ from it
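For reference, the quantity these findings bound can be written in the standard form below. The notation is ours, not necessarily the paper's: μ_1 > μ_2 > … > μ_K are the mean rewards of the K arm types, X_t is the reward collected at play t under a policy π, and n is the horizon. The two orders restate the findings above.

```latex
R_n^{\pi} \;=\; n\,\mu_1 \;-\; \mathbb{E}\Big[\sum_{t=1}^{n} X_t\Big],
\qquad
R_n^{\pi} = \mathcal{O}(\log n) \ \text{(instance-dependent)},
\qquad
\inf_{\pi}\,\sup_{\text{instances}} R_n^{\pi} = \tilde{\mathcal{O}}(\sqrt{n}) \ \text{for } K = 2 .
```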
Abstract
We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces and endowed with a population of arms containing exactly K arm-types, each characterized by a distinct mean reward. The decision maker is oblivious to the statistical properties of the reward distributions as well as to the population-level distribution of the different arm-types, and is also precluded from observing the type of an arm after play. We study the classical problem of minimizing the expected cumulative regret over a horizon of play n, and propose algorithms that achieve a rate-optimal finite-time instance-dependent regret of O(log n). We also show that the instance-independent (minimax) regret is Õ(√n) when K = 2. While the order of regret and complexity of the problem suggests a great degree of similarity to the classical…
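To make the setting concrete, here is a minimal simulator sketch. It rests on the following assumptions, which are ours and not the paper's: rewards are Bernoulli, and the names `CountableArmedBandit`, `new_arm`, `pull`, `gap` and `ucb_on_two_fresh_arms` are hypothetical. The learner can either query a fresh arm from the reservoir or replay an arm it already holds, and it only ever sees an index, never the arm's type. The baseline policy is a deliberately naive illustration (UCB1 on two fresh arms). It is not the algorithm proposed in the paper.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class CountableArmedBandit:
    """Hypothetical simulator: an unlimited reservoir of arms, each of one of K
    latent types. An arm of type k pays Bernoulli(means[k]) rewards (assumption)."""
    means: list        # mean reward of each type (hidden from the learner)
    type_probs: list   # population share of each type (hidden from the learner)
    seed: int = 0
    arm_types: list = field(default_factory=list)  # latent type of each drawn arm

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)

    def new_arm(self) -> int:
        """Query a fresh arm; the learner receives only an index, never the type."""
        k = self.rng.choices(range(len(self.means)), weights=self.type_probs)[0]
        self.arm_types.append(k)
        return len(self.arm_types) - 1

    def pull(self, arm: int) -> float:
        """Play an already-drawn arm and observe a 0/1 reward."""
        return 1.0 if self.rng.random() < self.means[self.arm_types[arm]] else 0.0

    def gap(self, arm: int) -> float:
        """Per-play regret of this arm relative to the best type (oracle view only)."""
        return max(self.means) - self.means[self.arm_types[arm]]


def ucb_on_two_fresh_arms(env: CountableArmedBandit, n: int) -> float:
    """Naive baseline (ours, not the paper's): draw two arms once, then run UCB1
    between them for n plays. Returns the cumulative pseudo-regret."""
    arms = [env.new_arm(), env.new_arm()]
    counts = [0, 0]
    sums = [0.0, 0.0]
    regret = 0.0
    for t in range(1, n + 1):
        if t <= 2:
            i = t - 1  # play each arm once so every count is positive
        else:
            i = max(range(2), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        sums[i] += env.pull(arms[i])
        counts[i] += 1
        regret += env.gap(arms[i])
    return regret


if __name__ == "__main__":
    env = CountableArmedBandit(means=[0.7, 0.4], type_probs=[0.3, 0.7], seed=1)
    regret = ucb_on_two_fresh_arms(env, 10_000)
    print("latent types:", env.arm_types, "pseudo-regret:", round(regret, 1))
```

This baseline never returns to the reservoir. Its expected regret therefore grows linearly in n whenever both sampled arms are of the inferior type, which happens with probability 0.7² = 0.49 in this example. That is why a policy in this setting generally needs some rule for when to draw fresh arms.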
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
