Best-of-K Bandits
Max Simchowitz, Kevin Jamieson, Benjamin Recht

TL;DR
This paper studies the Best-of-K Bandit problem: it establishes distribution-dependent lower bounds, analyzes how the structure of the joint distribution affects the difficulty of the problem, and proposes an algorithm for independent arms that identifies the optimal subset.
Contribution
The paper provides distribution-dependent lower bounds, evidence that exhaustive search may be avoided for certain favorable distributions, and an algorithm for independent arms with an accompanying analysis.
Findings
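For general distributions, the lower bounds match the naive upper bounds obtained by treating each subset as a separate arm.
Exhaustive search may be avoided for certain favorable distributions, where lower-order statistics can dominate high-order correlations.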
An algorithm for independent arms mitigates information occlusion.
Abstract
This paper studies the Best-of-K Bandit game: at each time step the player chooses a subset S among all N-choose-K possible options and observes the reward max(X(i) : i in S), where X is a random vector drawn from a joint distribution. The objective is to identify the subset that achieves the highest expected reward with high probability using as few queries as possible. We present distribution-dependent lower bounds based on a particular construction that forces a learner to consider all N-choose-K subsets, and that match naive extensions of known upper bounds in the bandit setting obtained by treating each subset as a separate arm. Nevertheless, we present evidence that exhaustive search may be avoided for certain favorable distributions because the influence of high-order correlations may be dominated by lower-order statistics. Finally, we present an algorithm and analysis for independent…
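To make the game in the abstract concrete, here is a minimal Python sketch of the naive baseline it mentions, in which every one of the N-choose-K subsets is treated as a separate arm. It is not the paper's algorithm. It assumes independent Bernoulli arms, uses a fixed number of pulls per subset rather than an adaptive high-probability stopping rule, and the function names, arm means, and budget are all hypothetical choices made for illustration.

```python
import itertools
import random


def expected_reward(means, subset):
    """Exact E[max_{i in S} X(i)] for independent Bernoulli arms: 1 - prod(1 - p_i)."""
    miss = 1.0
    for i in subset:
        miss *= 1.0 - means[i]
    return 1.0 - miss


def pull(means, subset, rng):
    """One query: draw every arm in the subset, but reveal only the maximum."""
    return max(1 if rng.random() < means[i] else 0 for i in subset)


def naive_best_subset(means, k, pulls_per_subset, rng):
    """Baseline: sample each of the N-choose-K subsets equally often
    and return the subset with the highest empirical mean reward."""
    best_subset, best_mean = None, -1.0
    for subset in itertools.combinations(range(len(means)), k):
        total = sum(pull(means, subset, rng) for _ in range(pulls_per_subset))
        empirical = total / pulls_per_subset
        if empirical > best_mean:
            best_subset, best_mean = subset, empirical
    return best_subset


# Hypothetical instance: N = 4 independent Bernoulli arms, K = 2.
rng = random.Random(0)
means = [0.9, 0.8, 0.1, 0.05]
chosen = naive_best_subset(means, k=2, pulls_per_subset=2000, rng=rng)
print(chosen, expected_reward(means, chosen))
```

The query count of this baseline grows with the number of subsets, N-choose-K, which is the exhaustive-search cost the abstract asks whether one can avoid. Because only the maximum is observed, a single query does not reveal which arm in the subset produced the reward.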
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms
