The Max $K$-Armed Bandit: A PAC Lower Bound and Tighter Algorithms
Yahel David, Nahum Shimkin

TL;DR
This paper establishes lower bounds and proposes algorithms for the Max K-Armed Bandit problem within the PAC framework, clarifying how many samples are needed to find the best single reward across all arms.
Contribution
It provides the first PAC lower bounds for the Max K-Armed Bandit problem and introduces algorithms that nearly match these bounds, advancing the theoretical understanding of this problem.
Findings
Lower bounds on sample complexity for PAC algorithms.
Algorithms that nearly attain these lower bounds.
Comparison showing when random arm selection can outperform targeted strategies.
Abstract
We consider the Max $K$-Armed Bandit problem, where a learning agent is faced with several sources (arms) of items (rewards) and is interested in finding the best item overall. At each time step the agent chooses an arm and obtains a random real-valued reward. The rewards of each arm are assumed to be i.i.d., with an unknown probability distribution that generally differs among the arms. Under the PAC framework, we provide lower bounds on the sample complexity of any $(\epsilon,\delta)$-correct algorithm, and propose algorithms that attain this bound up to logarithmic factors. We compare the performance of these multi-arm algorithms to the variant in which the arms are not distinguishable by the agent and are chosen randomly at each stage. Interestingly, when the maximal rewards of the arms happen to be similar, the latter approach may provide better performance.
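The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it only contrasts two naive baselines, cycling through distinguishable arms versus drawing an arm at random at each step, and records the best single reward seen. The uniform reward distributions, the upper bounds, the budget, and all function names are assumptions chosen for illustration.

```python
import random


def make_arms(upper_bounds, rng):
    """Build hypothetical arms; arm i returns i.i.d. rewards uniform on [0, upper_bounds[i]]."""
    return [lambda b=b: rng.uniform(0.0, b) for b in upper_bounds]


def max_reward_round_robin(arms, budget):
    """Pull the arms in a fixed cycle and return the largest reward observed."""
    best = float("-inf")
    for t in range(budget):
        best = max(best, arms[t % len(arms)]())
    return best


def max_reward_random_arm(arms, budget, rng):
    """Pull a uniformly random arm at each step (arms treated as indistinguishable)."""
    best = float("-inf")
    for _ in range(budget):
        best = max(best, rng.choice(arms)())
    return best


rng = random.Random(0)            # fixed seed so the run is reproducible
upper_bounds = [0.5, 0.8, 1.0]    # assumed maximal rewards of three arms
arms = make_arms(upper_bounds, rng)

best_rr = max_reward_round_robin(arms, budget=300)
best_rand = max_reward_random_arm(arms, budget=300, rng=rng)
print(f"round-robin best reward: {best_rr:.3f}")
print(f"random-arm best reward:  {best_rand:.3f}")
```

Unlike the standard bandit objective, which accumulates reward, the quantity tracked here is the single best reward found within the sampling budget. Neither baseline adapts to the observed rewards or comes with an $(\epsilon,\delta)$ guarantee.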
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
