Max-Quantile Grouped Infinite-Arm Bandits
Ivan Lau, Yan Hao Ling, Mayank Shrivastava, Jonathan Scarlett

TL;DR
This paper introduces a new bandit problem involving infinitely many arms in groups, aiming to identify the group with the highest quantile of arm rewards efficiently, and proposes an algorithm with theoretical guarantees.
Contribution
It formulates the max-quantile grouped infinite-arm bandit problem, proposes a two-step algorithm, and provides regret bounds and lower bounds for this novel setting.
Findings
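The two-step algorithm identifies the group with the highest reservoir quantile while aiming to use as few total arm pulls as possible.
The worst-case regret bound is matched by a lower bound; the instance-dependent regret is also characterized.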
Analysis discusses algorithm strengths, weaknesses, and potential improvements.
Abstract
In this paper, we consider a bandit problem in which there are a number of groups each consisting of infinitely many arms. Whenever a new arm is requested from a given group, its mean reward is drawn from an unknown reservoir distribution (different for each group), and the uncertainty in the arm's mean reward can only be reduced via subsequent pulls of the arm. The goal is to identify the infinite-arm group whose reservoir distribution has the highest (1 − α)-quantile (e.g., median if α = 1/2), using as few total arm pulls as possible. We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm. We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter, while discussing various strengths, weaknesses, algorithmic…
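The two-step structure described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the second step here is a naive uniform-sampling placeholder rather than the finite-arm grouped max-quantile routine the authors use, and all names (`two_step_max_quantile`, `empirical_quantile`, `arms_per_group`, `pulls_per_arm`, `quantile`) as well as the Bernoulli reward model are assumptions made for illustration only.

```python
import math
import random
from typing import Callable, List, Sequence


def empirical_quantile(values: Sequence[float], quantile: float) -> float:
    """Return the order statistic at the given quantile level (0 < quantile <= 1)."""
    ordered = sorted(values)
    index = math.ceil(quantile * len(ordered)) - 1
    index = max(0, min(len(ordered) - 1, index))  # clamp to a valid position
    return ordered[index]


def two_step_max_quantile(
    reservoirs: List[Callable[[random.Random], float]],
    quantile: float,
    arms_per_group: int,
    pulls_per_arm: int,
    rng: random.Random,
) -> int:
    """Guess the index of the group whose reservoir has the highest quantile.

    Each reservoir is a hypothetical callable that draws one arm's mean
    reward in [0, 1]; rewards are assumed to be Bernoulli with that mean.
    """
    # Step 1: request a fixed number of arms from each group.
    arm_means = [
        [draw(rng) for _ in range(arms_per_group)] for draw in reservoirs
    ]

    # Step 2 (placeholder, NOT the paper's method): pull every arm the same
    # number of times and compare empirical quantiles of the estimated means.
    scores = []
    for means in arm_means:
        estimates = []
        for mu in means:
            successes = sum(rng.random() < mu for _ in range(pulls_per_arm))
            estimates.append(successes / pulls_per_arm)
        scores.append(empirical_quantile(estimates, quantile))

    return max(range(len(scores)), key=lambda g: scores[g])


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical groups with well-separated reservoir distributions.
    reservoirs = [
        lambda r: r.uniform(0.0, 0.4),
        lambda r: r.uniform(0.5, 0.9),
    ]
    best = two_step_max_quantile(reservoirs, 0.5, 30, 200, rng)
    print("Estimated best group:", best)
```

Uniform sampling spends the same budget on every arm, which is wasteful when groups are easy to separate; an adaptive finite-arm routine, as the abstract indicates, would replace the placeholder in step 2.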
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
