TL;DR
This paper introduces the max-min grouped bandits problem, proposing algorithms with sample complexity bounds for identifying the best group based on the worst arm, relevant for recommendation and resource allocation.
Contribution
It formulates a new bandit problem with overlapping groups and develops algorithms with theoretical guarantees for max-min group identification.
Findings
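Proposes two algorithms, based on successive elimination and robust optimization, with upper bounds on the number of samples needed to find a max-min optimal or near-optimal group.
Derives an algorithm-independent lower bound on sample complexity.
Analyzes the tightness of the bounds in various cases of interest and discusses why uniformly tight bounds are difficult to derive.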
Abstract
In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find the group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems and resource allocation, and is also closely related to widely-studied robust optimization problems. We present two algorithms based on successive elimination and robust optimization, and derive upper bounds on the number of samples to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds.
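To make the max-min objective concrete, the sketch below computes the target group when the arm means are already known. This is only an illustration of the objective, not either of the paper's algorithms, which must estimate the means from noisy samples. The function name `max_min_group`, the arm and group labels, and the mean values are all hypothetical.

```python
from typing import Dict, List, Tuple


def max_min_group(
    means: Dict[str, float], groups: Dict[str, List[str]]
) -> Tuple[str, float]:
    """Return the group whose worst (lowest-mean) arm has the highest mean.

    Assumes the true arm means are known, which is NOT the bandit setting;
    this only illustrates the quantity a bandit algorithm would try to find.
    """
    # Worst-arm mean of each group.
    worst = {g: min(means[a] for a in arms) for g, arms in groups.items()}
    # Group with the largest worst-arm mean.
    best = max(worst, key=worst.get)
    return best, worst[best]


# Hypothetical instance with overlapping groups: arm "a" is in G1 and G3,
# and arm "d" is in G2 and G3.
means = {"a": 0.9, "b": 0.4, "c": 0.6, "d": 0.7}
groups = {"G1": ["a", "b"], "G2": ["c", "d"], "G3": ["a", "d"]}

best_group, best_value = max_min_group(means, groups)
```

Here the worst-arm means are 0.4 for G1, 0.6 for G2, and 0.7 for G3, so G3 is the max-min optimal group even though G1 contains the single best arm.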