Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances
Ruida Zhou, Chao Tian

TL;DR
This paper investigates the sample complexity of identifying the top-$m$ arms in a multi-armed bandit setting with heterogeneous reward variances, providing tight bounds that incorporate variance heterogeneity and an entropy-like measure.
Contribution
It introduces a new complexity characterization for top-$m$ arm identification considering reward variance heterogeneity, with matching upper and lower bounds.
Findings
Derived the worst-case sample complexity involving variance heterogeneity and entropy.
Proposed a divide-and-conquer algorithm achieving the upper bound.
Established a matching lower bound through dual formulation analysis.
Abstract
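We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma_i^2$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $\epsilon$ out of the $n$ arms, with probability at least $1-\delta$. We show that the worst-case sample complexity of this problem is $\Theta\left(\sum_{i=1}^{n} \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^{l}} \frac{\sigma_j^2}{\epsilon^2} \mathrm{Ent}(\sigma^2_{G^{r}})\right)$, where $G^{m}, G^{l}, G^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\mathrm{Ent}(\cdot)$ is an entropy-like function which measures the…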
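To make the setting concrete, here is a minimal Python sketch of a naive baseline. It is not the paper's divide-and-conquer algorithm. It assumes Gaussian rewards with known variance proxies, an error tolerance `eps`, a failure probability `delta`, and one common notion of approximate correctness: every returned arm has a mean within `eps` of the $m$-th largest mean. Each arm is pulled enough times for a sub-Gaussian tail bound plus a union bound over the $n$ arms to hold, so the budget for each arm scales with its own variance. The function names `naive_budget` and `naive_top_m` are hypothetical.

```python
import math
import random


def naive_budget(variances, eps, delta):
    """Pulls per arm so that each empirical mean is within eps/2 of its
    true mean, simultaneously for all arms, with probability >= 1 - delta.

    Uses the sub-Gaussian tail bound P(|avg - mu| >= a) <= 2*exp(-t*a^2 / (2*s2))
    with a = eps/2, plus a union bound over the n arms.
    """
    n = len(variances)
    return [
        math.ceil(8.0 * s2 / eps**2 * math.log(2.0 * n / delta))
        for s2 in variances
    ]


def naive_top_m(means, variances, m, eps, delta, rng):
    """Pull every arm per naive_budget, then return the m arms with the
    largest empirical means together with the total number of pulls.

    Rewards are simulated as Gaussian (an assumption of this sketch);
    a Gaussian with variance s2 is s2-sub-Gaussian.
    """
    budget = naive_budget(variances, eps, delta)
    estimates = []
    for mu, s2, t in zip(means, variances, budget):
        total = sum(rng.gauss(mu, math.sqrt(s2)) for _ in range(t))
        estimates.append(total / t)
    ranked = sorted(range(len(means)), key=lambda i: estimates[i], reverse=True)
    return sorted(ranked[:m]), sum(budget)


if __name__ == "__main__":
    rng = random.Random(0)
    arms, pulls = naive_top_m(
        means=[1.0, 0.9, 0.2, 0.1],
        variances=[0.1, 0.1, 0.1, 0.1],
        m=2, eps=0.2, delta=0.05, rng=rng,
    )
    print(arms, pulls)
```

This baseline spends on the order of $\sum_i \frac{\sigma_i^2}{\epsilon^2} \ln\frac{n}{\delta}$ pulls in total. The extra $\ln n$ factor from the union bound is the kind of overhead that a sharper, variance-aware analysis aims to reduce.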
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Markov Chains and Monte Carlo Methods
