Revisiting Simple Regret: Fast Rates for Returning a Good Arm
Yao Zhao, Connor James Stephens, Csaba Szepesvári, Kwang-Sung Jun

TL;DR
This paper advances the understanding of simple regret minimization in multi-armed bandits by providing improved bounds for the Sequential Halving algorithm and proposing a new method, Bracketing SH, that performs well even with limited data.
Contribution
The paper offers an improved instance-dependent analysis of Sequential Halving and introduces Bracketing SH, a new algorithm effective in data-poor regimes for simple regret minimization.
Findings
Optimal worst-case simple regret bound of √(n/T), up to logarithmic factors.
Matching instance-dependent lower bounds for ε-good arm identification.
Bracketing SH outperforms existing methods on real-world tasks.
Abstract
Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits yet is less popular than the probability of missing the best arm or an ε-good arm, perhaps due to lack of easy ways to characterize it. In this paper, we make significant progress on minimizing simple regret in both data-rich (T ≥ n) and data-poor (T ≤ n) regimes, where n is the number of arms and T is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, where we bound the probability of returning an arm whose mean reward is not within ε from the best (i.e., not ε-good) for any choice of ε > 0, although ε is not an input to SH. Our bound not only leads to an optimal worst-case simple regret bound of √(n/T) up to logarithmic factors…
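For readers unfamiliar with Sequential Halving, the following is a minimal Python sketch of the standard textbook form of the algorithm, not the authors' code. It assumes a hypothetical callback pull(i) that returns one reward sample from arm i, splits the budget evenly across roughly log2(n) rounds, and discards the worse half of the surviving arms after each round. It does not implement Bracketing SH, and it simply rejects budgets too small to pull every arm once per round.

```python
import random


def sequential_halving(pull, n_arms, budget):
    """Return the index of the arm left after repeated halving.

    pull(i) is a hypothetical callback returning one reward sample from arm i.
    """
    if n_arms < 1:
        raise ValueError("need at least one arm")
    survivors = list(range(n_arms))
    # ceil(log2(n_arms)), computed with integers to avoid float rounding.
    n_rounds = (n_arms - 1).bit_length()
    for _round in range(n_rounds):
        # Split the budget evenly across rounds, then across surviving arms.
        pulls_per_arm = budget // (len(survivors) * n_rounds)
        if pulls_per_arm == 0:
            raise ValueError("budget too small for this plain sketch")
        means = {
            arm: sum(pull(arm) for _ in range(pulls_per_arm)) / pulls_per_arm
            for arm in survivors
        }
        # Keep the better half (rounded up) by empirical mean.
        keep = (len(survivors) + 1) // 2
        survivors = sorted(survivors, key=means.get, reverse=True)[:keep]
    return survivors[0]


# Usage on a made-up Bernoulli instance; the means are illustrative only.
rng = random.Random(0)
true_means = [0.2, 0.5, 0.45, 0.8, 0.3, 0.1, 0.6, 0.4]
chosen = sequential_halving(
    lambda i: 1.0 if rng.random() < true_means[i] else 0.0,
    n_arms=len(true_means),
    budget=2000,
)
# Simple regret of this run: best mean minus the mean of the returned arm.
print(chosen, max(true_means) - true_means[chosen])
```

Note that ε never appears as an argument, which matches the abstract's remark that ε is not an input to SH. The ValueError branch marks the small-budget case that this sketch leaves unhandled.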
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
