Revisiting Simple Regret: Fast Rates for Returning a Good Arm

Yao Zhao; Connor James Stephens; Csaba Szepesvári; Kwang-Sung Jun

arXiv:2210.16913 · cs.LG · February 3, 2023

TL;DR

This paper advances the understanding of simple regret minimization in multi-armed bandits by providing improved bounds for the Sequential Halving algorithm and proposing a new method, Bracketing SH, that performs well even with limited data.

Contribution

The paper offers an improved instance-dependent analysis of Sequential Halving and introduces Bracketing SH, a new algorithm effective in data-poor regimes for simple regret minimization.

Findings

01. Optimal worst-case simple regret bound of √(n/T), up to logarithmic factors.

02. The Sequential Halving bound essentially matches the known instance-dependent lower bound for returning an ε-good arm.

03. Bracketing SH outperforms existing methods on real-world tasks.
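
The first two findings are connected by a standard identity for nonnegative random variables, sketched here. The notation is an assumption made for illustration and may differ from the paper's: $\mu_i$ is the mean reward of arm $i$, $\mu_1 = \max_i \mu_i$, and $J_T$ is the arm returned after $T$ samples.

```latex
\mathrm{SR}_T
  \;=\; \mathbb{E}\bigl[\mu_1 - \mu_{J_T}\bigr]
  \;=\; \int_0^{\infty} \Pr\bigl(\mu_1 - \mu_{J_T} > \epsilon\bigr)\, d\epsilon
```

The integrand is the probability that the returned arm is not $\epsilon$-good. A bound on that probability that holds for every $\epsilon > 0$ can therefore be integrated over $\epsilon$ to give a bound on simple regret, which is one plausible route from the second finding to the first.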

Abstract

Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to the lack of easy ways to characterize it. In this paper, we make significant progress on minimizing simple regret in both the data-rich ($T \geq n$) and data-poor ($T \leq n$) regimes, where $n$ is the number of arms and $T$ is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, where we bound the probability of returning an arm whose mean reward is not within $\epsilon$ of the best (i.e., not $\epsilon$-good) for \textit{any} choice of $\epsilon > 0$, although $\epsilon$ is not an input to SH. Our bound not only leads to an optimal worst-case simple regret bound of $\sqrt{n/T}$ up to logarithmic factors…
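
For readers unfamiliar with Sequential Halving, the following is a minimal Python sketch of the textbook form of the algorithm, not the authors' code. It splits the budget evenly over about log2(n) rounds, samples every surviving arm equally often within a round, and keeps the better half. The function name `sequential_halving`, the `pull` callback, and the demo arm means are hypothetical choices made for illustration. Note that no ε appears anywhere as an input, in line with the abstract. The sketch covers only the case where the budget is large enough to sample every arm in the first round; it does not implement Bracketing SH.

```python
import math
import random


def sequential_halving(pull, n_arms, budget):
    """Textbook Sequential Halving sketch.

    pull(i) is a hypothetical callback returning one noisy reward for arm i.
    Returns the index of the single arm left after the halving rounds.
    """
    survivors = list(range(n_arms))
    n_rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(n_rounds):
        if len(survivors) == 1:
            break
        # Split the budget evenly across rounds, then across surviving arms.
        pulls_per_arm = budget // (len(survivors) * n_rounds)
        if pulls_per_arm == 0:
            raise ValueError("budget too small to sample every surviving arm")
        means = {
            i: sum(pull(i) for _ in range(pulls_per_arm)) / pulls_per_arm
            for i in survivors
        }
        # Keep the half of the arms with the highest empirical means.
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.2, 0.5, 0.45, 0.8, 0.3, 0.6, 0.1, 0.7]  # made-up instance

    def bernoulli_pull(i):
        return 1.0 if rng.random() < true_means[i] else 0.0

    chosen = sequential_halving(bernoulli_pull, len(true_means), budget=2000)
    print("returned arm:", chosen)
    print("simple regret:", max(true_means) - true_means[chosen])
```

The `ValueError` branch marks where this plain version stops being usable: with too few samples relative to the number of arms, the first round cannot sample every arm even once.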

Videos

Revisiting Simple Regret: Fast Rates for Returning a Good Arm · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques