Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme

Konstantinos E. Nikolakakis; Dionysios S. Kalogerias; Or Sheffet; Anand D. Sarwate

arXiv:2006.06792·stat.ML·December 6, 2022

TL;DR

This paper introduces a successive elimination algorithm for identifying the arm with the highest quantile at a fixed level in stochastic multi-armed bandits, shown to be optimal up to logarithmic factors, together with a differentially private variant for settings where rewards are private that retains finite sample complexity.

Contribution

It presents a novel successive elimination algorithm for quantile-based best-arm identification and a differentially private variant with finite sample complexity, both with theoretical guarantees.
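
As a rough illustration of the successive elimination idea, the Python sketch below keeps a set of active arms, pulls each active arm once per round, and drops any arm whose upper confidence bound on the quantile falls below the best lower bound. The confidence intervals come from the Dvoretzky-Kiefer-Wolfowitz inequality with a union bound over arms and rounds. That choice is an assumption made for this example and is not necessarily the construction used in the paper. The names `quantile_bounds` and `quantile_successive_elimination`, the `max_rounds` cap, and the Gaussian demo arms are likewise hypothetical.

```python
import bisect
import math
import random


def quantile_bounds(sorted_samples, q, eps):
    """Return (lower, upper) bounds on the q-quantile from sorted samples.

    Assumes the empirical CDF is within eps of the true CDF everywhere
    (the DKW event), so the true q-quantile lies between the empirical
    (q - eps)- and (q + eps)-quantiles.
    """
    n = len(sorted_samples)
    lo_level, hi_level = q - eps, q + eps
    if lo_level <= 0:
        lower = -math.inf  # interval is vacuous on the left
    else:
        lower = sorted_samples[max(math.ceil(lo_level * n) - 1, 0)]
    if hi_level >= 1:
        upper = math.inf  # interval is vacuous on the right
    else:
        upper = sorted_samples[min(math.ceil(hi_level * n) - 1, n - 1)]
    return lower, upper


def quantile_successive_elimination(arms, q, delta, max_rounds=100_000):
    """Sketch of successive elimination for the highest q-quantile.

    arms: list of zero-argument callables, each returning one reward.
    Returns (best_arm_index or None, total_pulls); None means the
    hypothetical max_rounds budget ran out before one arm remained.
    """
    k = len(arms)
    active = set(range(k))
    samples = [[] for _ in range(k)]
    pulls = 0
    for t in range(1, max_rounds + 1):
        # DKW radius with a union bound over k arms and all rounds t.
        eps = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        bounds = {}
        for i in active:
            bisect.insort(samples[i], arms[i]())  # keep samples sorted
            pulls += 1
            bounds[i] = quantile_bounds(samples[i], q, eps)
        best_lower = max(lo for lo, _ in bounds.values())
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = {i for i in active if bounds[i][1] >= best_lower}
        if len(active) == 1:
            return next(iter(active)), pulls
    return None, pulls


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical demo: three Gaussian arms with medians 0, 1 and 3.
    demo_arms = [lambda m=m: random.gauss(m, 1.0) for m in (0.0, 1.0, 3.0)]
    print(quantile_successive_elimination(demo_arms, q=0.5, delta=0.05))
```

Note that the sketch takes no gap or distribution parameters as input: the confidence radius depends only on the round index, the number of arms, and `delta`.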

Findings

01

The non-private algorithm is $δ$-PAC and nearly optimal in sample complexity.

02

The differentially private algorithm maintains finite sample complexity even with infinite support distributions.

03

Neither algorithm requires prior knowledge of the suboptimality gap or of other statistical parameters.

Abstract

We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification; we show that our algorithm is $δ$-PAC and we characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed in particular for the quantile bandit problem; as we show, when the gap approaches zero, best-arm identification is impossible. Second, motivated by applications where the rewards are private, we provide a differentially…
