Algorithms for Differentially Private Multi-Armed Bandits

Aristide Tossou; Christos Dimitrakakis

arXiv:1511.08681·stat.ML·November 30, 2015

TL;DR

This paper introduces differentially private algorithms for stochastic multi-armed bandits that achieve optimal regret bounds, improving upon previous methods through a novel interval-based mechanism and validated by experiments.

Contribution

It presents the first differentially private UCB algorithms with optimal regret bounds, utilizing a new interval-based mechanism for enhanced privacy-utility trade-offs.

Findings

01

Achieved $(\epsilon,\delta)$-DP algorithms with $O(\frac{1}{\epsilon} + \log T)$ regret.

02

Significant improvement over the previous poly-log regret bound of $O(\epsilon^{-2} \log^{2} T)$.

03

Experimental results confirm theoretical guarantees.

Abstract

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist $(\epsilon, \delta)$ differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(\epsilon^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.
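
To give a rough sense of the gap between the two regret rates quoted in the abstract, the sketch below evaluates $\epsilon^{-1} + \log T$ and $\epsilon^{-2} \log^{2} T$ for a few sample values. It is an illustration only: big-O notation hides constants and problem-dependent factors, so the numbers show how the rates grow, not actual regret. The function names and the chosen values of `eps` and `T` are assumptions made for this example and do not come from the paper.

```python
import math


def new_rate(eps: float, T: int) -> float:
    """Growth term of the bound stated in the abstract: 1/eps + log T.

    Constants hidden by the O(.) notation are ignored (assumption).
    """
    return 1.0 / eps + math.log(T)


def previous_rate(eps: float, T: int) -> float:
    """Growth term of the earlier poly-log bound: log^2(T) / eps^2.

    Constants hidden by the O(.) notation are ignored (assumption).
    """
    return math.log(T) ** 2 / eps ** 2


if __name__ == "__main__":
    # Hypothetical sample values, chosen only for illustration.
    for eps in (0.1, 1.0):
        for T in (10**3, 10**6):
            print(
                f"eps={eps:<4} T={T:<8} "
                f"new={new_rate(eps, T):10.1f} "
                f"previous={previous_rate(eps, T):10.1f}"
            )
```

In the new rate the privacy cost $\epsilon^{-1}$ is added to the $\log T$ term, whereas in the previous rate $\epsilon^{-2}$ multiplies $\log^{2} T$, so the gap widens as $\epsilon$ shrinks or $T$ grows.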
