Algorithms for Differentially Private Multi-Armed Bandits
Aristide Tossou, Christos Dimitrakakis

TL;DR
This paper introduces differentially private algorithms for stochastic multi-armed bandits that achieve optimal regret bounds. They improve on previous methods through a novel interval-based mechanism, and experiments validate the theoretical results.
Contribution
It presents the first differentially private UCB algorithms with optimal regret bounds, utilizing a new interval-based mechanism for enhanced privacy-utility trade-offs.
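To make the idea of an interval-based mechanism more concrete, here is a minimal Python sketch written under stated assumptions. It is not the authors' algorithm. The assumptions are as follows. Rewards lie in [0, 1]. Each arm's reward sum is published only after every `interval` pulls of that arm, and fresh Laplace noise of scale 1/ε is added to the rewards of the completed interval. The index combines a UCB1-style bonus with a heuristic allowance for the accumulated noise. The sketch uses pure Laplace noise rather than the paper's $(\epsilon,\delta)$ analysis. The class `IntervalPrivateUCB`, the helpers `laplace` and `run`, and all constants are hypothetical.

```python
import math
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class IntervalPrivateUCB:
    """Hypothetical UCB variant that releases each arm's reward sum once per interval.

    Illustrative assumptions (not taken from the paper): rewards lie in [0, 1],
    each completed interval of `interval` pulls adds Laplace(1/epsilon) noise to
    that interval's reward sum, and the index uses a heuristic noise allowance.
    """

    def __init__(self, n_arms: int, epsilon: float, interval: int, seed: int = 0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.interval = interval
        self.rng = random.Random(seed)
        self.pulls = [0] * n_arms        # total pulls per arm
        self.pending = [0.0] * n_arms    # rewards of the unfinished interval (never read by select)
        self.noisy_sum = [0.0] * n_arms  # released, noised reward sum per arm
        self.released = [0] * n_arms     # pulls covered by noisy_sum
        self.t = 0

    def _index(self, arm: int) -> float:
        n = self.released[arm]
        k = n // self.interval  # number of noisy releases so far
        mean = self.noisy_sum[arm] / n
        bonus = math.sqrt(2.0 * math.log(self.t) / n)
        # Heuristic widening for the k independent Laplace(1/epsilon) terms in the sum
        noise_allowance = math.sqrt(2.0 * k) * math.log(self.t) / (self.epsilon * n)
        return mean + bonus + noise_allowance

    def select(self) -> int:
        self.t += 1
        # Play each arm until it has at least one released interval.
        for arm in range(self.n_arms):
            if self.released[arm] == 0:
                return arm
        return max(range(self.n_arms), key=self._index)

    def update(self, arm: int, reward: float) -> None:
        self.pulls[arm] += 1
        self.pending[arm] += reward
        if self.pulls[arm] % self.interval == 0:
            # Interval finished: publish its sum with fresh noise, then reset.
            noise = laplace(1.0 / self.epsilon, self.rng)
            self.noisy_sum[arm] += self.pending[arm] + noise
            self.released[arm] = self.pulls[arm]
            self.pending[arm] = 0.0


def run(means, horizon, epsilon=1.0, interval=10, seed=0):
    """Simulate Bernoulli arms with the given means and return pull counts per arm."""
    env = random.Random(seed + 1)
    agent = IntervalPrivateUCB(len(means), epsilon, interval, seed)
    for _ in range(horizon):
        arm = agent.select()
        reward = 1.0 if env.random() < means[arm] else 0.0
        agent.update(arm, reward)
    return agent.pulls


print(run([0.1, 0.9], horizon=5000))
```

The `select` method reads only released noisy sums. As a result, each reward influences later choices only through the single noised interval sum that contains it, which is the intuition behind releasing per interval. The sketch does not carry out a formal privacy or regret analysis.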
Findings
Achieved $(\epsilon,\delta)$-DP algorithms with $O(\frac{1}{\epsilon} + \log T)$ regret.
Significant improvement over previous poly-log regret bounds.
Experimental results confirm theoretical guarantees.
Abstract
We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist $(\epsilon,\delta)$-differentially private variants of Upper Confidence Bound algorithms that have optimal regret, $O(\frac{1}{\epsilon} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret. The improvement comes from our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms that use a continual release mechanism. Experiments clearly validate our theoretical bounds.
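For contrast, continual release is commonly implemented with the binary (tree-based) counting mechanism from the literature on differential privacy under continual observation. The Python sketch below is a generic version of that counter, not code from this paper. Whether the earlier bandit algorithms used exactly this variant is an assumption. The names `TreeCounter` and `laplace` are hypothetical. Each released prefix sum combines up to about $\log_2 T$ noisy partial sums, and each partial sum carries noise of scale proportional to $\log_2 T / \epsilon$. This offers one intuition for why algorithms built on such a counter pick up poly-log factors in regret.

```python
import math
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class TreeCounter:
    """Binary (tree-based) counter for continually releasing noisy prefix sums.

    Supports up to `horizon` additions of values in [0, 1]. Each value enters at
    most `levels` partial sums, so each partial sum gets Laplace(levels/epsilon) noise.
    """

    def __init__(self, horizon: int, epsilon: float, seed: int = 0):
        self.levels = max(1, math.ceil(math.log2(horizon + 1)))
        self.scale = self.levels / epsilon
        self.rng = random.Random(seed)
        self.exact = [0.0] * self.levels  # exact partial sums, one per tree level
        self.noisy = [0.0] * self.levels  # their noised counterparts
        self.t = 0

    def add(self, x: float) -> float:
        """Insert x and return a noisy estimate of the sum of all values so far."""
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1  # position of the lowest set bit of t
        # Merge the lower levels (plus x) into level i, then clear them.
        self.exact[i] = x + sum(self.exact[:i])
        for j in range(i):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        self.noisy[i] = self.exact[i] + laplace(self.scale, self.rng)
        # The prefix sum is covered by the levels at the set bits of t.
        return sum(self.noisy[j] for j in range(self.levels) if (self.t >> j) & 1)


counter = TreeCounter(horizon=1000, epsilon=1.0, seed=0)
noisy_totals = [counter.add(1.0) for _ in range(1000)]
print(noisy_totals[-1])  # a noisy estimate of the true running total, 1000
```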
