Optimal Regret of Bernoulli Bandits under Global Differential Privacy
Achraf Azize, Yulian Wu, Junya Honda, Francesco Orabona, Shinji Ito, Debabrota Basu

TL;DR
This paper refines the theoretical understanding of regret bounds in Bernoulli bandits under global differential privacy, proposing optimal algorithms and a new concentration inequality that advances privacy-preserving sequential learning.
Contribution
It introduces a tighter regret lower bound, develops asymptotically optimal DP algorithms, and presents a novel concentration inequality for Bernoulli sums with Laplace noise.
Findings
The new lower bound strictly improves on existing regret lower bounds across all privacy levels.
DP-KLUCB and DP-IMED algorithms asymptotically match the lower bound, proving optimality.
A new concentration inequality for Bernoulli sums under Laplace noise enhances analysis.
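To make the object of that concentration inequality concrete, the following minimal sketch builds a sum of Bernoulli rewards perturbed by Laplace noise. It is an illustration of the standard Laplace mechanism, not the paper's algorithm: the function names `laplace_noise` and `private_bernoulli_sum`, the sample size, and the value of ε are assumptions chosen for the example. The only facts used are that changing one reward in {0, 1} shifts the sum by at most 1, and that Laplace noise with scale 1/ε on a sensitivity-1 statistic gives ε-DP for that single release.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with location 0 and the given scale.
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_bernoulli_sum(rewards, epsilon, rng):
    # Hypothetical helper: releases the sum of {0, 1} rewards once,
    # with Laplace noise calibrated to sensitivity 1.
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sum(rewards) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
# Illustrative data: 1000 Bernoulli(0.6) rewards (assumed values).
rewards = [1 if rng.random() < 0.6 else 0 for _ in range(1000)]
noisy_sum = private_bernoulli_sum(rewards, epsilon=0.5, rng=rng)
noisy_mean = noisy_sum / len(rewards)
```

The noisy mean deviates from the true mean for two reasons: the sampling fluctuation of the Bernoulli sum and the added Laplace noise. A concentration inequality of the kind listed above controls both sources together.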
Abstract
As sequential learning algorithms are increasingly applied to real life, ensuring data privacy while maintaining their utilities emerges as a timely question. In this context, regret minimisation in stochastic bandits under ε-global Differential Privacy (DP) has been widely studied. Unlike bandits without DP, there is a significant gap between the best-known regret lower and upper bound in this setting, though they "match" in order. Thus, we revisit the regret lower and upper bounds of ε-global DP algorithms for Bernoulli bandits and improve both. First, we prove a tighter regret lower bound involving a novel information-theoretic quantity characterising the hardness of ε-global DP in stochastic bandits. Our lower bound strictly improves on the existing ones across all ε values. Then, we choose two asymptotically optimal bandit algorithms, i.e.…
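For context on the kind of algorithm the abstract refers to, here is a rough sketch of the index used by the standard non-private KL-UCB algorithm for Bernoulli rewards. This is the textbook baseline, not the paper's DP-KLUCB, whose index and noise handling are not given in this summary. The names `bernoulli_kl` and `kl_ucb_index`, the plain `log(t)` exploration budget, and the bisection depth are assumptions made for illustration.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q),
    # with clipping to avoid log(0).
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, iters=50):
    # Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t),
    # found by bisection (kl(mean, q) is increasing in q for q >= mean).
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


# Illustrative values: an arm with empirical mean 0.5 at round t = 100.
index_few_pulls = kl_ucb_index(0.5, pulls=10, t=100)
index_many_pulls = kl_ucb_index(0.5, pulls=1000, t=100)
```

The index shrinks toward the empirical mean as the arm is pulled more often, which is what drives exploration in the non-private setting.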
