Concentrated Differential Privacy for Bandits
Achraf Azize, Debabrota Basu

TL;DR
This paper introduces new algorithms for differentially private bandit learning, shows that the regret cost of ensuring privacy is asymptotically negligible, and provides regret upper and lower bounds with experimental validation.
Contribution
The paper formalizes adaptations of DP to bandits, proposes three private algorithms (AdaC-UCB, AdaC-GOPE and AdaC-OFUL) that share a generic blueprint, and establishes regret bounds and minimax lower bounds under zCDP.
Findings
Privacy costs are asymptotically negligible compared to non-private regret.
Proposed algorithms achieve a good privacy-utility trade-off.
First minimax lower bounds for bandits with zCDP are established.
Abstract
Bandits serve as the theoretical foundation of sequential learning and an algorithmic foundation of modern recommender systems. However, recommender systems often rely on user-sensitive data, making privacy a critical concern. This paper contributes to the understanding of Differential Privacy (DP) in bandits with a trusted centralised decision-maker, and especially the implications of ensuring zero Concentrated Differential Privacy (zCDP). First, we formalise and compare different adaptations of DP to bandits, depending on the considered input and the interaction protocol. Then, we propose three private algorithms, namely AdaC-UCB, AdaC-GOPE and AdaC-OFUL, for three bandit settings, namely finite-armed bandits, linear bandits, and linear contextual bandits. The three algorithms share a generic algorithmic blueprint, i.e. the Gaussian mechanism and adaptive episodes, to ensure a good…
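The blueprint named in the abstract (Gaussian mechanism plus adaptive episodes) can be illustrated with a toy sketch for finite-armed bandits. This is not the authors' AdaC-UCB. The function names, the doubling episode schedule, the Bernoulli rewards, and the exploration bonus are all assumptions made for illustration. The only fact relied on is the standard one that adding Gaussian noise of standard deviation Δ/√(2ρ) to a statistic with sensitivity Δ satisfies ρ-zCDP.

```python
import math
import numpy as np


def gaussian_sigma(sensitivity: float, rho: float) -> float:
    """Noise std for which the Gaussian mechanism satisfies rho-zCDP."""
    return sensitivity / math.sqrt(2.0 * rho)


def private_mean(rewards, rho, rng):
    """Release the mean of rewards in [0, 1] with Gaussian noise."""
    rewards = np.clip(np.asarray(rewards, dtype=float), 0.0, 1.0)
    n = len(rewards)
    # Changing one reward moves the mean by at most 1/n.
    sigma = gaussian_sigma(1.0 / n, rho)
    return float(rewards.mean() + rng.normal(0.0, sigma))


def episodic_private_ucb(means, horizon, rho, seed=0):
    """Toy episodic UCB on Bernoulli arms (hypothetical, illustrative only).

    Each episode plays one arm for a doubling number of rounds and then
    releases a single noisy mean computed from that episode's rewards only,
    so every reward contributes to exactly one noisy statistic.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    est = np.zeros(k)               # latest private mean per arm
    size = np.zeros(k, dtype=int)   # length of the episode behind each estimate
    pulls = np.zeros(k, dtype=int)  # total pulls per arm
    t = 0
    while t < horizon:
        if np.any(size == 0):
            arm = int(np.argmin(size))  # play every arm once first
        else:
            # Assumed bonus: a sampling term plus a term for the privacy noise.
            log_t = math.log(t)
            bonus = np.sqrt(2.0 * log_t / size) + math.sqrt(2.0 * log_t) / (
                size * math.sqrt(2.0 * rho)
            )
            arm = int(np.argmax(est + bonus))
        length = int(min(max(1, 2 * size[arm]), horizon - t))  # doubling episodes
        rewards = rng.binomial(1, means[arm], size=length)
        est[arm] = private_mean(rewards, rho, rng)
        size[arm] = length
        pulls[arm] += length
        t += length
    return pulls
```

Calling `episodic_private_ucb([0.2, 0.8], horizon=2000, rho=1.0)` returns the number of pulls per arm. Because estimates are refreshed only at episode boundaries, the number of noisy releases grows logarithmically with the horizon in this sketch, which is the intuition behind using episodes alongside the Gaussian mechanism.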
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Bandit Algorithms Research · Age of Information Optimization
