Thompson Sampling under Bernoulli Rewards with Local Differential Privacy
Bo Jiang, Tianchi Zhao, Ming Li

TL;DR
This paper studies how to minimize regret in Bernoulli multi-armed bandit problems while ensuring local differential privacy, analyzing three mechanisms and their impact on Thompson Sampling's performance.
Contribution
It introduces and analyzes three local differential privacy mechanisms for Bernoulli bandits, deriving regret bounds for Thompson Sampling under each mechanism.
Findings
The quadratic mechanism achieves lower regret than the linear mechanism under certain privacy budgets.
The exponential mechanism provides a balance between privacy and regret.
Simulations confirm the theoretical regret bounds and the convergence behavior.
Abstract
This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with a local differential privacy (LDP) guarantee. Given a fixed privacy budget ε, we consider three privatizing mechanisms under the Bernoulli scenario: linear, quadratic, and exponential mechanisms. Under each mechanism, we derive a stochastic regret bound for the Thompson Sampling algorithm. Finally, we run simulations to illustrate the convergence of the different mechanisms under different privacy budgets.
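The setting in the abstract can be illustrated with a minimal sketch, under stated assumptions. The summary does not define the paper's linear, quadratic, or exponential mechanisms, so the code below swaps in standard randomized response, a well-known ε-LDP mechanism for a single bit, as a stand-in. The function names, the Beta(1, 1) prior, and the arm means are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Stand-in eps-LDP mechanism (NOT one of the paper's three mechanisms).

    Reports the true bit with probability e^eps / (1 + e^eps), else flips it.
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


def thompson_sampling_ldp(means, epsilon, horizon, seed=0):
    """Thompson Sampling that only ever observes privatized Bernoulli rewards.

    `means` are the true arm means, used here only to simulate rewards and to
    measure pseudo-regret; the learner itself never sees them.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    successes = [0] * n_arms  # privatized 1s observed per arm
    failures = [0] * n_arms   # privatized 0s observed per arm
    pulls = [0] * n_arms
    best_mean = max(means)
    regret = 0.0

    for _ in range(horizon):
        # Sample from each arm's Beta posterior (Beta(1, 1) prior, assumed).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)

        # The user draws the true reward locally and privatizes it before
        # sending it to the learner.
        reward = 1 if rng.random() < means[arm] else 0
        private_reward = randomized_response(reward, epsilon, rng)

        successes[arm] += private_reward
        failures[arm] += 1 - private_reward
        pulls[arm] += 1
        regret += best_mean - means[arm]

    return pulls, regret


if __name__ == "__main__":
    for eps in (0.5, 2.0):
        pulls, regret = thompson_sampling_ldp([0.2, 0.8], eps, horizon=2000)
        print(f"epsilon={eps}: pulls={pulls}, pseudo-regret={regret:.1f}")
```

Under randomized response the privatized reward is still Bernoulli, with a mean that is an increasing affine function of the true mean, so the ordering of the arms is preserved while the gaps between them shrink as ε decreases. This is one way to see why a smaller privacy budget would be expected to slow convergence.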
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
