Thompson Sampling under Bernoulli Rewards with Local Differential Privacy

Bo Jiang; Tianchi Zhao; Ming Li

arXiv:2307.00863·cs.LG·July 4, 2023

TL;DR

This paper studies how to minimize regret in Bernoulli multi-armed bandit problems while ensuring local differential privacy, analyzing three mechanisms and their impact on Thompson Sampling's performance.

Contribution

It introduces and analyzes three local differential privacy mechanisms for Bernoulli bandits, deriving regret bounds for Thompson Sampling under each mechanism.

Findings

1. The quadratic mechanism achieves lower regret than the linear mechanism under certain privacy budgets.

2. The exponential mechanism provides a balance between privacy and regret.

3. Simulations confirm the theoretical regret bounds and convergence behavior.

Abstract

This paper investigates regret minimization for multi-armed bandit (MAB) problems with a local differential privacy (LDP) guarantee. Given a fixed privacy budget $\epsilon$, we consider three privatizing mechanisms in the Bernoulli setting: the linear, quadratic, and exponential mechanisms. Under each mechanism, we derive a stochastic regret bound for the Thompson Sampling algorithm. Finally, we present simulations that illustrate the convergence of the different mechanisms under different privacy budgets.
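To make the setting concrete, here is a minimal sketch of Thompson Sampling on privatized Bernoulli rewards. The abstract does not define the linear, quadratic, or exponential mechanisms, so this sketch swaps in standard binary randomized response as a stand-in $\epsilon$-LDP mechanism; it is not the paper's method. The function names, the Beta(1, 1) prior, and the parameters `true_means`, `horizon`, and `seed` are all assumptions made for illustration.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Binary randomized response (assumed stand-in, not a mechanism from the paper).

    Reports the true bit with probability e^eps / (1 + e^eps) and the
    flipped bit otherwise, which satisfies eps-LDP for a single bit.
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


def private_thompson_sampling(true_means, epsilon, horizon, seed=0):
    """Thompson Sampling where the learner only sees privatized rewards.

    Returns the per-arm pull counts and the cumulative pseudo-regret,
    measured against the true (non-private) arm means.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta posterior: 1 + number of privatized 1s
    beta = [1.0] * k   # Beta posterior: 1 + number of privatized 0s
    pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Sample a plausible mean for each arm and play the largest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])

        # The user draws the true reward and privatizes it locally.
        reward = 1 if rng.random() < true_means[arm] else 0
        noisy = randomized_response(reward, epsilon, rng)

        # The learner updates its posterior with the privatized bit only.
        alpha[arm] += noisy
        beta[arm] += 1 - noisy
        pulls[arm] += 1
        regret += best_mean - true_means[arm]

    return pulls, regret


if __name__ == "__main__":
    for eps in (0.5, 2.0):
        pulls, regret = private_thompson_sampling([0.2, 0.8], eps, horizon=2000)
        print(f"epsilon={eps}: pulls={pulls}, pseudo-regret={regret:.1f}")
```

Randomized response maps an arm with mean $\mu$ to a privatized mean of $(1 - p) + (2p - 1)\mu$, where $p = e^{\epsilon}/(1 + e^{\epsilon})$. This keeps the arms in the same order but shrinks the gaps between them, so a smaller $\epsilon$ makes the best arm harder to identify.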

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques