Differential Privacy in the Extensive-Form Bandit Problem
Stephen Pasteris, Rahul Savani, Theodore Turocy

TL;DR
This paper introduces a differentially private algorithm for the extensive-form bandit problem, achieving low regret while preserving local differential privacy, a novel contribution in this domain.
Contribution
It presents the first study of differential privacy in the extensive-form bandit setting, with an algorithm that balances privacy, regret, and computational efficiency.
Findings
Achieves regret of $O(\sqrt{A \ln(S)\, T}/\epsilon)$ under $\epsilon$-local differential privacy.
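Per-trial time complexity equals, up to a factor logarithmic in the maximum number of actions at an infoset, the time the server needs to transmit the reduced strategy to the user.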
First work to address differential privacy in the extensive-form bandit problem.
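To get a feel for how the regret bound scales, the sketch below evaluates sqrt(A ln(S) T)/epsilon with constants and any hidden logarithmic factors dropped. The function name and the example numbers are illustrative assumptions, and the logarithmic dependence on S is the assumed reading of the bound. This is arithmetic on the stated rate, not output from the paper's algorithm.

```python
import math


def regret_bound(num_actions, num_strategies, num_trials, epsilon):
    """Evaluate sqrt(A * ln(S) * T) / epsilon, ignoring constant factors.

    num_actions    -- A, total number of actions available to the learner
    num_strategies -- S, number of reduced strategies (must be >= 1)
    num_trials     -- T, number of trials
    epsilon        -- local differential privacy parameter (must be > 0)
    """
    if num_actions < 0 or num_strategies < 1 or num_trials < 0:
        raise ValueError("A and T must be non-negative and S must be at least 1")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.sqrt(num_actions * math.log(num_strategies) * num_trials) / epsilon


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only to show the scaling.
    A, S, eps = 20, 10**6, 0.5
    for T in (10**3, 10**4, 10**5):
        total = regret_bound(A, S, T, eps)
        # Average regret per trial shrinks like 1 / sqrt(T).
        print(T, total, total / T)
```

Under this form, quadrupling T doubles the bound, halving epsilon doubles it, and S enters only through its logarithm, which matters because the number of reduced strategies can be very large.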
Abstract
We consider the extensive-form bandit problem, where on each trial the learner (a user coordinated by a server) plays an extensive-form game against an oblivious adversary, observing the information sets it finds itself in as well as the resulting payoff/loss. We give an algorithm for this problem that satisfies $\epsilon$-local differential privacy and attains a regret of $O(\sqrt{A \ln(S)\, T}/\epsilon)$, where $A$ is the total number of actions that the learner can possibly take, $S$ is the number of the learner's possible reduced strategies, and $T$ is the number of trials. On each trial, the time complexity of our algorithm is, up to a factor logarithmic in the maximum number of actions at an infoset, equal to the time required for the server to transmit the reduced strategy to the user. We note that local differential privacy is the strongest version of differential privacy…
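As a rough illustration of what local differential privacy asks of the user, the sketch below applies the standard Laplace mechanism to a single loss in [0, 1] before it is reported to the server. This is a generic textbook randomizer, not the paper's algorithm: it privatizes only a scalar loss, whereas in the setting above the observed information sets are also part of the user's data. The function names and the [0, 1] loss range are assumptions made for the example.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_loss(loss, epsilon, rng):
    """Return a noisy report of a loss in [0, 1].

    The loss has sensitivity 1, so Laplace noise with scale 1/epsilon
    makes the report epsilon-locally differentially private.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must lie in [0, 1]")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return loss + laplace_noise(1.0 / epsilon, rng)


def report_density(report, loss, epsilon):
    """Density of the noisy report given the true loss."""
    scale = 1.0 / epsilon
    return math.exp(-abs(report - loss) / scale) / (2.0 * scale)


if __name__ == "__main__":
    rng = random.Random(0)
    eps = 1.0
    # The server only ever sees the noisy value, never the true loss.
    print(privatize_loss(0.3, eps, rng))
    # For any report, the two extreme losses are hard to tell apart:
    # the density ratio never exceeds exp(eps).
    y = 0.7
    print(report_density(y, 0.0, eps) / report_density(y, 1.0, eps), math.exp(eps))
```

The noisy report is unbiased, so the server can still average many reports to estimate losses, but the variance grows like 1/epsilon^2. This is the usual reason regret bounds under local privacy carry a 1/epsilon factor.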
