Differentially Private Regret Minimization in Episodic Markov Decision Processes
Sayak Ray Chowdhury, Xingyu Zhou

TL;DR
This paper develops differentially private reinforcement learning algorithms for episodic MDPs with regret guarantees under joint and local DP. The privacy cost is a lower-order additive term under joint DP and a multiplicative factor under local DP.
Contribution
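It introduces two general frameworks, one for policy optimization and one for value iteration, for designing private, optimistic RL algorithms, and instantiates them to satisfy joint and local differential privacy with sublinear regret in finite-horizon tabular MDPs.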
Findings
Under JDP, privacy incurs only a lower-order additive regret cost.
Under LDP, privacy incurs a multiplicative regret cost.
The regret bounds are derived via a unified analysis that the authors believe can be extended beyond tabular MDPs.
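To make the additive versus multiplicative distinction concrete, the two findings can be written schematically as follows. This is only a sketch of the shape of the bounds: the symbols T (total number of steps) and ε (privacy parameter) are assumed notation, the constants C_1, C_2, C_3 are hypothetical placeholders hiding the dependence on the number of states, actions, and the horizon, and logarithmic factors are suppressed. The exact expressions are in the paper and are not reproduced here.

```latex
% Schematic shape only; C_1, C_2, C_3 are placeholder constants (assumed notation).
\begin{align*}
\text{JDP:}\quad & \mathrm{Regret}(T) \;\lesssim\; C_1 \sqrt{T} \;+\; \frac{C_2}{\varepsilon}
  && \text{(privacy term does not grow with } \sqrt{T}\text{)} \\
\text{LDP:}\quad & \mathrm{Regret}(T) \;\lesssim\; C_1 \sqrt{T} \;+\; \frac{C_3 \sqrt{T}}{\varepsilon}
  && \text{(privacy term scales with } \sqrt{T}\text{)}
\end{align*}
```

In the first line the privacy term is dominated by the non-private term as T grows, while in the second it grows at the same rate and therefore inflates the leading term by a factor of order 1/ε.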
Abstract
We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in real-world sequential decision making problems, where protecting users' sensitive and private information is becoming paramount. We consider two variants of DP -- joint DP (JDP), where a centralized agent is responsible for protecting users' sensitive data and local DP (LDP), where information needs to be protected directly on the user side. We first propose two general frameworks -- one for policy optimization and another for value iteration -- for designing private, optimistic RL algorithms. We then instantiate these frameworks with suitable privacy mechanisms to satisfy JDP and LDP requirements, and simultaneously obtain sublinear regret guarantees. The regret…
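The local-DP setting described above, where data is protected on the user side before it reaches the agent, can be illustrated with a minimal sketch. It assumes a Laplace mechanism applied to one user's state-action visit counts; the function names, the choice to privatize only visit counts, and the sensitivity bound of 2H are illustrative assumptions, not the paper's exact mechanism.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def visit_counts(trajectory, n_states, n_actions):
    # Count how often each (state, action) pair appears in one user's episode.
    counts = [[0.0] * n_actions for _ in range(n_states)]
    for state, action in trajectory:
        counts[state][action] += 1.0
    return counts


def privatize_locally(counts, horizon, epsilon, rng):
    # Assumption: an episode has `horizon` steps, so two different
    # trajectories change the count table by at most 2 * horizon in L1 norm.
    # Laplace noise with scale (L1 sensitivity / epsilon) then gives
    # epsilon-DP for this single release of the count table.
    scale = 2.0 * horizon / epsilon
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]


# Hypothetical usage: one user with a 3-step episode over 3 states, 2 actions.
rng = random.Random(0)
trajectory = [(0, 1), (2, 0), (2, 0)]
true_counts = visit_counts(trajectory, n_states=3, n_actions=2)
noisy_counts = privatize_locally(true_counts, horizon=3, epsilon=1.0, rng=rng)
```

Only `noisy_counts` would leave the user's device in this sketch. Under joint DP, by contrast, a trusted central agent would see the true counts and add noise to the aggregated statistics instead, which is one intuition for why the privacy cost is smaller in that setting.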
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Age of Information Optimization · Vehicular Ad Hoc Networks (VANETs)
