Differentially Private Regret Minimization in Episodic Markov Decision Processes

Sayak Ray Chowdhury; Xingyu Zhou

arXiv:2112.10599·cs.LG·December 21, 2021

TL;DR

This paper develops differentially private reinforcement learning algorithms for episodic tabular MDPs and proves sublinear regret guarantees under both joint and local DP. The cost of privacy is a lower-order additive term under JDP and a multiplicative factor under LDP.

Contribution

The paper introduces two general frameworks, one for policy optimization and one for value iteration, for designing private, optimistic RL algorithms with regret guarantees under joint and local differential privacy in finite-horizon tabular MDPs.

Findings

01 Under JDP, privacy incurs a lower-order additive regret cost.

02 Under LDP, privacy incurs a multiplicative regret cost.

03 The regret bounds are derived via a unified analysis applicable beyond tabular MDPs.

Abstract

We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in real-world sequential decision making problems, where protecting users' sensitive and private information is becoming paramount. We consider two variants of DP -- joint DP (JDP), where a centralized agent is responsible for protecting users' sensitive data and local DP (LDP), where information needs to be protected directly on the user side. We first propose two general frameworks -- one for policy optimization and another for value iteration -- for designing private, optimistic RL algorithms. We then instantiate these frameworks with suitable privacy mechanisms to satisfy JDP and LDP requirements, and simultaneously obtain sublinear regret guarantees. The regret…
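The difference between the two trust models in the abstract can be illustrated with a toy count-release experiment. The sketch below is not the paper's algorithm: it assumes a one-shot Laplace mechanism on state-action visit counts, and every name in it (`central_release`, `local_release`, the parameter values) is hypothetical. Each simulated user contributes `horizon` visits, so replacing one user changes the count vector by at most `2 * horizon` in L1 norm, which sets the Laplace scale.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) random variable.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate_user_counts(num_cells, horizon, rng):
    # One user's trajectory: `horizon` visits spread over state-action cells.
    counts = [0] * num_cells
    for _ in range(horizon):
        counts[rng.randrange(num_cells)] += 1
    return counts


def central_release(all_counts, horizon, epsilon, rng):
    # Trusted-agent model: aggregate the raw counts, then add noise once.
    scale = 2.0 * horizon / epsilon
    totals = [sum(col) for col in zip(*all_counts)]
    return [t + laplace(scale, rng) for t in totals]


def local_release(all_counts, horizon, epsilon, rng):
    # Local model: every user perturbs their own counts before sending them.
    scale = 2.0 * horizon / epsilon
    noisy_reports = [
        [c + laplace(scale, rng) for c in counts] for counts in all_counts
    ]
    return [sum(col) for col in zip(*noisy_reports)]


def mean_abs_error(estimate, truth):
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)


rng = random.Random(0)
num_users, num_cells, horizon, epsilon = 2000, 20, 5, 1.0

all_counts = [
    simulate_user_counts(num_cells, horizon, rng) for _ in range(num_users)
]
truth = [sum(col) for col in zip(*all_counts)]

central_err = mean_abs_error(
    central_release(all_counts, horizon, epsilon, rng), truth
)
local_err = mean_abs_error(
    local_release(all_counts, horizon, epsilon, rng), truth
)

print(f"central-noise mean abs error: {central_err:.1f}")
print(f"local-noise mean abs error:   {local_err:.1f}")
```

In this toy setting the locally perturbed aggregate accumulates one noise draw per user, so its error grows roughly with the square root of the number of users, while the centrally perturbed aggregate carries a single noise draw. That is only an intuition for why a stronger, user-side guarantee can cost more in regret; the mechanisms and bounds in the paper itself may differ.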

Code & Models

Repositories

xingyuzhou989/privatetabularrl
Official

Videos

Differentially Private Regret Minimization in Episodic Markov Decision Processes · Underline

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Age of Information Optimization · Vehicular Ad Hoc Networks (VANETs)