Near-Optimal Differentially Private Reinforcement Learning

Dan Qiao; Yu-Xiang Wang

arXiv:2212.04680·cs.LG·February 23, 2023

TL;DR

This paper presents a differentially private reinforcement learning algorithm whose regret is asymptotically optimal, matching the non-private lower bound, and develops new techniques for privacy-preserving data release.

Contribution

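To the authors' knowledge, this is the first private RL algorithm with asymptotically optimal regret: the cost of privacy becomes a lower-order term as the amount of data grows ("privacy for free"). The paper also introduces new techniques for privately releasing exploration bonuses.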

Findings

01. Achieves regret matching non-private lower bounds under JDP.

02. Introduces new methods for privately releasing exploration bonuses.

03. Provides improved regret bounds for the LDP case.

Abstract

Motivated by personalized healthcare and other applications involving sensitive data, we study online exploration in reinforcement learning with differential privacy (DP) constraints. Existing work on this problem established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP) but did not provide an algorithm with optimal regret. We close this gap for the JDP case by designing an $ϵ$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^{2}T} + S^{2}AH^{3}/ϵ)$, which matches the information-theoretic lower bound of non-private learning for all choices of $ϵ > S^{1.5}A^{0.5}H^{2}/\sqrt{T}$. In the above, $S$, $A$ denote the number of states and actions, $H$ denotes the planning horizon, and $T$ is the number of steps. To the best of our knowledge, this is the first private RL algorithm that achieves \emph{privacy for…
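
The threshold on $ϵ$ quoted in the abstract can be recovered by comparing the two terms of the regret bound. The following is a short derivation, assuming the bound has the form $\widetilde{O}(\sqrt{SAH^{2}T} + S^{2}AH^{3}/ϵ)$ and ignoring constants and logarithmic factors. The privacy-dependent term is dominated by the non-private term exactly when

$$
\frac{S^{2}AH^{3}}{\epsilon} \le \sqrt{SAH^{2}T}
\iff
\epsilon \ge \frac{S^{2}AH^{3}}{S^{0.5}A^{0.5}H\,T^{0.5}} = \frac{S^{1.5}A^{0.5}H^{2}}{\sqrt{T}}.
$$

Above this threshold the total regret is $\widetilde{O}(\sqrt{SAH^{2}T})$. Because the threshold decays as $1/\sqrt{T}$, any fixed $ϵ$ eventually satisfies it as $T$ grows.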

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Age of Information Optimization · Advanced Bandit Algorithms Research