Differentially Private Exploration in Reinforcement Learning with Linear Representation

Paul Luyo; Evrard Garcelon; Alessandro Lazaric; Matteo Pirotta

arXiv:2112.01585 · cs.LG · December 8, 2021 · 1 citation

Open Access

TL;DR

This paper develops differentially private exploration algorithms for linear MDPs, providing regret bounds for both model-based and model-free settings, advancing privacy-preserving reinforcement learning.

Contribution

It introduces a unified framework for analyzing private exploration in linear-mixture MDPs under both joint and local differential privacy, and proposes a low-switching algorithm with a joint DP regret guarantee for linear MDPs.
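To make the joint versus local DP distinction concrete, here is a minimal sketch, assuming the common approach of privatizing ridge-regression statistics (the Gram matrix and the feature-target vector) with the classical Gaussian mechanism. This page does not describe the paper's exact mechanism; the names `gaussian_sigma`, `sym_noise`, `local_release`, `private_ridge`, and the `sensitivity` argument are hypothetical, and the caller is assumed to derive the sensitivity from a norm bound on the features. In the local model every user perturbs their own contribution before it reaches the learner; in the central model the learner adds noise once to the aggregate, which is used here as a simplified stand-in for joint DP (joint DP algorithms typically also need a tree-based mechanism for continual release, omitted here).

```python
import numpy as np


def gaussian_sigma(sensitivity, eps, delta):
    """Noise scale of the classical Gaussian mechanism (stated for 0 < eps < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps


def sym_noise(d, sigma, rng):
    """Symmetric d x d Gaussian noise, so a noisy Gram matrix stays symmetric."""
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))
    return upper + np.triu(upper, 1).T


def local_release(phi, y, sigma, rng):
    """Local model: one user perturbs their own statistics before sending them."""
    d = phi.shape[0]
    noisy_gram = np.outer(phi, phi) + sym_noise(d, sigma, rng)
    noisy_vec = phi * y + rng.normal(0.0, sigma, size=d)
    return noisy_gram, noisy_vec


def private_ridge(Phi, Y, sigma, reg, rng, mode):
    """Ridge estimate from privatized statistics (hypothetical helper).

    mode="local":   every user adds noise, so K noise draws are summed.
    mode="central": the learner adds noise once to the aggregate (ASSUMED
                    stand-in for joint DP; continual release is ignored).
    """
    d = Phi.shape[1]
    if mode == "local":
        gram, vec = np.zeros((d, d)), np.zeros(d)
        for phi, y in zip(Phi, Y):
            g, v = local_release(phi, y, sigma, rng)
            gram += g
            vec += v
    elif mode == "central":
        gram = Phi.T @ Phi + sym_noise(d, sigma, rng)
        vec = Phi.T @ Y + rng.normal(0.0, sigma, size=d)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # The ridge term `reg` must dominate the noise for the system to stay
    # well conditioned; choosing it is left to the caller.
    return np.linalg.solve(gram + reg * np.eye(d), vec)
```

Because each of the $K$ users adds independent noise in the local model, the aggregate noise variance is $K$ times larger than in the central sketch. This offers one intuition for why the local DP bound quoted on this page grows faster in $K$ than the joint DP one.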

Findings

01

Achieved $\tilde{O}(K^{3/4}/\sqrt{\epsilon})$ regret for local DP in the model-based setting.

02

Established $\tilde{O}(\sqrt{K/\epsilon})$ regret for joint DP in the model-based setting.

03

Proposed a low-switching algorithm with $\tilde{O}(K^{3/5}/\epsilon^{2/5})$ regret for joint DP in the model-free setting.
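One quick way to compare the three bounds listed above is to evaluate their leading orders numerically. The sketch below (the names `BOUNDS` and `per_episode_regret` are hypothetical) drops constants, logarithmic factors, and the dependence on dimension and horizon, so it only illustrates growth in $K$ and $\epsilon$. The model-based and model-free bounds also hold under different structural assumptions, so they are not directly competing. Dividing the two joint DP orders gives $\frac{K^{3/5}\epsilon^{-2/5}}{K^{1/2}\epsilon^{-1/2}} = (K\epsilon)^{1/10}$, so the model-free order is larger whenever $K\epsilon > 1$.

```python
import math

# Orders of the three regret bounds listed under Findings, as functions of the
# number of episodes K and the privacy level eps. Constants, log factors, and
# the dependence on dimension and horizon are dropped (illustrative only).
BOUNDS = {
    "model-based, local DP": lambda K, eps: K ** 0.75 / math.sqrt(eps),
    "model-based, joint DP": lambda K, eps: math.sqrt(K / eps),
    "model-free, joint DP": lambda K, eps: K ** 0.6 / eps ** 0.4,
}


def per_episode_regret(K, eps):
    """Each bound divided by K; every entry tends to 0 since all exponents of K are below 1."""
    return {name: f(K, eps) / K for name, f in BOUNDS.items()}


if __name__ == "__main__":
    for K in (10**4, 10**6, 10**8):
        row = per_episode_regret(K, eps=1.0)
        print(K, {name: round(v, 5) for name, v in row.items()})
```

Because every exponent of $K$ is below 1, each per-episode value shrinks as $K$ grows, i.e. all three bounds are sublinear in $K$.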

Abstract

This paper studies privacy-preserving exploration in Markov Decision Processes (MDPs) with linear representation. We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration. Through this framework, we prove a $\tilde{O}(K^{3/4}/\sqrt{\epsilon})$ regret bound for $(\epsilon, \delta)$-local DP exploration and a $\tilde{O}(\sqrt{K/\epsilon})$ regret bound for $(\epsilon, \delta)$-joint DP. We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. model-free setting), where we provide a $\tilde{O}(K^{3/5}/\epsilon^{2/5})$ regret bound for $(\epsilon, \delta)$-joint DP, with a novel algorithm based on low-switching. Finally, we provide insights into the issues of designing local DP…
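The abstract attributes the model-free result to an algorithm based on low-switching, but this page does not state the switching criterion. A common rule in linear bandits and linear MDPs, assumed here rather than taken from the paper, recomputes the policy only when the determinant of the regularized Gram matrix $\Lambda = \lambda I + \sum_k \phi_k \phi_k^\top$ has grown by a factor of 2 since the last update. With $\|\phi_k\| \le 1$, the trace bound gives $\det \Lambda_K / \det(\lambda I) \le (1 + K/(d\lambda))^d$, so the number of switches is at most $d \log(1 + K/(d\lambda)) / \log 2$, i.e. logarithmic in $K$. One plausible intuition for why this helps privacy is that fewer policy updates mean fewer releases of private statistics. The function `count_switches` and its parameters are hypothetical.

```python
import numpy as np


def count_switches(features, reg=1.0, ratio=2.0):
    """Count policy updates under a determinant-ratio switching rule.

    ASSUMPTION: this is a common rare-switching criterion, not necessarily the
    one used in the paper. `features` is a (K, d) array of feature vectors.
    """
    d = features.shape[1]
    gram = reg * np.eye(d)                       # Lambda_0 = reg * I
    _, logdet_at_switch = np.linalg.slogdet(gram)
    switches = 0
    for phi in features:
        gram += np.outer(phi, phi)               # rank-one update of Lambda
        _, logdet = np.linalg.slogdet(gram)      # log-determinant, numerically stable
        if logdet > logdet_at_switch + np.log(ratio):
            switches += 1                        # the policy would be recomputed here
            logdet_at_switch = logdet
    return switches


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d = 10_000, 5
    phis = rng.normal(size=(K, d))
    phis /= np.linalg.norm(phis, axis=1, keepdims=True)  # unit-norm features
    bound = d * np.log(1 + K / d) / np.log(2)             # reg = 1
    print(count_switches(phis), "switches over", K, "episodes; bound", round(bound, 1))
```

Raising `ratio` can only delay each switch, so it trades fewer policy updates for acting on staler estimates between updates.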

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization