Differentially Private Exploration in Reinforcement Learning with Linear Representation
Paul Luyo, Evrard Garcelon, Alessandro Lazaric, Matteo Pirotta

TL;DR
This paper develops differentially private exploration algorithms for MDPs with linear representation and proves regret bounds in both the model-based (linear-mixture MDP) and model-free (linear MDP) settings.
Contribution
It introduces a unified framework for analyzing joint and local differentially private exploration in linear-mixture MDPs, and proposes a low-switching algorithm for joint differential privacy in linear MDPs, each with theoretical regret guarantees.
Findings
Achieved $\tilde{O}(K^{3/4}/\sqrt{\epsilon})$ regret for local DP in model-based setting.
Established $\tilde{O}(\sqrt{K/\epsilon})$ regret for joint DP in model-based setting.
Proposed a low-switching algorithm with $\tilde{O}(K^{3/5}/\epsilon^{2/5})$ regret for model-free setting.
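To get a feel for how the three rates above compare, the following minimal sketch evaluates only their leading terms in the number of episodes $K$ and the privacy parameter $\epsilon$. It is an illustration, not part of the paper: constants, logarithmic factors, and dependence on dimension and horizon are all dropped (an assumption of this sketch), and the function names are hypothetical.

```python
import math


def local_dp_model_based(K: float, eps: float) -> float:
    """Leading term K^{3/4} / sqrt(eps); constants and log factors ignored."""
    return K ** 0.75 / math.sqrt(eps)


def joint_dp_model_based(K: float, eps: float) -> float:
    """Leading term sqrt(K / eps); constants and log factors ignored."""
    return math.sqrt(K / eps)


def joint_dp_model_free(K: float, eps: float) -> float:
    """Leading term K^{3/5} / eps^{2/5}; constants and log factors ignored."""
    return K ** 0.6 / eps ** 0.4


if __name__ == "__main__":
    eps = 1.0  # illustrative privacy level, not a value taken from the paper
    for K in (10**3, 10**5, 10**7):
        print(
            f"K={K:>8}  "
            f"local/model-based={local_dp_model_based(K, eps):12.1f}  "
            f"joint/model-based={joint_dp_model_based(K, eps):12.1f}  "
            f"joint/model-free={joint_dp_model_free(K, eps):12.1f}"
        )
```

Under these simplifications, all three rates are sublinear in $K$, so the per-episode regret vanishes as $K$ grows, with the joint DP model-based rate growing slowest and the local DP rate fastest.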
Abstract
This paper studies privacy-preserving exploration in Markov Decision Processes (MDPs) with linear representation. We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration. Through this framework, we prove a $\tilde{O}(K^{3/4}/\sqrt{\epsilon})$ regret bound for $(\epsilon,\delta)$-local DP exploration and a $\tilde{O}(\sqrt{K/\epsilon})$ regret bound for $(\epsilon,\delta)$-joint DP. We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. model-free setting) where we provide a $\tilde{O}(K^{3/5}/\epsilon^{2/5})$ regret bound for $(\epsilon,\delta)$-joint DP, with a novel algorithm based on low-switching. Finally, we provide insights into the issues of designing local DP…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
