Differentially Private Reinforcement Learning with Linear Function Approximation
Xingyu Zhou

TL;DR
This paper introduces privacy-preserving reinforcement learning algorithms for large-scale MDPs with linear function approximation, achieving sub-linear regret while protecting user data under joint differential privacy.
Contribution
The paper develops the first private RL algorithms for MDPs with large state and action spaces under linear function approximation, with privacy guarantees and regret bounds that do not depend on the number of states.
Findings
Both algorithms, one based on value iteration and one on policy optimization, achieve sub-linear regret while satisfying joint differential privacy.
Regret bounds scale at most logarithmically with the number of actions.
Methods are suitable for large-scale personalized services.
Abstract
Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services, where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP). Compared to existing private RL algorithms that work only on tabular finite-state, finite-action MDPs, we take the first step towards privacy-preserving learning in MDPs with large state and action spaces. Specifically, we consider MDPs with linear function approximation (in particular linear mixture MDPs) under the notion of joint differential privacy (JDP), where the RL agent is responsible for protecting users' sensitive data. We design two private RL algorithms that are based on value iteration and policy optimization, respectively, and show that they enjoy sub-linear regret performance…
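For readers unfamiliar with the privacy notion named in the abstract, the standard definition of joint differential privacy from the private-learning literature can be written as follows. The notation ($\mathcal{M}$, $U_K$, $K$, $\varepsilon$, $\delta$) is assumed here for illustration and is not taken from this summary.

```latex
A mechanism $\mathcal{M}$ that interacts with $K$ users is
$(\varepsilon,\delta)$-JDP if, for every user index $k \in [K]$,
every pair of user sequences $U_K, U_K'$ that differ only in the $k$-th user,
and every set $E$ of outputs delivered to all users other than $k$,
\[
  \Pr\big[\mathcal{M}_{-k}(U_K) \in E\big]
  \;\le\; e^{\varepsilon}\,\Pr\big[\mathcal{M}_{-k}(U_K') \in E\big] + \delta ,
\]
where $\mathcal{M}_{-k}(\cdot)$ denotes the outputs sent to all users except user $k$.
```

Intuitively, replacing one user's data should barely change what every other user observes, while the recommendations sent to that user may still depend on their own data.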
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Age of Information Optimization · Vehicular Ad Hoc Networks (VANETs)
