Improved Regret for Differentially Private Exploration in Linear MDP
Dung Daniel Ngo, Giuseppe Vietri, Zhiwei Steven Wu

TL;DR
This paper introduces a differentially private reinforcement learning algorithm for linear MDPs whose regret scales as O(√K) in the number of episodes K, the optimal dependence. It improves on previous methods by updating the policy less often, which reduces the privacy noise that must be injected.
Contribution
The authors develop a private RL algorithm with an adaptive policy update schedule, leading to optimal regret bounds and minimal privacy cost in linear MDP settings.
Findings
Achieves an O(√K) regret rate in the number of episodes K, the optimal dependence for this setting.
Reduces privacy noise by limiting the number of policy updates to O(log(K)).
In privacy regimes where ε is a constant, the additional regret due to privacy is negligible compared with existing non-private regret bounds.
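The link between an adaptive update schedule and an O(log(K)) update count can be illustrated with a determinant-doubling rule, a standard low-switching technique for linear models. This is a minimal, non-private sketch and not the authors' algorithm: the summary above does not state the paper's exact update criterion, so the rule itself, the function name `count_policy_updates`, the regularizer `lam`, the `threshold` of 2, and the random unit-norm features are all assumptions made for illustration. A private variant would presumably run the same test on a noise-perturbed Gram matrix.

```python
import numpy as np

def count_policy_updates(features, lam=1.0, threshold=2.0):
    """Count updates triggered by a determinant-doubling rule (hypothetical helper)."""
    d = features.shape[1]
    gram = lam * np.eye(d)  # regularized Gram matrix of observed features
    # Log-determinant of the Gram matrix at the most recent policy update.
    _, last_logdet = np.linalg.slogdet(gram)
    updates = 0
    for phi in features:
        gram += np.outer(phi, phi)  # incorporate one new observation
        _, logdet = np.linalg.slogdet(gram)
        # Update only when the determinant has grown by the threshold factor,
        # i.e. when the data has changed "enough" since the last update.
        if logdet - last_logdet >= np.log(threshold):
            updates += 1
            last_logdet = logdet
    return updates

# Assumed toy data: K unit-norm feature vectors in d dimensions.
rng = np.random.default_rng(0)
K, d = 5000, 4
features = rng.normal(size=(K, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)

updates = count_policy_updates(features)
print(f"{updates} policy updates over {K} episodes")
```

With unit-norm features and `lam = 1`, the determinant of the Gram matrix is at most (1 + K/d)^d, so a doubling rule can fire at most d·log2(1 + K/d) times. That is logarithmic in K, and fewer updates mean fewer releases that need privacy noise.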
Abstract
We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to Luyo et al. (2021) achieves a regret rate that has a dependence of O(K^{3/5}) on the number of episodes K. We provide a private algorithm with an improved regret rate with an optimal dependence of O(√K) on the number of episodes. The key recipe for our stronger regret guarantee is the adaptivity in the policy update schedule, in which an update only occurs when sufficient changes in the data are detected. As a result, our algorithm benefits from low switching cost and only performs…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Age of Information Optimization · Advanced Bandit Algorithms Research
