Improved Regret for Differentially Private Exploration in Linear MDP

Dung Daniel Ngo; Giuseppe Vietri; Zhiwei Steven Wu

arXiv:2202.01292 · cs.LG · June 24, 2022

TL;DR

This paper introduces a differentially private reinforcement learning algorithm for linear MDPs whose regret has an optimal O(√K) dependence on the number of episodes K. It improves on the prior O(K^{3/5}) rate by updating its policy only rarely, which reduces the amount of privacy noise it must add.

Contribution

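The authors develop a private RL algorithm for linear MDPs with an adaptive policy update schedule: the policy is recomputed only when the collected data have changed sufficiently. This yields a regret bound with optimal dependence on K and a privacy cost that is small relative to the non-private regret.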

Findings

01. Achieves a regret rate with O(√K) dependence on the number of episodes K, which is optimal for this setting.

02. Reduces privacy noise by limiting the number of policy updates to O(log K).

03. When the privacy parameter ε is a constant, the additional regret due to privacy is negligible compared with the non-private regret bounds.
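
The O(log K) update count can be illustrated with a standard determinant argument for data-dependent ("rarely switching") schedules. The derivation below is a sketch under assumptions not stated on this page, and the paper's exact criterion may differ: features satisfy $\|\phi\|_2 \le 1$, the Gram matrix is $\Lambda_k = \lambda I + \sum_{\tau<k}\phi_\tau\phi_\tau^\top \in \mathbb{R}^{d\times d}$, and a policy update fires whenever $\det\Lambda_k$ has at least doubled since the last update.

```latex
\det\Lambda_1 = \lambda^{d}, \qquad
\det\Lambda_K \le \Big(\tfrac{\operatorname{tr}\Lambda_K}{d}\Big)^{d} \le \Big(\lambda + \tfrac{K}{d}\Big)^{d}

2^{N}\lambda^{d} \le \det\Lambda_K
\;\Longrightarrow\;
N \le d\log_2\!\Big(1 + \tfrac{K}{\lambda d}\Big) = O(d\log K)
```

The first line applies the AM-GM inequality to the eigenvalues of $\Lambda_K$, using $\operatorname{tr}\Lambda_K \le \lambda d + K$; the second uses the fact that each of the $N$ updates at least doubles the determinant. With a separate Gram matrix for each of the $H$ steps of an episode, triggering an update whenever any of them doubles gives at most $H d\log_2(1 + K/(\lambda d))$ updates, still $O(\log K)$ for fixed $d$ and $H$.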

Abstract

We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to Luyo et al. (2021) achieves a regret rate that has a dependence of $O(K^{3/5})$ on the number of episodes $K$. We provide a private algorithm with an improved regret rate with an optimal dependence of $O(\sqrt{K})$ on the number of episodes. The key recipe for our stronger regret guarantee is the adaptivity in the policy update schedule, in which an update only occurs when sufficient changes in the data are detected. As a result, our algorithm benefits from low switching cost and only performs…
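
The abstract's key idea, updating the policy only when the data have changed sufficiently, can be illustrated with a determinant-doubling trigger of the kind used in rarely-switching linear bandit and linear MDP algorithms. This is a minimal sketch under stated assumptions, not the authors' algorithm: the helper `count_policy_updates` is hypothetical, it tracks a single ridge Gram matrix (a linear MDP would keep one per step h), and it omits the privacy mechanism the paper applies to these statistics. It returns the episodes at which a policy update would fire.

```python
import numpy as np


def count_policy_updates(features: np.ndarray, lam: float = 1.0) -> list[int]:
    """Return the episodes at which a determinant-doubling rule fires.

    Hypothetical illustration only: not the paper's exact criterion, and no
    privacy noise is added. `features` has shape (K, d), one feature vector
    phi_k per episode, standing in for the per-step features of a linear MDP.
    The Gram matrix starts at lam * I (ridge regularization).
    """
    num_episodes, d = features.shape
    gram = lam * np.eye(d)
    # log det of the Gram matrix at the most recent policy update
    _, logdet_at_update = np.linalg.slogdet(gram)
    updates = []
    for k in range(num_episodes):
        # Before acting in episode k, check whether the data gathered so far
        # have changed "sufficiently": det(Lambda_k) has at least doubled.
        _, logdet_now = np.linalg.slogdet(gram)
        if logdet_now >= logdet_at_update + np.log(2.0):
            updates.append(k)  # the policy would be recomputed here
            logdet_at_update = logdet_now
        # Otherwise the old policy is kept; then episode k's data is added.
        phi = features[k]
        gram += np.outer(phi, phi)
    return updates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d = 10_000, 5
    phis = rng.normal(size=(K, d))
    phis /= np.linalg.norm(phis, axis=1, keepdims=True)  # unit-norm features
    ups = count_policy_updates(phis, lam=1.0)
    bound = d * np.log2(1 + K / (1.0 * d))
    print(f"{len(ups)} updates over {K} episodes (bound {bound:.1f})")
```

For unit-norm features in d dimensions, the number of triggered updates is at most d·log2(1 + K/(λd)), so the list of update episodes grows only logarithmically in K. In a private algorithm, fewer updates mean fewer releases of noisy statistics, which is the mechanism the abstract credits for the reduced privacy noise.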
