Near-Optimal Reinforcement Learning with Shuffle Differential Privacy

Shaojie Bai; Mohammad Sadegh Talebi; Chengcheng Zhao; Peng Cheng; and Jiming Chen

arXiv:2411.11647 · cs.LG · November 18, 2025



TL;DR

This paper introduces SDP-PE, a novel reinforcement learning algorithm under the shuffle differential privacy model, achieving near-optimal regret bounds and balancing privacy with learning efficiency in networked systems.

Contribution

It presents the first policy elimination-based RL algorithm under the shuffle model, combining a new batching schedule and forgetting mechanism for improved privacy-utility trade-offs.

Findings

01. Achieves near-optimal regret bounds under shuffle DP.

02. Outperforms local DP models in utility while maintaining strong privacy.

03. Numerical experiments validate theoretical guarantees.

Abstract

Reinforcement learning (RL) is a powerful tool for sequential decision-making, but its application is often hindered by privacy concerns arising from its interaction data. This challenge is particularly acute in advanced networked systems, where learning from operational and user data can expose systems to privacy inference attacks. Existing differential privacy (DP) models for RL are often inadequate: the centralized model requires a fully trusted server, creating a single point of failure risk, while the local model incurs significant performance degradation that is unsuitable for many networked applications. This paper addresses this gap by leveraging the emerging shuffle model of privacy, an intermediate trust model that provides strong privacy guarantees without a centralized trust assumption. We present Shuffle Differentially Private Policy Elimination (SDP-PE), the first generic…
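To make the trust model described in the abstract concrete, the sketch below illustrates the generic shuffle pipeline: each user randomizes their own data locally, a shuffler permutes the reports so they cannot be linked to senders, and an untrusted analyzer debiases the aggregate. This is not the SDP-PE algorithm from the paper. It uses binary randomized response as a stand-in randomizer, and every name in it (`local_randomizer`, `shuffler`, `analyzer`, `estimate_mean`, `epsilon0`) is a hypothetical chosen for illustration.

```python
import math
import random


def local_randomizer(bit, epsilon0, rng):
    """Binary randomized response (an assumed stand-in randomizer).

    Keeps the true bit with probability e^eps0 / (e^eps0 + 1),
    otherwise reports the flipped bit.
    """
    keep_prob = math.exp(epsilon0) / (math.exp(epsilon0) + 1.0)
    return bit if rng.random() < keep_prob else 1 - bit


def shuffler(messages, rng):
    """Return the reports in a uniformly random order.

    The analyzer sees only the shuffled collection, not who sent which report.
    """
    shuffled = list(messages)
    rng.shuffle(shuffled)
    return shuffled


def analyzer(messages, epsilon0):
    """Debias the noisy mean to estimate the true fraction of 1-bits.

    With keep probability p, E[noisy_mean] = (2p - 1) * mu + (1 - p),
    so mu is recovered by inverting that affine map.
    """
    p = math.exp(epsilon0) / (math.exp(epsilon0) + 1.0)
    noisy_mean = sum(messages) / len(messages)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)


def estimate_mean(true_bits, epsilon0, seed=0):
    """Run the full randomize -> shuffle -> analyze pipeline (illustrative only)."""
    rng = random.Random(seed)
    reports = [local_randomizer(b, epsilon0, rng) for b in true_bits]
    return analyzer(shuffler(reports, rng), epsilon0)
```

Calling `estimate_mean([1] * 3000 + [0] * 7000, epsilon0=1.0)` should return a value close to the true fraction 0.3, with error that shrinks as the number of users grows. The privacy benefit of shuffling, namely that the anonymized collection satisfies a stronger guarantee than each individual report, comes from the analysis of the shuffle model and is not something this sketch computes.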


Taxonomy

Topics: Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning