Differentially Private Policy Gradient

Alexandre Rio; Merwan Barlier; Igor Colin

arXiv:2501.19080·cs.LG·February 3, 2025

TL;DR

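This paper introduces a differentially private (DP) policy gradient algorithm for reinforcement learning. It reduces the introduction of differential privacy to the computation of appropriate trust regions, which preserves the theoretical properties of the non-private methods, and it balances privacy noise against trust-region size. The authors report a significant empirical improvement over existing DP algorithms in online RL.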

Contribution

The paper presents a DP policy gradient method that uses trust regions to preserve the theoretical properties of the non-private methods while remaining performant in online RL tasks.

Findings

01 Achieves better privacy-performance trade-off

02 Outperforms existing DP algorithms in benchmarks

03 Maintains theoretical properties of RL methods

Abstract

Motivated by the increasing deployment of reinforcement learning in the real world, involving a large consumption of personal data, we introduce a differentially private (DP) policy gradient algorithm. We show that, in this setting, the introduction of Differential Privacy can be reduced to the computation of appropriate trust regions, thus avoiding the sacrifice of theoretical properties of the DP-less methods. Therefore, we show that it is possible to find the right trade-off between privacy noise and trust-region size to obtain a performant differentially private policy gradient algorithm. We then outline its performance empirically on various benchmarks. Our results and the complexity of the tasks addressed represent a significant improvement over existing DP algorithms in online RL.
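To make the trade-off between privacy noise and update size concrete, here is a minimal sketch of a generic Gaussian-mechanism gradient step in the style of DP-SGD. It is not the authors' algorithm: the per-trajectory privacy unit, the function names (`clip_by_norm`, `private_gradient`, `bounded_step`), the parameters `clip_norm`, `noise_multiplier`, and `max_step_norm`, and the use of a plain norm cap as a crude stand-in for a trust-region constraint are all assumptions made for illustration.

```python
import math
import random


def clip_by_norm(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def private_gradient(per_trajectory_grads, clip_norm, noise_multiplier, rng):
    """Clip each trajectory's gradient, sum, add Gaussian noise, then average.

    Assumption: one trajectory is the unit of privacy, so clipping bounds
    each trajectory's contribution to the sum by clip_norm.
    """
    n = len(per_trajectory_grads)
    dim = len(per_trajectory_grads[0])
    clipped = [clip_by_norm(g, clip_norm) for g in per_trajectory_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]


def bounded_step(params, grad, max_step_norm):
    """Gradient ascent step whose L2 length is capped at max_step_norm.

    The cap is only a stand-in for a trust region; the paper's actual
    trust-region computation is not reproduced here.
    """
    step = clip_by_norm(grad, max_step_norm)
    return [p + s for p, s in zip(params, step)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-trajectory gradients
    noisy = private_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
    print(bounded_step([0.0, 0.0], noisy, max_step_norm=0.5))
```

In this sketch, a larger `noise_multiplier` gives a noisier gradient estimate, while a smaller `max_step_norm` limits how far a noisy estimate can move the parameters in one update. That is the general tension the abstract describes between privacy noise and trust-region size.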

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Reinforcement Learning in Robotics