Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization

Canzhe Zhao; Yanjie Ze; Jing Dong; Baoxiang Wang; Shuai Li

arXiv:2201.10447 · cs.LG · January 26, 2022

TL;DR

This paper introduces a differentially private temporal difference learning algorithm with nonlinear value function approximation, balancing privacy and utility in reinforcement learning through a momentum-based stochastic gradient method.

Contribution

It develops a novel single-timescale algorithm for private TD learning with nonlinear approximation, ensuring differential privacy on both primal and dual variables.
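As a rough illustration of the idea, the sketch below runs a single-timescale gradient descent ascent loop in which both the primal and the dual gradients are clipped and perturbed with Gaussian noise before they enter a momentum average. It is not the paper's algorithm: plain exponential-moving-average momentum is swapped in for whatever estimator the authors use, the noise scale `sigma` is a free parameter rather than one calibrated to a privacy budget, and the toy objective f(x, y) = sin(x) y - y^2/2 (nonconvex in x, strongly concave in y) and all function names are assumptions made for this example.

```python
import math
import random


def clip(vec, max_norm):
    """Rescale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def privatize(grad, max_norm, sigma, rng):
    """Clip a gradient, then add independent Gaussian noise per coordinate."""
    return [g + rng.gauss(0.0, sigma) for g in clip(grad, max_norm)]


def noisy_momentum_gda(grad_x, grad_y, x, y, steps, lr, beta,
                       max_norm, sigma, seed=0):
    """Hypothetical helper: noisy momentum descent on x, ascent on y.

    The same step size lr is used for both variables (single timescale),
    and noise is added on both the primal and the dual side.
    """
    rng = random.Random(seed)
    mx = [0.0] * len(x)  # momentum buffer for the primal variable
    my = [0.0] * len(y)  # momentum buffer for the dual variable
    for _ in range(steps):
        gx = privatize(grad_x(x, y), max_norm, sigma, rng)
        gy = privatize(grad_y(x, y), max_norm, sigma, rng)
        mx = [beta * m + (1.0 - beta) * g for m, g in zip(mx, gx)]
        my = [beta * m + (1.0 - beta) * g for m, g in zip(my, gy)]
        x = [xi - lr * m for xi, m in zip(x, mx)]  # descent step on x
        y = [yi + lr * m for yi, m in zip(y, my)]  # ascent step on y
    return x, y


# Toy objective (an assumption for illustration): f(x, y) = sin(x)*y - y^2/2.
def toy_grad_x(x, y):
    return [math.cos(x[0]) * y[0]]


def toy_grad_y(x, y):
    return [math.sin(x[0]) - y[0]]
```

Calling `noisy_momentum_gda(toy_grad_x, toy_grad_y, [1.0], [0.0], steps=2000, lr=0.05, beta=0.5, max_norm=1.0, sigma=0.0)` runs the noiseless case; raising `sigma` shows how the iterates start to fluctuate around the stationary point, which is the privacy-utility tension the paper analyzes.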

Findings

01

Achieves $(\epsilon, \delta)$-DP guarantees for sensitive data.

02

Provides utility bounds of $\frac{(d \log(1/\delta))^{1/8}}{(n\epsilon)^{1/4}}$.

03

Demonstrates effectiveness through experiments in OpenAI Gym.
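To make the privacy and utility quantities above concrete, here is a small arithmetic sketch. The first function is the classical Gaussian-mechanism calibration (noise standard deviation sqrt(2 ln(1.25/δ)) Δ / ε, valid for 0 < ε < 1), which is a textbook result and not necessarily the calibration used in the paper. The second evaluates the rate (d log(1/δ))^{1/8} / (nε)^{1/4} with constants and any hidden logarithmic factors ignored, assuming n is the data size and d the parameter dimension. Both function names are hypothetical.

```python
import math


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale for (epsilon, delta)-DP.

    Textbook calibration, valid for 0 < epsilon < 1; assumed here for
    illustration only.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def utility_rate(n, d, epsilon, delta):
    """Evaluate (d * log(1/delta))**(1/8) / (n * epsilon)**(1/4).

    Constants and hidden log factors are dropped, so only ratios between
    two settings are meaningful.
    """
    return (d * math.log(1.0 / delta)) ** 0.125 / (n * epsilon) ** 0.25
```

Because the rate scales as n^{-1/4}, the data size has to grow sixteenfold to halve it, while halving ε doubles the noise scale returned by `gaussian_sigma`.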

Abstract

Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning. While many TD learning methods have been developed in recent years, little attention has been paid to preserving privacy and most of the existing approaches might face the concerns of data privacy from users. To enable complex representative abilities of policies, in this paper, we consider preserving privacy in TD learning with nonlinear value function approximation. This is challenging because such a nonlinear problem is usually studied in the formulation of stochastic nonconvex-strongly-concave optimization to gain finite-sample analysis, which would require simultaneously preserving the privacy on primal and dual sides. To this end, we employ a momentum-based stochastic gradient descent ascent to achieve a single-timescale algorithm, and achieve a good trade-off between…

Taxonomy

Topics: Privacy-Preserving Technologies in Data