Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization
Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li

TL;DR
This paper introduces a differentially private temporal difference learning algorithm with nonlinear value function approximation, balancing privacy and utility in reinforcement learning through a momentum-based stochastic gradient method.
Contribution
It develops a novel single-timescale algorithm for private TD learning with nonlinear approximation, ensuring differential privacy on both primal and dual variables.
Findings
Achieves $(\epsilon,\delta)$-DP guarantees for sensitive data.
Provides a utility bound of $\tilde{\mathcal{O}}\left(\frac{(d\log(1/\delta))^{1/8}}{(n\epsilon)^{1/4}}\right)$.
Demonstrates effectiveness through experiments in OpenAI Gym.
Abstract
Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning. While many TD learning methods have been developed in recent years, little attention has been paid to preserving privacy and most of the existing approaches might face the concerns of data privacy from users. To enable complex representative abilities of policies, in this paper, we consider preserving privacy in TD learning with nonlinear value function approximation. This is challenging because such a nonlinear problem is usually studied in the formulation of stochastic nonconvex-strongly-concave optimization to gain finite-sample analysis, which would require simultaneously preserving the privacy on primal and dual sides. To this end, we employ a momentum-based stochastic gradient descent ascent to achieve a single-timescale algorithm, and achieve a good trade-off between…
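To make the idea of noisy, momentum-based gradient descent ascent on a primal and a dual variable concrete, here is a minimal sketch. It is not the paper's algorithm: the toy objective, the function names (`clip`, `private_momentum_gda`), the hyperparameters, and the noise scale `sigma` are all assumptions chosen for illustration, and the noise is not calibrated to any particular $(\epsilon,\delta)$ budget. The toy objective is also convex in the primal variable, unlike the nonconvex setting the paper studies.

```python
import numpy as np


def clip(g, c):
    """Rescale g so that its L2 norm is at most c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)


def private_momentum_gda(A, b, steps=200, lr=0.05, beta=0.9,
                         clip_norm=1.0, sigma=0.1, seed=0):
    """Illustrative noisy momentum descent-ascent on a toy minimax problem.

    Per-sample objective (an assumption, not taken from the paper):
        f_i(theta, omega) = omega . (a_i * (a_i . theta - b_i)) - 0.5 * ||omega||^2
    which is strongly concave in the dual variable omega.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    theta = np.zeros(d)    # primal variable (descent)
    omega = np.zeros(d)    # dual variable (ascent)
    m_theta = np.zeros(d)  # momentum buffer for the primal gradient
    m_omega = np.zeros(d)  # momentum buffer for the dual gradient

    for _ in range(steps):
        i = rng.integers(n)  # sample one data point
        a_i, b_i = A[i], b[i]

        # Per-sample gradients of the toy objective.
        g_theta = a_i * (a_i @ omega)
        g_omega = a_i * (a_i @ theta - b_i) - omega

        # Clip to bound each sample's influence, then add Gaussian noise
        # to BOTH the primal and the dual gradient.
        g_theta = clip(g_theta, clip_norm) + rng.normal(0.0, sigma * clip_norm, d)
        g_omega = clip(g_omega, clip_norm) + rng.normal(0.0, sigma * clip_norm, d)

        # Momentum averaging of the noisy gradients.
        m_theta = beta * m_theta + (1.0 - beta) * g_theta
        m_omega = beta * m_omega + (1.0 - beta) * g_omega

        # Single timescale: both variables use the same step size.
        theta = theta - lr * m_theta
        omega = omega + lr * m_omega

    return theta, omega


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    A = data_rng.normal(size=(50, 3))
    b = A @ np.array([1.0, -2.0, 0.5])
    theta, omega = private_momentum_gda(A, b)
    print("theta:", theta)
    print("omega:", omega)
```

Clipping bounds how much any single sample can move either variable, which is what lets added Gaussian noise mask an individual's contribution. Using one step size for both updates mirrors the single-timescale idea mentioned in the abstract, though the actual step-size and noise schedules would have to come from the paper's analysis.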
Taxonomy
Topics: Privacy-Preserving Technologies in Data
