Distributed TD Tracking with Linear Function Approximation over Directed Communication Networks
Haocheng Yang, Shengchao Zhao, Yongchao Liu

TL;DR
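This paper introduces PP-DTD, a Push-Pull-type distributed policy evaluation algorithm for multi-agent reinforcement learning over directed communication networks. It combines TD learning with linear function approximation and variance reduction to improve stability and convergence rate.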
Contribution
It presents the first distributed TD-based policy evaluation algorithm for directed graphs with proven linear convergence rates.
Findings
PP-DTD achieves linear convergence to a neighborhood of the optimum under constant step-sizes.
Under decaying step-sizes, the algorithm attains a convergence rate of O(T^{-1}), for both i.i.d. and Markovian samples.
Numerical experiments show robustness and effectiveness in cooperative tasks.
Abstract
We study the policy evaluation problem in multi-agent reinforcement learning (MARL) over directed communication networks, where agents cooperate with each other to explore an unknown environment and accomplish a specific task. We propose a Push-Pull-type distributed algorithm, named PP-DTD, for policy evaluation in MARL within the framework of temporal difference (TD) learning with linear function approximation. PP-DTD integrates TD learning with the Push-Pull mechanism to accommodate directed communication networks, and further utilizes variance reduction techniques to enhance both algorithmic stability and convergence rate. We show that PP-DTD achieves linear convergence to a neighborhood of the optimum under constant step-sizes and a convergence rate of O(T^{-1}) under decaying step-sizes when the samples are independent and identically distributed or Markovian. To the…
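For readers unfamiliar with the building block named in the abstract, the following is a minimal single-agent sketch of TD(0) with linear function approximation, where the value of state s is approximated as theta . phi(s). It is not PP-DTD: it has no Push-Pull communication step and no variance reduction, and the function name `td0_linear`, its parameters, and the two-state example chain are all hypothetical choices made for illustration.

```python
import random


def td0_linear(features, transitions, rewards,
               gamma=0.9, alpha=0.05, steps=20000, seed=0):
    """Single-agent TD(0) with V(s) ~ theta . phi(s) (illustrative only).

    features[s]    : feature vector phi(s) for state s
    transitions[s] : next-state probabilities under the fixed policy
    rewards[s]     : reward received when leaving state s
    """
    rng = random.Random(seed)
    n_states = len(features)
    theta = [0.0] * len(features[0])
    s = 0
    for _ in range(steps):
        # Sample the next state along a single Markovian trajectory.
        s_next = rng.choices(range(n_states), weights=transitions[s])[0]
        v = sum(t * f for t, f in zip(theta, features[s]))
        v_next = sum(t * f for t, f in zip(theta, features[s_next]))
        # TD error: one-step bootstrapped target minus current estimate.
        delta = rewards[s] + gamma * v_next - v
        # Semi-gradient update along the features of the current state.
        theta = [t + alpha * delta * f for t, f in zip(theta, features[s])]
        s = s_next
    return theta


# Assumed toy problem: two states, uniform transitions, one-hot features.
features = [[1.0, 0.0], [0.0, 1.0]]
transitions = [[0.5, 0.5], [0.5, 0.5]]
rewards = [1.0, 0.0]
theta = td0_linear(features, transitions, rewards)
```

With one-hot features the exact values for this toy chain are V(0) = 5.5 and V(1) = 4.5 at gamma = 0.9. Because the step-size is constant, the estimate settles into a neighborhood of those values rather than converging exactly, which mirrors the constant step-size behavior described in the abstract.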
