Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD
Donghwan Lee, Hyungjin Yoon, and Naira Hovakimyan

TL;DR
This paper introduces a primal-dual distributed GTD algorithm for multi-agent reinforcement learning, enabling agents to collaboratively learn a global value function through sparse communication.
Contribution
It develops a novel primal-dual approach to distributed GTD. The learning problem is recast as a convex optimization problem with a consensus constraint, and the resulting algorithm is proven to converge almost surely.
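For orientation, here is one way to write such a consensus-constrained problem. It assumes linear value estimates V(s) ≈ φ(s)ᵀθ and the GTD2 (projected Bellman error) objective. The notation is ours and may differ from the paper's. This includes scaling each local reward by N so that the agents' objectives combine into the global, summed reward.

```latex
% Assumed notation (illustrative, not taken from the paper):
% A = E[\phi(s)(\phi(s) - \gamma\phi(s'))^\top],  C = E[\phi(s)\phi(s)^\top],  b_i = E[r_i\,\phi(s)],
% L = graph Laplacian of the N-agent communication network,  \theta = (\theta_1^\top,\dots,\theta_N^\top)^\top.
\begin{aligned}
&\min_{\theta_1,\dots,\theta_N \in \mathbb{R}^d}\ \sum_{i=1}^{N} \tfrac{1}{2}\,\bigl\lVert N b_i - A\theta_i \bigr\rVert_{C^{-1}}^{2}
\quad \text{s.t.}\quad (L \otimes I_d)\,\theta = 0, \\
&\tfrac{1}{2}\lVert x \rVert_{C^{-1}}^{2} = \max_{w \in \mathbb{R}^d}\Bigl( w^\top x - \tfrac{1}{2}\, w^\top C w \Bigr), \\
&\mathcal{L}(\theta, w, \lambda) = \sum_{i=1}^{N}\Bigl( w_i^\top (N b_i - A\theta_i) - \tfrac{1}{2}\, w_i^\top C w_i \Bigr) + \lambda^\top (L \otimes I_d)\,\theta .
\end{aligned}
```

On a connected graph, the constraint forces θ_1 = ⋯ = θ_N = θ. The objective is then minimized where Aθ = Σ_i b_i, which is the TD fixed point for the summed reward. A primal-dual method searches for a saddle point of ℒ, using gradient descent in θ and ascent in w and λ.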
Findings
The algorithm converges almost surely to a set of stationary points.
Enables multi-agent RL with sparse communication.
Learns the global value function from local rewards.
Abstract
The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite-horizon discounted cost function (or value function) for a given fixed policy without knowledge of the model. In the distributed RL case, each agent receives a local reward through local processing. Information exchange over a sparse communication network allows the agents to learn the global value function corresponding to a global reward, which is the sum of the local rewards. In this paper, the problem is converted into a constrained convex optimization problem with a consensus constraint. Then, we propose a primal-dual distributed GTD algorithm and prove that it almost surely converges to a set of stationary points of the…
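To make the primal-dual idea concrete, here is a minimal NumPy sketch. It is not the paper's exact algorithm or step-size schedule, and it rests on assumptions of our own:

- linear features and a common state observed by all agents;
- each agent i scales its local reward by N, so the agents' objectives combine into the global, summed reward;
- one consensus multiplier λ_i per agent;
- an optional penalty (ρ/2)·θᵀ(L⊗I)θ, which vanishes at consensus.

The function names `path_laplacian` and `primal_dual_step` and every number in the demo are hypothetical.

In a sample-based version, A, C and b_i would be replaced by single-transition estimates φ(φ − γφ′)ᵀ, φφᵀ and r_iφ, with diminishing step sizes. The demo plugs in their exact expectations instead. It therefore runs the deterministic mean dynamics, and the result can be checked against the centralized value function.

```python
import numpy as np


def path_laplacian(n_agents):
    """Laplacian of a path graph 0 - 1 - ... - (n_agents - 1) (hypothetical network)."""
    L = np.zeros((n_agents, n_agents))
    for i in range(n_agents - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L


def primal_dual_step(theta, w, lam, A, b_local, C, L, alpha, rho=1.0):
    """One simultaneous primal-dual step for every agent (hypothetical helper).

    Row i of theta, w, lam (each of shape (N, d)) belongs to agent i.
    A, C: (d, d) estimates of E[phi (phi - gamma phi')^T] and E[phi phi^T].
    b_local: (N, d); row i estimates E[r_i phi] for agent i's local reward r_i.
    """
    n_agents = theta.shape[0]
    # Row i of L @ X only combines agent i's row with its graph neighbours' rows.
    L_theta = L @ theta
    L_lam = L @ lam
    # Primal descent: GTD2-style term A^T w_i, consensus multiplier, penalty.
    theta_next = theta + alpha * (w @ A - L_lam - rho * L_theta)
    # Auxiliary ascent on w_i; for fixed theta_i its fixed point is
    # C^{-1}(N b_i - A theta_i). The factor N is our reward-scaling assumption.
    w_next = w + alpha * (n_agents * b_local - theta @ A.T - w @ C.T)
    # Dual ascent on the consensus constraint L theta = 0.
    lam_next = lam + alpha * L_theta
    return theta_next, w_next, lam_next


# Demo with hypothetical numbers: 3 states, tabular features, 3 agents.
gamma, n_agents = 0.5, 3
# Symmetric transition matrix under a fixed policy, so the stationary
# distribution is uniform.
P = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
D = np.diag(np.full(3, 1.0 / 3.0))  # stationary distribution on the diagonal
Phi = np.eye(3)                     # tabular features
# Local expected rewards r_i(s), one row per agent.
r_local = np.array([[1.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0],
                    [0.0, 0.0, -1.0]])

A = Phi.T @ D @ (Phi - gamma * P @ Phi)
C = Phi.T @ D @ Phi
b_local = r_local @ D @ Phi  # row i is (Phi^T D r_i)^T
L = path_laplacian(n_agents)

theta = np.zeros((n_agents, 3))
w = np.zeros_like(theta)
lam = np.zeros_like(theta)
for _ in range(3000):
    theta, w, lam = primal_dual_step(theta, w, lam, A, b_local, C, L, alpha=0.1)

# Centralized value function of the global reward sum_i r_i, for comparison.
value_true = np.linalg.solve(np.eye(3) - gamma * P, r_local.sum(axis=0))
print(theta)
print(value_true)
```

Each agent only needs its own rows of `L @ theta` and `L @ lam`. Those rows combine its estimate with those of its graph neighbours, and this is where the sparse communication enters.

With tabular features, as in the demo, every row of `theta` should end up close to (I − γP)⁻¹ Σ_i r_i, which here is [12/7, 20/7, −4/7]. The step size α = 0.1 and penalty ρ = 1 were chosen so that this particular example is stable. They are not a general tuning rule.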
