Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD

Donghwan Lee, Hyungjin Yoon, and Naira Hovakimyan

arXiv:1803.08031·math.OC·August 23, 2018·CDC

TL;DR

This paper introduces a primal-dual distributed GTD algorithm for multi-agent reinforcement learning, enabling agents to collaboratively learn a global value function through sparse communication.

Contribution

It develops a primal-dual approach to distributed GTD: the problem is recast as a constrained convex optimization problem with a consensus constraint, and the resulting algorithm is proved to converge almost surely.
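To make the idea of a primal-dual method under a consensus constraint concrete, here is a minimal sketch on a toy problem. It is not the paper's distributed GTD algorithm. Simple scalar quadratic costs stand in for the per-agent objectives, and the path graph, the values in `b`, the step size, and the function names are all assumptions chosen for illustration. Each agent keeps a local copy `w[i]` and a dual variable `v[i]`, and only ever reads values held by its graph neighbors.

```python
# Hypothetical toy problem: 4 agents on a path graph 0-1-2-3 (sparse links).
# Agent i privately holds b[i] and the local cost f_i(w) = 0.5 * (w - b[i])**2.
# The sum of the local costs is minimized at mean(b); no agent knows every b[i].

def laplacian_times(x, neighbors):
    """Return L @ x for the graph Laplacian L, using only neighbor values."""
    return [len(neighbors[i]) * x[i] - sum(x[j] for j in neighbors[i])
            for i in range(len(x))]

def primal_dual_consensus(b, neighbors, step=0.05, iters=5000):
    n = len(b)
    w = [0.0] * n  # primal variables: each agent's local copy of the parameter
    v = [0.0] * n  # dual variables for the consensus constraint L w = 0
    for _ in range(iters):
        Lw = laplacian_times(w, neighbors)
        Lv = laplacian_times(v, neighbors)
        # Primal descent on the Lagrangian sum_i f_i(w_i) + v^T L w.
        w_next = [w[i] - step * ((w[i] - b[i]) + Lv[i]) for i in range(n)]
        # Dual ascent on the same Lagrangian, using the old primal iterate.
        v = [v[i] + step * Lw[i] for i in range(n)]
        w = w_next
    return w

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
b = [1.0, 2.0, 3.0, 6.0]
w = primal_dual_consensus(b, neighbors)
print(w)  # every entry should be close to mean(b)
```

The constraint `L w = 0` holds on a connected graph exactly when all local copies agree, which is why the Laplacian can encode consensus while each update touches only neighboring agents.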

Findings

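01. The algorithm converges almost surely to a set of stationary points.

02. Agents learn cooperatively while exchanging information only over a sparse communication network.

03. Agents learn the global value function even though each one receives only a local reward.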

Abstract

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL) algorithm that learns an infinite-horizon discounted cost function (or value function) for a given fixed policy without knowledge of the model. In the distributed RL case, each agent receives a local reward through local processing. Information exchange over a sparse communication network allows the agents to learn the global value function corresponding to a global reward, which is a sum of local rewards. In this paper, the problem is converted into a constrained convex optimization problem with a consensus constraint. Then, we propose a primal-dual distributed GTD algorithm and prove that it almost surely converges to a set of stationary points of the…
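The relationship between local rewards and the global value function can be checked numerically. Because the discounted value of a fixed policy is linear in the reward, the value of the summed reward equals the sum of the per-agent values. The sketch below assumes a known transition matrix purely for illustration, whereas the algorithm in the abstract is model-free. The 3-state chain, the reward vectors, the discount factor, and the function name `evaluate` are all hypothetical.

```python
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical 3-state Markov chain induced by the fixed policy.
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]

# Hypothetical local rewards, one vector (indexed by state) per agent.
local_rewards = [[1.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0],
                 [0.5, 0.5, 3.0]]

def evaluate(reward, P, gamma, iters=500):
    """Iterate V <- r + gamma * P V, which converges to the discounted value."""
    n = len(reward)
    V = [0.0] * n
    for _ in range(iters):
        V = [reward[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V

# Global reward is the sum of the local rewards, state by state.
global_reward = [sum(r[s] for r in local_rewards) for s in range(3)]

V_global = evaluate(global_reward, P, GAMMA)
V_locals = [evaluate(r, P, GAMMA) for r in local_rewards]
V_sum = [sum(V[s] for V in V_locals) for s in range(3)]

print(V_global)
print(V_sum)  # should match V_global up to floating-point rounding
```

No single agent can compute `V_global` from its own reward alone, which is the gap that information exchange between agents has to close.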
