Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

Chao Qu; Shie Mannor; Huan Xu; Yuan Qi; Le Song; Junwu Xiong

arXiv:1901.09326 · cs.LG · October 1, 2019 · 27 cites

TL;DR

This paper introduces a novel decentralized multi-agent reinforcement learning algorithm called value propagation, which efficiently learns coordinated policies in networked environments with local rewards and limited communication, with proven convergence guarantees.

Contribution

It presents the first MARL algorithm with convergence guarantees in control, off-policy, and nonlinear function approximation settings, using a decentralized optimization approach.

Findings

01 The algorithm achieves a non-asymptotic convergence rate of O(1/T).

02 Empirical results demonstrate its effectiveness in networked multi-agent scenarios.

03 It is the first MARL algorithm with convergence guarantees in the control, off-policy, nonlinear function approximation setting.

Abstract

We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem is encountered in many areas, including traffic control, distributed control, and smart grids. We assume that the reward function of each agent can be different and is observed only locally by the agent itself. Furthermore, each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency and a decentralized optimization method, we obtain a principled and data-efficient iterative algorithm. In the first step of each iteration, an agent computes its local policy and value gradients and then updates only policy parameters. In the second step, the agent propagates to its neighbors the messages based on its value function…
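The two ingredients named in the abstract, softmax temporal consistency and neighbor-only communication, can be illustrated with a small Python sketch. It is not the paper's algorithm. The function names, the temperature `lam`, and the tabular single-state setup are assumptions made here for illustration, and the communication step is plain consensus averaging, swapped in as a stand-in for the paper's message-passing update, which the truncated abstract does not spell out.

```python
import math


def softmax_value(q_values, lam):
    """Smoothed value lam * log(sum_a exp(q_a / lam)).

    q_a stands for reward(s, a) + gamma * V(s'); lam is an assumed
    entropy temperature (hypothetical name, not from the source).
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def softmax_policy(q_values, lam):
    """Policy pi(a|s) = exp((q_a - V(s)) / lam) induced by the smoothed value."""
    v = softmax_value(q_values, lam)
    return [math.exp((q - v) / lam) for q in q_values]


def consistency_residual(reward, gamma, v_next, lam, pi_a, v_s):
    """Softmax temporal consistency residual for one transition.

    It is zero when the value and policy are the softmax-optimal pair:
    reward + gamma * V(s') - lam * log pi(a|s) - V(s).
    """
    return reward + gamma * v_next - lam * math.log(pi_a) - v_s


def neighbor_average(values, neighbors):
    """One communication round (illustrative stand-in only).

    Each agent replaces its scalar with the mean over itself and its
    neighbors; no agent reads a value from a non-neighbor.
    """
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]
```

As a usage note: calling `softmax_policy` on a list of one-step returns and feeding each resulting probability into `consistency_residual` together with `softmax_value` gives residuals that vanish up to floating-point error. Repeating `neighbor_average` on a ring such as `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}` drives every agent's value toward the network-wide mean, because equal weights on a regular graph preserve the average.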

Taxonomy

Topics: Distributed Control Multi-Agent Systems · Age of Information Optimization · Reinforcement Learning in Robotics

Methods: Softmax