Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

Yan Zhang; Michael M. Zavlanos

arXiv:1903.09255 · cs.LG · March 25, 2019 · 1 citation

TL;DR

This paper introduces a distributed off-policy actor-critic reinforcement learning algorithm in which multiple agents independently estimate the global policy and use consensus steps to reach agreement. The algorithm is supported by a convergence analysis and validated on a resource allocation example.

Contribution

It presents a novel distributed off-policy actor-critic method for multi-agent reinforcement learning that uses a consensus mechanism, so that agents never share information about their local tasks.

Findings

1. Convergence of the proposed algorithm is theoretically established.
2. The method coordinates agents to agree on an estimate of the global policy.
3. The method is validated on a distributed resource allocation case study.

Abstract

In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. Then, we introduce an additional consensus step that lets all the agents asymptotically reach agreement on the global optimal policy function. We provide a convergence analysis of the proposed algorithm and validate its effectiveness using a distributed resource allocation example. Compared to related distributed actor-critic methods, here the agents do not share information about their local tasks; instead, they coordinate to estimate the global policy function.
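A minimal sketch can illustrate the consensus idea in the abstract. It is not the authors' algorithm, and it makes several assumptions:

- Each agent's actor-critic policy-gradient estimate is replaced by the gradient of a hypothetical quadratic local objective whose target only that agent knows.
- Agents communicate over a fixed ring graph with doubly stochastic weights.
- The step size diminishes over time.

At every iteration, each agent mixes its copy of the global parameter with its neighbors' copies and adds its own local gradient step. This is the standard distributed gradient method. All copies approach the maximizer of the summed objective, yet only parameters, never local targets, are exchanged. The names `ring_mixing_matrix`, `local_gradient`, and `run_consensus` are illustrative only.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic weights for a ring graph (hypothetical topology).

    Each agent gives weight 1/3 to itself and to each of its two neighbors.
    """
    assert n >= 3, "ring weights below assume at least 3 agents"
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def local_gradient(theta, target):
    # Hypothetical stand-in for agent i's actor (policy-gradient) estimate:
    # the gradient of J_i(theta) = -0.5 * ||theta - target||^2.
    # Only agent i knows `target` (its "local task").
    return target - theta


def run_consensus(targets, W, iters=5000):
    """Mix parameters with neighbors, then add each agent's local gradient step."""
    n, d = targets.shape
    thetas = np.zeros((n, d))  # row i is agent i's copy of the global parameter
    for k in range(iters):
        step = 1.0 / (k + 10)  # diminishing step size
        grads = np.stack([local_gradient(thetas[i], targets[i]) for i in range(n)])
        # Consensus (mixing) plus local ascent: only parameters are exchanged.
        thetas = W @ thetas + step * grads
    return thetas


if __name__ == "__main__":
    targets = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [2.0, -1.0]])
    thetas = run_consensus(targets, ring_mixing_matrix(4))
    # Every row should be close to the maximizer of sum_i J_i, i.e. the mean target.
    print(thetas)
    print(targets.mean(axis=0))
```

With a constant step size, the local copies would only settle in a neighborhood of the optimum. The diminishing step lets both the disagreement between agents and the optimality gap shrink. In an actor-critic setting, the local gradient would instead be estimated from each agent's own critic.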

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Distributed Control Multi-Agent Systems