Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
Yan Zhang, Michael M. Zavlanos

TL;DR
This paper introduces a distributed off-policy actor-critic reinforcement learning algorithm in which multiple agents independently estimate the global policy and use consensus steps to reach agreement. The algorithm comes with a convergence analysis and is validated on a resource allocation example.
Contribution
It presents a novel distributed off-policy actor-critic method for multi-agent reinforcement learning in which a consensus mechanism lets agents agree on the global policy without sharing information about their local tasks.
Findings
Convergence of the proposed algorithm is theoretically established.
The method effectively coordinates agents to estimate a global policy.
The method is validated on a distributed resource allocation case study.
Abstract
In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. Then, we introduce an additional consensus step that lets all agents asymptotically reach agreement on the global optimal policy function. We provide a convergence analysis of the proposed algorithm and validate its effectiveness on a distributed resource allocation example. Unlike related distributed actor-critic methods, here the agents do not share information about their local tasks; instead, they coordinate to estimate the global policy function.
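The split between a local actor update and a consensus step can be illustrated with a toy, stateless (bandit) sketch. It is not the authors' algorithm, and every name and number in it is hypothetical: a single state, three agents with private reward vectors standing in for their local tasks, exact local rewards in place of the learned local critics, a uniform behavior policy `BEHAVIOR`, and an assumed doubly stochastic mixing matrix `W`. The importance-weighted (off-policy) gradient is computed as an exact expectation over actions rather than from samples, so the run is deterministic.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): 3 agents, 3 actions, one state.
# Row i is agent i's private local reward vector r_i(a); it is never shared.
LOCAL_REWARDS = np.array([
    [1.0, 0.6, 0.0],  # agent 0 alone would prefer action 0
    [0.0, 0.6, 1.0],  # agent 1 alone would prefer action 2
    [0.5, 0.6, 0.5],  # agent 2
])  # the sum of rewards is largest for action 1

# Assumed doubly stochastic mixing matrix for a fully connected 3-agent network.
W = np.array([
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
])

BEHAVIOR = np.full(3, 1.0 / 3.0)  # behavior policy mu: uniform over actions


def softmax(theta):
    """Softmax policy pi_theta over actions (numerically stabilized)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def local_off_policy_gradient(theta, rewards, mu):
    """Exact E_{a~mu}[rho(a) * r_i(a) * grad log pi_theta(a)] with rho = pi/mu.

    The expectation is summed over all actions instead of being sampled.
    """
    pi = softmax(theta)
    rho = pi / mu  # importance ratios between target and behavior policy
    grad = np.zeros_like(theta)
    for a in range(len(theta)):
        grad_log_pi = -pi.copy()
        grad_log_pi[a] += 1.0  # d/dtheta log softmax(theta)[a] = e_a - pi
        grad += mu[a] * rho[a] * rewards[a] * grad_log_pi
    return grad


def run(num_iters=2000, step=1.0):
    """Each agent keeps its own copy of the global policy parameter."""
    n_agents, n_actions = LOCAL_REWARDS.shape
    thetas = np.zeros((n_agents, n_actions))
    for _ in range(num_iters):
        # 1) Local actor step: agent i uses only its own rewards.
        updated = np.array([
            thetas[i] + step * local_off_policy_gradient(
                thetas[i], LOCAL_REWARDS[i], BEHAVIOR)
            for i in range(n_agents)
        ])
        # 2) Consensus step: each agent averages its neighbors' estimates.
        thetas = W @ updated
    return thetas


if __name__ == "__main__":
    final = run()
    for i, theta in enumerate(final):
        print(i, np.round(softmax(theta), 3))
```

In this toy run, agents 0 and 1 would each favor a different action on their own, yet all three copies of the parameter drift toward action 1, which maximizes the sum of the private rewards. Because `W` is doubly stochastic, the average of the agents' parameters follows the gradient of the averaged objective, while the mixing step shrinks their disagreement.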
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Distributed Control Multi-Agent Systems
