Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning
Pengcheng Dai, Dongming Wang, Wenwu Yu, Wei Ren

TL;DR
This paper introduces a distributed scalable coupled policy (DSCP) algorithm for networked multi-agent reinforcement learning. Agents optimize their policies collaboratively using only local information and interactions with nearby neighbors, and the algorithm is shown to converge to a first-order stationary point.
Contribution
The paper develops a distributed algorithm for coupled policy optimization in networked multi-agent reinforcement learning (NMARL). It uses the neighbors' averaged Q-function and a geometric 2-horizon sampling method to obtain an unbiased estimate of the coupled policy gradient.
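The general idea behind geometric-horizon sampling can be illustrated with a small sketch. This is not the paper's two-horizon procedure. It shows only the underlying fact that summing undiscounted rewards up to a random horizon T, with P(T = t) = (1 - gamma) * gamma^t, gives an unbiased estimate of the discounted return. The reward sequence, the function names, and the sample count are all assumptions made for illustration.

```python
import random


def sample_geometric_horizon(gamma, rng):
    """Draw T in {0, 1, 2, ...} with P(T = t) = (1 - gamma) * gamma**t."""
    t = 0
    while rng.random() < gamma:  # continue with probability gamma
        t += 1
    return t


def geometric_return_estimate(reward_fn, gamma, rng):
    """Sum the undiscounted rewards r_0..r_T for one random horizon T."""
    horizon = sample_geometric_horizon(gamma, rng)
    return sum(reward_fn(t) for t in range(horizon + 1))


def discounted_return(reward_fn, gamma, steps=2000):
    """Reference value: truncated sum of gamma**t * r_t."""
    return sum(gamma ** t * reward_fn(t) for t in range(steps))


def reward_fn(t):
    # Hypothetical deterministic reward sequence, used only for this demo.
    return 1.0 if t % 2 == 0 else 0.5


rng = random.Random(0)
gamma = 0.9
num_samples = 100_000

estimate = sum(
    geometric_return_estimate(reward_fn, gamma, rng) for _ in range(num_samples)
) / num_samples
exact = discounted_return(reward_fn, gamma)

print(f"Monte Carlo estimate: {estimate:.3f}")
print(f"Discounted return:    {exact:.3f}")
```

The estimate is unbiased because P(T >= t) = gamma^t, so the reward at step t is included with exactly the weight that discounting would assign to it. No table of values needs to be stored, only the rewards along the sampled trajectory.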
Findings
The algorithm converges to a first-order stationary point.
It shows improved performance in robot path planning simulations.
Each agent needs only information from nearby neighbors to update its policy.
Abstract
This paper studies networked multi-agent reinforcement learning (NMARL) with interdependent rewards and coupled policies. In this setting, each agent's reward depends on its own state-action pair as well as those of its direct neighbors, and each agent's policy is parameterized by its local parameters together with those of its neighbors within a fixed number of hops, called the coupled radius. The objective of the agents is to collaboratively optimize their policies to maximize the discounted average cumulative reward. To address the challenge of interdependent policies in collaborative optimization, we introduce a novel concept termed the neighbors' averaged Q-function and derive a new expression for the coupled policy gradient. Based on these theoretical foundations, we develop a distributed scalable coupled policy (DSCP) algorithm, where each agent relies only on the…
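To make the notion of hop neighbors and a coupled radius concrete, here is a minimal sketch that computes the set of agents within k hops of a given agent by breadth-first search. The adjacency-list representation, the function name `k_hop_neighbors`, the example graph, and the convention that the set includes the agent itself are assumptions for illustration, not details taken from the paper.

```python
from collections import deque


def k_hop_neighbors(adjacency, agent, k):
    """Return the set of agents within k hops of `agent`, including itself."""
    dist = {agent: 0}
    queue = deque([agent])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the radius
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


# Hypothetical communication graph: five agents on a line, 0-1-2-3-4.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(sorted(k_hop_neighbors(adjacency, 2, 1)))
print(sorted(k_hop_neighbors(adjacency, 0, 2)))
```

With a radius of 1, agent 2 depends only on agents 1, 2, and 3, which is why a larger radius gives richer coupling at the cost of more communication.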
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems
