Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning

Pengcheng Dai; Dongming Wang; Wenwu Yu; Wei Ren

arXiv:2512.05447·cs.MA·December 11, 2025

Open Access

TL;DR

This paper introduces a distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning. Agents optimize their policies collaboratively using only limited local information and interactions with neighbors; the algorithm is shown to converge to a first-order stationary point and improves performance in robot path planning simulations.

Contribution

The paper develops a novel distributed algorithm for coupled policy optimization in networked multi-agent reinforcement learning (NMARL). It builds on the neighbors' averaged Q-function and uses a geometric 2-horizon sampling method to obtain an unbiased estimate of the coupled policy gradient.
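The idea behind geometric horizon sampling can be illustrated with a minimal sketch. This is not the paper's 2-horizon scheme, whose details are not given on this page; it only shows the standard single-horizon identity that such methods rely on: if T is drawn with P(T = t) = (1 - γ)γ^t, then r_T / (1 - γ) is an unbiased estimate of the discounted sum Σ_t γ^t r_t. The function names and the reward sequence r_t = 0.5^t are hypothetical, chosen only for illustration.

```python
import random


def sample_geometric_horizon(gamma, rng):
    """Draw T with P(T = t) = (1 - gamma) * gamma**t for t = 0, 1, 2, ..."""
    t = 0
    # Continue with probability gamma, stop with probability 1 - gamma.
    while rng.random() < gamma:
        t += 1
    return t


def horizon_estimate(reward_fn, gamma, rng):
    """One-sample unbiased estimate of sum_t gamma**t * reward_fn(t)."""
    horizon = sample_geometric_horizon(gamma, rng)
    # Dividing by (1 - gamma) cancels the normalizer of the horizon distribution.
    return reward_fn(horizon) / (1.0 - gamma)


def mean_estimate(reward_fn, gamma, num_samples, seed=0):
    """Average many one-sample estimates (plain Monte Carlo)."""
    rng = random.Random(seed)
    total = sum(horizon_estimate(reward_fn, gamma, rng) for _ in range(num_samples))
    return total / num_samples


if __name__ == "__main__":
    gamma = 0.9

    def reward(t):
        # Hypothetical reward sequence r_t = 0.5**t (illustration only).
        return 0.5 ** t

    exact = 1.0 / (1.0 - gamma * 0.5)  # closed form of sum_t (gamma * 0.5)**t
    approx = mean_estimate(reward, gamma, num_samples=100_000)
    print(f"exact={exact:.4f} monte_carlo={approx:.4f}")
```

Each estimate needs only the reward at one sampled time step, so no table of values has to be stored; the price is variance, which is why the sketch averages many samples.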

Findings

1. Algorithm converges to a first-order stationary point.
2. Demonstrates improved performance in robot path planning simulations.
3. Requires only local neighbor information for policy updates.

Abstract

This paper studies networked multi-agent reinforcement learning (NMARL) with interdependent rewards and coupled policies. In this setting, each agent's reward depends on its own state-action pair as well as those of its direct neighbors, and each agent's policy is parameterized by its local parameters together with those of its $\kappa_{p}$-hop neighbors, with $\kappa_{p} \geq 1$ denoting the coupled radius. The objective of the agents is to collaboratively optimize their policies to maximize the discounted average cumulative reward. To address the challenge of interdependent policies in collaborative optimization, we introduce a novel concept termed the neighbors' averaged $Q$-function and derive a new expression for the coupled policy gradient. Based on these theoretical foundations, we develop a distributed scalable coupled policy (DSCP) algorithm, where each agent relies only on the…
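The coupled radius described in the abstract can be made concrete with a small sketch that computes which agents fall within $\kappa_{p}$ hops of a given agent on the communication graph. This is an illustration under stated assumptions, not code from the paper: the graph is undirected and given as an adjacency dictionary, the helper name `k_hop_neighbors` is hypothetical, and the returned set is assumed to include the agent itself.

```python
from collections import deque


def k_hop_neighbors(adjacency, source, k):
    """Return all nodes within k hops of `source`, including `source` itself.

    `adjacency` maps each node to an iterable of its direct neighbors.
    Uses breadth-first search truncated at depth k.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the coupled radius
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


if __name__ == "__main__":
    # Hypothetical 5-agent line network: 0 - 1 - 2 - 3 - 4.
    line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    kappa_p = 1
    # Agents whose parameters would enter agent 2's policy under this assumption.
    print(sorted(k_hop_neighbors(line_graph, 2, kappa_p)))
```

With a radius of 1 on this line network, the policy of agent 2 would depend on the parameters of agents 1, 2, and 3 only, which is the sense in which the dependence stays local as the network grows.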

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems