Distributed primal-dual algorithm for constrained multi-agent reinforcement learning under coupled policies

Pengcheng Dai; He Wang; Dongming Wang; Wenwu Yu

arXiv:2511.15053 · cs.MA · November 20, 2025

TL;DR

This paper introduces a distributed primal-dual algorithm for constrained multi-agent reinforcement learning with coupled policies, ensuring agents optimize objectives while respecting safety constraints through local estimates and limited communication.

Contribution

It proposes a novel distributed primal-dual framework for CMARL with coupled policies, incorporating local estimates and neighborhood communication to enhance security and scalability.
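To make the primal-dual idea concrete, here is a minimal single-variable sketch, not the paper's distributed algorithm. It assumes a hypothetical toy problem (maximize -(x - 2)^2 subject to x <= 1) and plain projected gradient ascent-descent on its Lagrangian; the function name, step size, and iteration count are illustrative choices.

```python
def toy_primal_dual(step=0.05, iterations=2000):
    """Gradient ascent-descent on L(x, lam) = -(x - 2)^2 - lam * (x - 1).

    Hypothetical toy problem: maximize -(x - 2)^2 subject to x <= 1.
    The primal variable x ascends the Lagrangian; the multiplier lam
    descends it and is projected back onto lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(iterations):
        grad_x = -2.0 * (x - 2.0) - lam   # dL/dx
        grad_lam = -(x - 1.0)             # dL/dlam
        x_next = x + step * grad_x                # primal ascent step
        lam = max(0.0, lam - step * grad_lam)     # projected dual descent step
        x = x_next
    return x, lam


x_star, lam_star = toy_primal_dual()
print(f"x = {x_star:.4f}, lam = {lam_star:.4f}")
```

In this toy case the constraint is active at the solution, so the iterates should settle near x = 1 with multiplier lam = 2. The setting described here replaces the scalar x with policy parameters and runs the updates per agent using local estimates.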

Findings

01

Achieves $\epsilon$-first-order stationarity with high probability.

02

Provides convergence bounds with approximation error depending on coupling and truncation distances.

03

Demonstrates effectiveness through simulations in a GridWorld environment.

Abstract

In this work, we investigate constrained multi-agent reinforcement learning (CMARL), where agents collaboratively maximize the sum of their local objectives while satisfying individual safety constraints. We propose a framework where agents adopt coupled policies that depend on both local states and parameters, as well as those of their $\kappa_{p}$-hop neighbors, with $\kappa_{p} > 0$ denoting the coupling distance. A distributed primal-dual algorithm is further developed under this framework, wherein each agent has access only to state-action pairs within its $2\kappa_{p}$-hop neighborhood and to reward information within its $(\kappa + 2\kappa_{p})$-hop neighborhood, with $\kappa > 0$ representing the truncation distance. Moreover, agents are not permitted to directly share their true policy parameters or Lagrange multipliers. Instead, each agent constructs and maintains local estimates of these…
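The three neighborhood radii in the abstract can be illustrated with a short breadth-first search. This is a sketch under stated assumptions: the line-graph topology, the values chosen for the coupling and truncation distances, the helper name `k_hop_neighborhood`, and the convention that a neighborhood includes the agent itself are all hypothetical and not taken from the paper.

```python
from collections import deque


def k_hop_neighborhood(adjacency, source, k):
    """Return the set of nodes within k hops of source, source included (assumed convention)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # do not expand beyond the k-hop frontier
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return set(hops)


# Hypothetical communication graph: seven agents on a line, 0 - 1 - ... - 6.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}

kappa_p = 1  # coupling distance (illustrative value)
kappa = 1    # truncation distance (illustrative value)
agent = 3

policy_scope = k_hop_neighborhood(adjacency, agent, kappa_p)
state_action_scope = k_hop_neighborhood(adjacency, agent, 2 * kappa_p)
reward_scope = k_hop_neighborhood(adjacency, agent, kappa + 2 * kappa_p)

print(sorted(policy_scope))        # agents whose states and parameters enter the policy
print(sorted(state_action_scope))  # agents whose state-action pairs are accessible
print(sorted(reward_scope))        # agents whose reward information is accessible
```

With these illustrative values, the policy of agent 3 depends on three agents, its state-action scope covers five, and its reward scope covers all seven, which shows how the required information grows with both distances.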

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Security and Resilience