Scalable Multi-Agent Reinforcement Learning with General Utilities
Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei

TL;DR
This paper introduces a scalable distributed policy gradient algorithm for multi-agent reinforcement learning with general utilities, leveraging spatial correlation decay to ensure convergence without full observability.
Contribution
It presents the first scalable MARL algorithm for general utilities that converges efficiently without requiring full observability of all agents.
Findings
The algorithm converges to ε-stationarity with high probability, up to an approximation error.
The sample complexity is Õ(ε^{-2}), i.e., O(ε^{-2}) up to logarithmic factors in the accuracy parameter ε.
The approximation error decreases exponentially as the communication radius grows.
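To make the exponential dependence on the communication radius concrete, the sketch below assumes a hypothetical error bound of the form C·ρ^(κ+1), with constant C > 0, decay rate 0 < ρ < 1, and communication radius κ. The exact form and constants in the paper may differ. It computes the smallest radius that meets a target tolerance; the function name min_radius is illustrative.

```python
import math

def min_radius(C, rho, tol):
    """Smallest integer kappa >= 0 with C * rho**(kappa + 1) <= tol,
    under a HYPOTHETICAL error bound of that form (C > 0, 0 < rho < 1)."""
    if not (C > 0 and 0 < rho < 1 and tol > 0):
        raise ValueError("need C > 0, 0 < rho < 1, tol > 0")
    # Closed form: kappa + 1 >= log(tol / C) / log(rho)
    kappa = max(0, math.ceil(math.log(tol / C) / math.log(rho)) - 1)
    # Guard against floating-point rounding at the boundary in both directions
    while C * rho ** (kappa + 1) > tol:
        kappa += 1
    while kappa > 0 and C * rho ** kappa <= tol:
        kappa -= 1
    return kappa
```

Under this assumed form, each extra hop of communication multiplies the bound by ρ, so the required radius grows only logarithmically in 1/tol.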
Abstract
We study scalable multi-agent reinforcement learning (MARL) with general utilities, defined as nonlinear functions of the team's long-term state-action occupancy measure. The objective is to find a localized policy that maximizes the average of the team's local utility functions without full observability of each agent in the team. By exploiting the spatial correlation decay property of the network structure, we propose a scalable distributed policy gradient algorithm with shadow reward and localized policy that consists of three steps: (1) shadow reward estimation, (2) truncated shadow Q-function estimation, and (3) truncated policy gradient estimation and policy update. Our algorithm converges, with high probability, to ε-stationarity with Õ(ε^{-2}) samples up to some approximation error that decreases exponentially in the communication…
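The three steps can be illustrated with a minimal tabular sketch. It assumes a known communication graph given as an adjacency dict, per-agent trajectories recorded as lists of local (state, action) pairs, and an entropy utility f_i(λ_i) = -Σ λ_i log λ_i chosen only for illustration; its shadow reward is the gradient ∂f_i/∂λ_i = -log λ_i - 1. A Monte Carlo reward-to-go stands in for the truncated shadow Q-function estimate, and all function names are hypothetical. This is not the authors' estimator.

```python
from collections import deque
import math

def khop_neighbors(adj, i, kappa):
    """Agents within kappa hops of agent i (including i), found by BFS."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == kappa:
            continue  # do not expand beyond the communication radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def local_occupancy(traj_i, gamma):
    """Empirical discounted occupancy of agent i's local (s_i, a_i) pairs from
    one finite trajectory (so it sums to 1 - gamma**T, not exactly 1)."""
    lam = {}
    for t, sa in enumerate(traj_i):
        lam[sa] = lam.get(sa, 0.0) + (1 - gamma) * gamma ** t
    return lam

def shadow_rewards(traj_i, lam):
    """Step 1: shadow reward r_i = d f_i / d lambda_i for the illustrative
    entropy utility f_i(lam) = -sum lam * log lam, i.e. -log lam(s, a) - 1."""
    return [-math.log(lam[sa]) - 1.0 for sa in traj_i]

def reward_to_go(rewards, gamma):
    """Step 2 stand-in: discounted return from each time step, used as a
    Monte Carlo estimate of an agent's shadow Q-value along the trajectory."""
    out = [0.0] * len(rewards)
    acc = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        acc = rewards[t] + gamma * acc
        out[t] = acc
    return out

def truncated_gradient(neighbors_i, q_hats, scores_i, gamma, n):
    """Step 3: agent i's truncated policy-gradient estimate. Only Q-estimates of
    agents j in i's kappa-hop neighborhood are summed, then divided by team size n.
    q_hats[j][t]: Q-estimate of agent j at time t.
    scores_i[t]: gradient of log pi_i(a_i^t | local observation) w.r.t. theta_i."""
    g = [0.0] * len(scores_i[0])
    for t, score in enumerate(scores_i):
        weight = sum(q_hats[j][t] for j in neighbors_i) / n
        for k, s in enumerate(score):
            g[k] += gamma ** t * weight * s
    return g
```

A policy update for agent i would then be a gradient-ascent step such as theta_i[k] += step_size * g[k]. Each agent needs estimates only from neighbors within radius κ, which is what lets the computation scale with the network rather than with the full team.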
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems
