Scalable Multi-Agent Reinforcement Learning with General Utilities

Donghao Ying; Yuhao Ding; Alec Koppel; Javad Lavaei

arXiv:2302.07938·cs.LG·August 29, 2023

TL;DR

This paper introduces a scalable distributed policy gradient algorithm for multi-agent reinforcement learning with general utilities, leveraging spatial correlation decay to ensure convergence without full observability.

Contribution

The paper presents the first scalable MARL algorithm for general utilities whose convergence guarantee does not require full observability of every agent.

Findings

01 The algorithm converges to ε-stationarity with high probability.

02 Sample complexity is O(ε^{-2}) in the accuracy parameter ε, up to an approximation error.

03 That approximation error decreases exponentially as the communication radius grows.
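To make "communication radius" concrete: a common reading, assumed here rather than taken from this page, is that each agent only exchanges information with agents within κ hops on the interaction graph. The sketch below computes such a κ-hop neighborhood by breadth-first search. The function name `k_hop_neighborhood`, the symbol `kappa`, and the 6-agent line graph are illustrative assumptions, not the paper's notation or code.

```python
from collections import deque


def k_hop_neighborhood(adjacency, agent, kappa):
    """Return the set of agents within `kappa` hops of `agent`, itself included."""
    seen = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            # Do not expand past the communication radius.
            continue
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


# Hypothetical 6-agent line graph: 0 - 1 - 2 - 3 - 4 - 5
line_graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

for kappa in range(4):
    print(kappa, sorted(k_hop_neighborhood(line_graph, 2, kappa)))
```

A larger radius means each agent sees more of the network, at the cost of more communication. The findings above state that the resulting approximation error shrinks exponentially in this radius.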

Abstract

We study scalable multi-agent reinforcement learning (MARL) with general utilities, defined as nonlinear functions of the team's long-term state-action occupancy measure. The objective is to find a localized policy that maximizes the average of the team's local utility functions without the full observability of each agent in the team. By exploiting the spatial correlation decay property of the network structure, we propose a scalable distributed policy gradient algorithm with shadow reward and localized policy that consists of three steps: (1) shadow reward estimation, (2) truncated shadow Q-function estimation, and (3) truncated policy gradient estimation and policy update. Our algorithm converges, with high probability, to $\epsilon$-stationarity with $O(\epsilon^{-2})$ samples up to some approximation error that decreases exponentially in the communication…
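Step (1) can be illustrated with a minimal single-agent, tabular sketch. It assumes the usual general-utility construction, in which the shadow reward is the gradient of the utility with respect to the occupancy measure, evaluated at an estimate of that measure. The quadratic utility, the (1 - gamma) normalization, the toy trajectories, and the function names are all assumptions chosen for illustration; this is not the authors' implementation and omits the multi-agent truncation of steps (2) and (3).

```python
def empirical_occupancy(trajectories, n_states, n_actions, gamma):
    """Monte Carlo estimate of the discounted state-action occupancy measure."""
    occ = [[0.0] * n_actions for _ in range(n_states)]
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            # Each visit at time t contributes discounted mass, averaged over trajectories.
            occ[s][a] += (1.0 - gamma) * gamma ** t / len(trajectories)
    return occ


def shadow_reward(occ, target):
    """Gradient of the assumed utility f(lam) = -0.5 * ||lam - target||^2 w.r.t. lam."""
    return [[target[s][a] - occ[s][a] for a in range(len(occ[s]))]
            for s in range(len(occ))]


# Toy data: two length-3 trajectories of (state, action) pairs.
trajectories = [[(0, 0), (1, 1), (0, 0)], [(0, 1), (1, 1), (1, 0)]]
target = [[0.25, 0.25], [0.25, 0.25]]

occ = empirical_occupancy(trajectories, n_states=2, n_actions=2, gamma=0.9)
reward = shadow_reward(occ, target)
print(reward)
```

Once the shadow reward is fixed, it plays the role of an ordinary reward table, which is what lets standard Q-function and policy gradient estimators be reused for a nonlinear objective.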

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems