Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
Guannan Qu, Yiheng Lin, Adam Wierman, Na Li

TL;DR
This paper introduces a scalable actor-critic method for multi-agent reinforcement learning (MARL) in networked systems. It exploits local dependence and an exponential decay property to optimize the average reward efficiently.
Contribution
The paper proposes a scalable actor-critic MARL algorithm that exploits the local dependence structure of networked systems and the exponential decay of inter-agent influence, enabling efficient learning in large networks.
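To make "local structure" concrete, here is a minimal sketch of why working on κ-hop neighborhoods helps. It assumes a hypothetical line-graph topology, identical per-agent state and action counts, and tabular representations. It only counts table sizes and is not the paper's algorithm. The names `k_hop_neighborhood`, `line_graph` and `table_sizes` are illustrative.

```python
from collections import deque


def k_hop_neighborhood(adj, i, k):
    """Return the set of agents within graph distance k of agent i (including i)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def line_graph(n):
    """Adjacency lists for agents 0..n-1 placed on a line (hypothetical topology)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def table_sizes(n, k, num_states, num_actions):
    """Compare a global state-action table with one restricted to a k-hop neighborhood."""
    adj = line_graph(n)
    nbhd = k_hop_neighborhood(adj, n // 2, k)  # a middle agent
    per_agent = num_states * num_actions
    global_size = per_agent ** n           # grows exponentially in the number of agents
    local_size = per_agent ** len(nbhd)    # depends only on the neighborhood size
    return len(nbhd), global_size, local_size


print(table_sizes(n=20, k=1, num_states=2, num_actions=2))  # (3, 1099511627776, 64)
```

With 20 binary-state, binary-action agents the global table has 4^20 entries, while a 1-hop local table has 4^3 = 64. The local size stays at 64 as the network grows.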
Findings
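The Scalable Actor-Critic (SAC) method learns near-optimal localized policies with complexity that scales with the state-action space size of local neighborhoods rather than that of the entire network.
An exponential decay property ensures that the effect of agents on one another decays exponentially fast in their graph distance.
This makes the approach far more scalable than conventional MARL methods, whose state and action spaces grow exponentially in the number of agents.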
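One way to state the exponential decay property formally is sketched below. The notation is assumed rather than copied from the paper: $N_i^\kappa$ is the set of agents within graph distance $\kappa$ of agent $i$, $N_{-i}^\kappa$ is its complement, $Q_i^\theta$ is agent $i$'s local (average-reward) Q-function under policy parameter $\theta$, and $c > 0$, $\rho \in (0,1)$ are constants. The precise conditions under which the property holds are given in the paper.

$$
\left| Q_i^\theta\big(s_{N_i^\kappa}, s_{N_{-i}^\kappa}, a_{N_i^\kappa}, a_{N_{-i}^\kappa}\big) - Q_i^\theta\big(s_{N_i^\kappa}, s'_{N_{-i}^\kappa}, a_{N_i^\kappa}, a'_{N_{-i}^\kappa}\big) \right| \le c\,\rho^{\kappa+1}
\quad \text{for all } s, s', a, a'.
$$

Fixing any completion $\bar s_{N_{-i}^\kappa}, \bar a_{N_{-i}^\kappa}$ defines a truncated Q-function $\tilde Q_i(s_{N_i^\kappa}, a_{N_i^\kappa}) := Q_i^\theta(s_{N_i^\kappa}, \bar s_{N_{-i}^\kappa}, a_{N_i^\kappa}, \bar a_{N_{-i}^\kappa})$ that depends only on the neighborhood. The inequality above then gives $|Q_i^\theta(s, a) - \tilde Q_i(s_{N_i^\kappa}, a_{N_i^\kappa})| \le c\,\rho^{\kappa+1}$, so the truncation error shrinks exponentially as $\kappa$ grows.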
Abstract
It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues because the sizes of the state and action spaces are exponentially large in the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can learn a near-optimal localized policy for optimizing the average reward with complexity scaling with the state-action space size of local neighborhoods, as opposed to the entire network. Our result centers on identifying and exploiting an exponential decay property that ensures the effect of agents on each other decays exponentially fast in their graph distance.
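The critic side of such a method can be sketched as a tabular learner keyed only on an agent's κ-hop neighborhood. The update below is a generic average-reward TD(0) rule that we assume for illustration. It is not necessarily the exact critic used in SAC, and the class name `TruncatedAverageRewardCritic` and step size `alpha` are hypothetical. The actor step, in which agent i combines the critics of nearby agents, is omitted.

```python
from collections import defaultdict


class TruncatedAverageRewardCritic:
    """Tabular critic for one agent i, keyed only by its kappa-hop neighborhood.

    Assumed update rule: generic average-reward TD(0), shown for illustration only.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.q = defaultdict(float)  # (local states, local actions) -> Q estimate
        self.mu = 0.0                # running estimate of agent i's average reward

    def update(self, z, r, z_next):
        """One TD step; z and z_next are hashable tuples restricted to N_i^kappa."""
        # Track the average reward with a running mean.
        self.mu += self.alpha * (r - self.mu)
        # Average-reward TD error: reward minus average reward plus the value difference.
        td_error = r - self.mu + self.q[z_next] - self.q[z]
        self.q[z] += self.alpha * td_error
        return td_error


critic = TruncatedAverageRewardCritic(alpha=0.5)
z = ((0, 1, 0), (1, 1, 0))  # states and actions of agent i and its two neighbors
print(critic.update(z, 1.0, z))  # 0.5
```

Because the table is indexed by neighborhood tuples, its size depends on the neighborhood rather than on the whole network, which is the source of the scalability described above.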
Taxonomy
Topics: Reinforcement Learning in Robotics · Distributed Control Multi-Agent Systems · Adaptive Dynamic Programming Control
Methods: Exponential Decay
