Scalable and Sample Efficient Distributed Policy Gradient Algorithms in Multi-Agent Networked Systems
Xin Liu, Honghao Wei, Lei Ying

TL;DR
This paper introduces a scalable, distributed policy gradient algorithm for REC-MARL (REward-Coupled Multi-Agent Reinforcement Learning), a class of multi-agent reinforcement learning problems in which agents are coupled through their rewards, and reports that it outperforms existing algorithms in wireless network applications.
Contribution
The paper proposes a distributed policy gradient method tailored to REC-MARL in which both the learned policy and the training procedure are distributed, supported by iteration complexity bounds and an empirical evaluation on wireless network tasks.
Findings
Outperforms existing algorithms in wireless network tasks
Achieves stationary policies with complexity depending on local state/action dimensions
Demonstrates significant improvements over existing algorithms in real-time access control and distributed power control
Abstract
This paper studies a class of multi-agent reinforcement learning (MARL) problems where the reward that an agent receives depends on the states of other agents, but the next state depends only on the agent's own current state and action. We name it REC-MARL, standing for REward-Coupled Multi-Agent Reinforcement Learning. REC-MARL has a range of important applications, such as real-time access control and distributed power control in wireless networks. This paper presents a distributed policy gradient algorithm for REC-MARL. The proposed algorithm is distributed in two aspects: (i) the learned policy is a distributed policy that maps a local state of an agent to its local action, and (ii) the learning/training is distributed, during which each agent updates its policy based on its own and its neighbors' information. The learned algorithm achieves a stationary policy and its iterative complexity…
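To make the REC-MARL structure described in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: the line-graph topology, the binary states and actions, the collision-style reward, the tabular softmax policy, and the REINFORCE-style update are all assumptions chosen for illustration, and every name (`NEIGHBORS`, `local_transition`, `coupled_reward`, `local_update`, `train`) is hypothetical. The sketch shows only the two structural properties the abstract names: transitions are local while rewards are coupled, and each agent updates its policy using rewards from itself and its neighbors.

```python
import math
import random

# Hypothetical toy setup: 4 agents on a line graph, binary local states and actions.
N_AGENTS = 4
NEIGHBORS = {i: [j for j in (i - 1, i + 1) if 0 <= j < N_AGENTS]
             for i in range(N_AGENTS)}

def local_transition(state, action, rng):
    """Next local state depends only on the agent's own state and action."""
    # Assumed dynamics: action 1 pushes toward state 1, and state 1 is slightly sticky.
    p_one = (0.9 if action == 1 else 0.1) + (0.05 if state == 1 else 0.0)
    return 1 if rng.random() < p_one else 0

def coupled_reward(i, states):
    """Reward of agent i depends on its own state and on its neighbors' states."""
    # Assumed reward: an active agent (state 1) earns 1 only if no neighbor is
    # active, loosely mimicking a collision on a shared wireless channel.
    if states[i] == 0:
        return 0.0
    return 1.0 if all(states[j] == 0 for j in NEIGHBORS[i]) else 0.0

def action_probs(theta_i, state):
    """Local softmax policy: maps an agent's local state to action probabilities."""
    logits = theta_i[state]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(theta, horizon, rng):
    """Roll out all agents; each samples its action from its own local state."""
    states = [0] * N_AGENTS
    traj = []
    for _ in range(horizon):
        actions = [0 if rng.random() < action_probs(theta[i], states[i])[0] else 1
                   for i in range(N_AGENTS)]
        next_states = [local_transition(states[i], actions[i], rng)
                       for i in range(N_AGENTS)]
        # Rewards are evaluated on the resulting joint state (an assumption).
        rewards = [coupled_reward(i, next_states) for i in range(N_AGENTS)]
        traj.append((states, actions, rewards))
        states = next_states
    return traj

def local_update(theta, traj, lr, gamma):
    """REINFORCE-style step in which agent i uses only its own and neighbors' rewards."""
    for i in range(N_AGENTS):
        group = [i] + NEIGHBORS[i]
        grad = [[0.0, 0.0], [0.0, 0.0]]
        ret = 0.0
        for states, actions, rewards in reversed(traj):
            # Discounted return restricted to agent i's neighborhood.
            ret = sum(rewards[j] for j in group) + gamma * ret
            probs = action_probs(theta[i], states[i])
            for b in (0, 1):
                indicator = 1.0 if b == actions[i] else 0.0
                grad[states[i]][b] += ret * (indicator - probs[b])
        for s in (0, 1):
            for b in (0, 1):
                theta[i][s][b] += lr * grad[s][b]

def train(iterations, horizon=20, lr=0.1, gamma=0.9, seed=0):
    """Run a few episodes and return the per-agent policy parameters."""
    rng = random.Random(seed)
    theta = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(N_AGENTS)]
    for _ in range(iterations):
        local_update(theta, run_episode(theta, horizon, rng), lr, gamma)
    return theta
```

Calling `train(50)` returns one 2×2 table of logits per agent. In this toy model an agent's actions influence only its own state, which enters only its own and its neighbors' rewards, so restricting the return to the neighborhood discards terms that carry no gradient signal for that agent. The paper's actual estimator and guarantees may differ.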
Taxonomy
Topics: Wireless Networks and Protocols · Energy Harvesting in Wireless Networks
