Distributed Multi-Agent Reinforcement Learning with One-hop Neighbors and Compute Straggler Mitigation
Baoqian Wang, Junfei Xie, Nikolay Atanasov

TL;DR
This paper introduces DARL1N, a scalable multi-agent reinforcement learning method that restricts information exchange among agents to one-hop neighbors. It pairs the method with a coded distributed learning architecture that mitigates compute stragglers, significantly reducing training time.
Contribution
The paper presents DARL1N, a novel off-policy actor-critic MARL algorithm with one-hop neighbor communication, and a coded distributed learning framework to handle stragglers, enhancing scalability and robustness.
Findings
DARL1N significantly reduces training time.
Policy quality is maintained as the number of agents increases.
The coded architecture improves resilience to stragglers.
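To illustrate why redundancy helps with stragglers, here is a minimal sketch in Python. This page does not specify the coding scheme, so the sketch assumes a simple cyclic replication, chosen only for illustration; the function names `cyclic_replication` and `learners_needed` are hypothetical and do not come from the paper.

```python
def cyclic_replication(num_agents, num_learners, copies):
    """Assign each agent's update task to `copies` learners, cyclically.

    Illustrative assumption, not the paper's scheme. Requires
    copies <= num_learners so that the copies land on distinct learners.
    """
    assignment = {learner: [] for learner in range(num_learners)}
    for agent in range(num_agents):
        for c in range(copies):
            assignment[(agent + c) % num_learners].append(agent)
    return assignment


def learners_needed(assignment, arrival_order, num_agents):
    """Count how many learner results arrive before every agent is covered.

    Returns None if the learners that responded never cover all agents.
    """
    covered = set()
    for count, learner in enumerate(arrival_order, start=1):
        covered.update(assignment[learner])
        if len(covered) == num_agents:
            return count
    return None


# Learner 3 is a straggler that never responds in this iteration.
arrivals = [0, 1, 2]
with_redundancy = learners_needed(cyclic_replication(4, 4, 2), arrivals, 4)
without_redundancy = learners_needed(cyclic_replication(4, 4, 1), arrivals, 4)
```

With two copies per agent, the iteration completes without the straggling learner; with one copy per agent, it cannot complete until that learner responds. The cost is that each learner performs more work per iteration.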
Abstract
Most multi-agent reinforcement learning (MARL) methods are limited in the scale of problems they can handle. With increasing numbers of agents, the number of training iterations required to find the optimal behaviors increases exponentially due to the exponentially growing joint state and action spaces. This paper tackles this limitation by introducing a scalable MARL method called Distributed multi-Agent Reinforcement Learning with One-hop Neighbors (DARL1N). DARL1N is an off-policy actor-critic method that addresses the curse of dimensionality by restricting information exchanges among the agents to one-hop neighbors when representing value and policy functions. Each agent optimizes its value and policy functions over a one-hop neighborhood, significantly reducing the learning complexity, yet maintaining expressiveness by training with varying neighbor numbers and states. This…
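The one-hop restriction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that one-hop neighbors are agents within a fixed distance `radius` of each other, which the abstract does not state, and the helper names `one_hop_neighbors` and `local_critic_input` are hypothetical.

```python
import math


def one_hop_neighbors(positions, radius):
    """For each agent, list the other agents within `radius` (assumed neighbor rule)."""
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def local_critic_input(states, actions, neighbors, i):
    """Build agent i's value-function input from itself and its one-hop neighbors only."""
    features = []
    for j in [i] + neighbors[i]:
        features.extend(states[j])
        features.extend(actions[j])
    return features


positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
states = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
actions = [[0.1], [0.2], [0.3]]
nbrs = one_hop_neighbors(positions, radius=1.5)
x0 = local_critic_input(states, actions, nbrs, 0)  # agent 0 plus neighbor 1
x2 = local_critic_input(states, actions, nbrs, 2)  # agent 2 has no neighbors
```

The input size for each agent depends on its neighborhood rather than on the total number of agents, which is the source of the complexity reduction the abstract refers to. Because the neighbor count varies between agents and over time, a practical function approximator would need to handle variable-length inputs, for example by padding or pooling; that detail is left out of this sketch.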
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
