MAC-PO: Multi-Agent Experience Replay via Collective Priority Optimization

Yongsheng Mei; Hanhan Zhou; Tian Lan; Guru Venkataramani; Peng Wei

arXiv:2302.10418 · cs.LG · March 1, 2023 · 6 cites

TL;DR

This paper introduces MAC-PO, a novel prioritized experience replay method for multi-agent reinforcement learning that optimizes sampling weights through regret minimization, leading to improved training efficiency and performance.

Contribution

We propose MAC-PO, which formulates and solves the optimal prioritized experience replay problem for multi-agent RL using a regret minimization framework and closed-form solutions.

Findings

01  MAC-PO outperforms state-of-the-art baselines in Predator-Prey.

02  It effectively replays important transitions, improving learning stability.

03  Experimental results show enhanced performance in StarCraft Multi-Agent Challenge.

Abstract

Experience replay is crucial for off-policy reinforcement learning (RL) methods. By remembering and reusing experiences from different past policies, experience replay significantly improves the training efficiency and stability of RL algorithms. Many decision-making problems in practice naturally involve multiple agents and require multi-agent reinforcement learning (MARL) under the centralized training with decentralized execution paradigm. Nevertheless, existing MARL algorithms often adopt standard experience replay, where transitions are uniformly sampled regardless of their importance. Finding prioritized sampling weights that are optimized for MARL experience replay has yet to be explored. To this end, we propose MAC-PO, which formulates optimal prioritized experience replay for multi-agent problems as a regret minimization over the sampling weights of transitions. Such optimization…
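
To make the contrast between uniform and prioritized replay concrete, the following is a minimal sketch of generic proportional prioritized sampling. It is not the MAC-PO weighting, whose closed-form solution is not given in the text above. The function names, the exponents `alpha` and `beta`, and the toy priorities are all assumptions introduced for illustration, and priorities are assumed to be strictly positive.

```python
import random


def uniform_sample(buffer_size, batch_size, rng):
    """Standard replay: every stored transition is equally likely."""
    return [rng.randrange(buffer_size) for _ in range(batch_size)]


def prioritized_sample(priorities, batch_size, rng, alpha=0.6, beta=0.4):
    """Generic proportional prioritization (illustrative, not MAC-PO).

    priorities: one positive score per stored transition (hypothetical,
                e.g. the magnitude of a TD error).
    alpha:      assumed exponent; 0 recovers uniform sampling.
    beta:       assumed exponent for the importance-sampling correction.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]

    # Draw indices with replacement according to the sampling weights.
    indices = rng.choices(range(len(probs)), weights=probs, k=batch_size)

    # Importance-sampling weights offset the bias from non-uniform
    # sampling; they are normalized so the largest weight equals 1.
    n = len(probs)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw)
    is_weights = [w / max_w for w in raw]
    return indices, is_weights, probs


rng = random.Random(0)
priorities = [0.1, 0.1, 0.1, 5.0]  # toy values: one "important" transition
indices, is_weights, probs = prioritized_sample(priorities, 1000, rng)
uniform_indices = uniform_sample(len(priorities), 1000, rng)
```

In this toy setup the fourth transition is drawn far more often than the others under prioritized sampling, while uniform sampling treats all four alike. A method such as the one described in the abstract would replace the hand-set `priorities` with weights obtained from its own optimization.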

Code & Models

Repositories

ysmei97/mac-po (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Bandit Algorithms Research

Methods: Prioritized Experience Replay · Experience Replay