RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning

Wei Qiu; Xiao Ma; Bo An; Svetlana Obraztsova; Shuicheng Yan; Zhongwen Xu

arXiv:2210.09646·cs.MA·October 19, 2022·1 citation

Open Access

TL;DR

This paper introduces RPM, a method that enhances the generalizability of multi-agent reinforcement learning policies by maintaining a ranked memory of past policies to promote diverse interactions during training.

Contribution

The paper proposes RPM, a novel self-play framework that improves MARL generalization by leveraging a ranked policy memory to diversify training interactions.

Findings

01 RPM significantly improves generalization to unseen agents.

02 RPM boosts performance by up to 402% on average in the experiments.

03 Diverse multi-agent trajectories make the learned policies more robust.

Abstract

Despite recent advances in multi-agent reinforcement learning (MARL), MARL agents easily overfit the training environment and perform poorly in evaluation scenarios where other agents behave differently. Obtaining generalizable policies for MARL agents is thus necessary but challenging, mainly due to complex multi-agent interactions. In this work, we model the problem with Markov Games and propose a simple yet effective method, ranked policy memory (RPM), to collect diverse multi-agent trajectories for training MARL policies with good generalizability. The main idea of RPM is to maintain a look-up memory of policies. In particular, we try to acquire various levels of behaviors by saving policies via ranking the training episode return, i.e., the episode return of agents in the training environment; when an episode starts, the learning agent can then choose a policy from…
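The abstract describes the look-up memory only in outline. The following is a minimal Python sketch of a return-ranked policy memory built on stated assumptions: training episode returns are bucketed into fixed-width ranks (`bucket_width`), each rank keeps at most `max_per_rank` snapshots, and sampling first picks a rank uniformly and then a policy within it. The class `RankedPolicyMemory`, its methods, and its parameters are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional


class RankedPolicyMemory:
    """Hypothetical sketch of a ranked policy memory (not the authors' code).

    Assumption: a policy snapshot is keyed by the training episode return it
    achieved, bucketed into fixed-width ranks of size `bucket_width`.
    """

    def __init__(self, bucket_width: float = 1.0, max_per_rank: int = 5,
                 seed: Optional[int] = None) -> None:
        self.bucket_width = bucket_width
        self.max_per_rank = max_per_rank
        self.memory: Dict[int, List[Any]] = defaultdict(list)
        self.rng = random.Random(seed)

    def rank_of(self, episode_return: float) -> int:
        # Map a training episode return to a discrete rank (bucket index).
        return int(episode_return // self.bucket_width)

    def save(self, policy: Any, episode_return: float) -> None:
        # Store the snapshot under the rank of its return; keep only the
        # most recent `max_per_rank` snapshots in each rank.
        bucket = self.memory[self.rank_of(episode_return)]
        bucket.append(policy)
        if len(bucket) > self.max_per_rank:
            bucket.pop(0)

    def sample(self) -> Any:
        # Assumption: choose a rank uniformly, then a policy within it, so
        # low- and high-return behaviors are drawn equally often.
        if not self.memory:
            raise ValueError("memory is empty")
        rank = self.rng.choice(sorted(self.memory))
        return self.rng.choice(self.memory[rank])


if __name__ == "__main__":
    mem = RankedPolicyMemory(bucket_width=10.0, seed=0)
    mem.save("policy_v1", episode_return=3.0)   # rank 0
    mem.save("policy_v2", episode_return=25.0)  # rank 2
    mem.save("policy_v3", episode_return=27.5)  # rank 2
    print(sorted(mem.memory))  # [0, 2]
    # At the start of an episode, draw a stored behavior policy.
    print(mem.sample())
```

Sampling uniformly over ranks rather than over all stored snapshots is one illustrative way to keep behaviors of different skill levels equally represented. Other sampling schemes are equally plausible under the description above.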

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Data Stream Mining Techniques