Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Jakob Foerster; Nantas Nardelli; Gregory Farquhar; Triantafyllos Afouras; Philip H. S. Torr; Pushmeet Kohli; Shimon Whiteson

arXiv:1702.08887 · cs.AI · May 22, 2018 · 334 cites

TL;DR

This paper introduces methods to stabilize experience replay in deep multi-agent reinforcement learning, enabling effective learning in complex multi-agent environments by addressing nonstationarity issues.

Contribution

It proposes two techniques, a multi-agent variant of importance sampling and fingerprint conditioning of each agent's value function, that make experience replay viable for multi-agent RL despite the nonstationarity introduced by independent Q-learning.

Findings

01 Methods successfully applied to StarCraft unit micromanagement

02 Enhanced stability of multi-agent RL with experience replay

03 Improved scalability in multi-agent environments

Abstract

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging…
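The two ideas named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the fingerprint is taken to be a normalised training iteration plus the exploration rate, the importance weight is taken to be the ratio of the other agents' current action probability to the probability stored at collection time, and the clipping threshold and all function names are hypothetical.

```python
from typing import List, Sequence


def add_fingerprint(obs: Sequence[float], iteration: int, epsilon: float,
                    max_iterations: int) -> List[float]:
    """Append a low-dimensional fingerprint to an observation.

    Assumption: the fingerprint is (normalised iteration, exploration rate),
    which lets the value function tell old replay data from new.
    """
    return list(obs) + [iteration / max_iterations, epsilon]


def importance_weight(current_others_prob: float, stored_others_prob: float,
                      clip: float = 10.0) -> float:
    """Weight a replayed transition by how likely the other agents'
    recorded joint action still is under their current policies.

    Assumption: the ratio is clipped for numerical stability; the clip
    value is a hypothetical choice, not taken from the paper.
    """
    ratio = current_others_prob / max(stored_others_prob, 1e-8)
    return min(ratio, clip)


def weighted_td_loss(td_errors: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Mean squared TD error, with each sample scaled by its weight."""
    total = sum(w * e * e for w, e in zip(weights, td_errors))
    return total / len(td_errors)


# Usage: an observation stored halfway through training with epsilon = 0.1.
augmented = add_fingerprint([0.5, 1.0], iteration=50, epsilon=0.1,
                            max_iterations=100)

# A transition whose recorded joint action has become half as likely
# contributes half as much to the loss.
w = importance_weight(current_others_prob=0.2, stored_others_prob=0.4)
loss = weighted_td_loss(td_errors=[1.0, 2.0], weights=[w, 1.0])
```

In this sketch, transitions whose recorded actions the other agents would no longer take receive small weights, so stale data fades from the loss, while the fingerprint lets a single network condition on when a sample was generated.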

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Smart Grid Security and Resilience

Methods: Experience Replay · Q-Learning