# Improving Coordination in Small-Scale Multi-Agent Deep Reinforcement Learning through Memory-driven Communication

**Authors:** Emanuele Pesce, Giovanni Montana

arXiv: 1901.03887 · 2020-01-27

## TL;DR

This paper introduces a memory-driven communication framework for multi-agent deep reinforcement learning, enabling agents to learn explicit communication protocols that enhance coordination and performance in small-scale systems.

## Contribution

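The paper presents a multi-agent training framework based on deep deterministic policy gradients in which agents learn an explicit communication protocol, end-to-end and concurrently with their individual policies, by reading from and writing to a memory device.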

## Key findings

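- Learning the communication device and the individual policies concurrently improves inter-agent coordination and performance in small-scale systems, with superior performance reported in scenarios with up to six agents.
- Different communication patterns emerge across six tasks of increasing complexity.
- The effects of corrupting the communication channel are studied, and ablation studies validate the building blocks of the memory device.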

## Abstract

Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance in small-scale systems. Our experimental results show that the proposed method achieves superior performance in scenarios with up to six agents. We illustrate how different communication patterns can emerge on six different tasks of increasing complexity. Furthermore, we study the effects of corrupting the communication channel, provide a visualisation of the time-varying memory content as the underlying task is being solved and validate the building blocks of the proposed memory device through ablation studies.
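The read and write operations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the class name `MemoryAgent`, the gated write rule, the single-layer linear maps, the random (untrained) weights, and the sequential access order are all assumptions made for illustration, and the precise operations are defined in the full paper. Each hypothetical agent reads from a memory vector, picks an action from its observation and the read result, and then writes an updated memory for the next agent.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MemoryAgent:
    """Hypothetical agent with read/write access to a memory vector.

    Weights are random and untrained; in a learned system they would be
    optimised together with the policy (an assumption of this sketch).
    """

    def __init__(self, obs_dim, mem_dim, act_dim, rng):
        in_dim = obs_dim + mem_dim
        self.W_read = rng.normal(0.0, 0.1, (mem_dim, in_dim))
        self.W_gate = rng.normal(0.0, 0.1, (mem_dim, in_dim))
        self.W_cand = rng.normal(0.0, 0.1, (mem_dim, in_dim))
        self.W_act = rng.normal(0.0, 0.1, (act_dim, in_dim))

    def step(self, obs, memory):
        x = np.concatenate([obs, memory])
        # Read: extract a context vector from the current memory content.
        read = np.tanh(self.W_read @ x)
        # Write: a gate decides how much of the old memory to overwrite
        # with a candidate computed from the observation and the memory.
        gate = sigmoid(self.W_gate @ x)
        candidate = np.tanh(self.W_cand @ x)
        new_memory = (1.0 - gate) * memory + gate * candidate
        # Act: the action depends on the private observation and the read.
        action = np.tanh(self.W_act @ np.concatenate([obs, read]))
        return action, new_memory


def joint_step(agents, observations, memory):
    """Let each agent access the memory in turn during one time step."""
    actions = []
    for agent, obs in zip(agents, observations):
        action, memory = agent.step(obs, memory)
        actions.append(action)
    return actions, memory


rng = np.random.default_rng(0)
agents = [MemoryAgent(obs_dim=4, mem_dim=8, act_dim=2, rng=rng) for _ in range(2)]
observations = [rng.normal(size=4) for _ in agents]
actions, memory = joint_step(agents, observations, np.zeros(8))
```

In this sketch the memory is the only channel between agents: the second agent's action depends on what the first agent wrote, which is the sense in which a memory device can carry a communication protocol.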

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.03887/full.md

## Figures

28 figures with captions in the complete paper: https://tomesphere.com/paper/1901.03887/full.md

## References

86 references — full list in the complete paper: https://tomesphere.com/paper/1901.03887/full.md

---
Source: https://tomesphere.com/paper/1901.03887