Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration

Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

arXiv:2111.11032 · cs.LG · November 23, 2021 · 40 citations

TL;DR

This paper introduces EMC, a multi-agent reinforcement learning method that uses the prediction errors of individual Q-values as a curiosity-driven intrinsic reward to encourage coordinated exploration, and it outperforms existing methods on complex coordination tasks.

Contribution

The paper proposes an intrinsic reward based on the prediction errors of individual Q-values, combined with an episodic memory that exploits informative explored experience, to improve exploration and policy training in multi-agent reinforcement learning.
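To make the curiosity signal concrete, here is a minimal sketch, under stated assumptions, of an intrinsic reward computed as the prediction error of individual Q-values. The source does not specify the predictor, so the linear model, its squared-error loss, the learning rate `lr`, and the class name `QPredictionCuriosity` are hypothetical choices for illustration; the paper's own networks and training details may differ.

```python
import numpy as np


class QPredictionCuriosity:
    """Hypothetical curiosity module: a linear predictor regresses each agent's
    individual Q-value vector from an encoding of its local history, and the
    squared prediction error (averaged over agents) is the intrinsic reward."""

    def __init__(self, obs_dim: int, n_actions: int, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Assumed linear predictor; EMC's actual predictor may be a neural network.
        self.W = rng.normal(scale=0.1, size=(obs_dim, n_actions))
        self.lr = lr

    def intrinsic_reward(self, histories: np.ndarray, q_values: np.ndarray) -> float:
        # histories: (n_agents, obs_dim) encodings of action-observation histories
        # q_values:  (n_agents, n_actions) individual Q-values Q_i(tau_i, .)
        err = histories @ self.W - q_values
        return float(np.mean(np.sum(err ** 2, axis=1)))

    def update(self, histories: np.ndarray, q_values: np.ndarray) -> None:
        # One gradient-descent step on 0.5 * mean_i ||h_i W - Q_i||^2.
        err = histories @ self.W - q_values
        grad = histories.T @ err / histories.shape[0]
        self.W -= self.lr * grad


# Usage: repeated exposure to the same histories shrinks the bonus.
rng = np.random.default_rng(1)
h, q = rng.normal(size=(3, 4)), rng.normal(size=(3, 5))
curiosity = QPredictionCuriosity(obs_dim=4, n_actions=5)
before = curiosity.intrinsic_reward(h, q)
for _ in range(200):
    curiosity.update(h, q)
after = curiosity.intrinsic_reward(h, q)
```

Because the predictor keeps fitting the Q-values it has already seen, the bonus for familiar histories decays with training, while histories whose Q-values it has not yet learned tend to keep a larger error. That decay is what lets the prediction error act as a novelty signal.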

Findings

01. EMC outperforms state-of-the-art MARL baselines on StarCraft II benchmarks.

02. Intrinsic rewards based on Q-value prediction errors promote coordinated exploration.

03. Episodic memory improves the efficiency and effectiveness of policy training.
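Finding 03 credits episodic memory. The sketch below, under stated assumptions, shows one common way such a memory works in episodic-control methods: a fixed random projection turns a global state into a key, and the table keeps the highest discounted return observed from that key. The class `EpisodicMemory`, the exact-match lookup, and the `memory_regularized_loss` penalty with weight `lam` are hypothetical illustrations, not EMC's published implementation.

```python
import numpy as np


class EpisodicMemory:
    """Hypothetical episodic-control-style memory: keys are random projections
    of global states, values are the best discounted return seen from them."""

    def __init__(self, state_dim: int, key_dim: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(size=(key_dim, state_dim))  # fixed random projection
        self.keys: list = []
        self.values: list = []

    def _key(self, state) -> np.ndarray:
        return self.proj @ np.asarray(state, dtype=float)

    def _find(self, key: np.ndarray, tol: float = 1e-6):
        # Exact-match lookup (up to tol); real systems often use nearest neighbours.
        for i, k in enumerate(self.keys):
            if np.linalg.norm(k - key) < tol:
                return i
        return None

    def add_episode(self, states, rewards, gamma: float = 0.99) -> None:
        # Walk the episode backwards to compute discounted returns-to-go.
        ret = 0.0
        for state, r in zip(reversed(states), reversed(rewards)):
            ret = r + gamma * ret
            key = self._key(state)
            idx = self._find(key)
            if idx is None:
                self.keys.append(key)
                self.values.append(ret)
            else:
                self.values[idx] = max(self.values[idx], ret)  # keep the best return

    def lookup(self, state):
        idx = self._find(self._key(state))
        return None if idx is None else self.values[idx]


def memory_regularized_loss(q_tot: float, td_target: float, memory_value, lam: float = 0.1) -> float:
    # Hypothetical combination: TD loss plus a penalty pulling Q_tot toward the
    # best remembered return; the penalty is skipped for unseen states.
    loss = (q_tot - td_target) ** 2
    if memory_value is not None:
        loss += lam * (q_tot - memory_value) ** 2
    return loss


# Usage: two episodes pass through the same state; the memory keeps the maximum.
mem = EpisodicMemory(state_dim=2)
mem.add_episode([[0.0, 0.0], [1.0, 0.0]], [0.0, 1.0], gamma=0.5)
mem.add_episode([[0.0, 0.0]], [2.0], gamma=0.5)
best = mem.lookup([0.0, 0.0])
```

Keeping the maximum rather than the average return is what lets rare high-return trajectories, once exploration finds them, keep influencing training instead of being washed out by later, worse visits.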

Abstract

Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight from popular factorized MARL algorithms that the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are the embeddings of local action-observation histories, and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function capture the novelty of states and…
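The abstract's argument relies on centralized training with a factorized value function. As a hedged illustration, the sketch below uses the simplest additive factorization (VDN-style, where the joint value is the sum of individual Q-values); this choice is an assumption for exposition, since the abstract only refers to "popular factorized MARL algorithms". It shows how a scaled intrinsic bonus can be added to the team reward in the TD target, and how the resulting TD error is backpropagated to every individual Q-value. The names `vdn_td_error`, `individual_gradients`, `beta`, and `gamma` are hypothetical.

```python
import numpy as np


def vdn_td_error(q_chosen: np.ndarray, q_next: np.ndarray, r_ext: float, r_int: float,
                 beta: float = 0.1, gamma: float = 0.99, done: bool = False) -> float:
    """Hypothetical VDN-style TD error with an intrinsic bonus in the target.

    q_chosen: (n_agents,) Q_i(tau_i, a_i) for the actions actually taken
    q_next:   (n_agents, n_actions) Q_i(tau_i', .) at the next step
    """
    q_tot = float(np.sum(q_chosen))                      # Q_tot = sum_i Q_i
    q_tot_next = float(np.sum(np.max(q_next, axis=1)))   # greedy max decomposes per agent
    bootstrap = 0.0 if done else gamma * q_tot_next
    target = r_ext + beta * r_int + bootstrap            # extrinsic + scaled curiosity
    return target - q_tot


def individual_gradients(q_chosen: np.ndarray, td_error: float) -> np.ndarray:
    # d/dQ_i of 0.5 * (target - Q_tot)^2 with Q_tot = sum_i Q_i: every agent
    # receives the same signal -td_error, so the shared (team plus intrinsic)
    # reward is propagated back into each individual utility.
    return np.full(len(q_chosen), -td_error)


# Usage: two agents, two actions each.
q_chosen = np.array([1.0, 2.0])
q_next = np.array([[0.0, 1.0], [3.0, 2.0]])
delta = vdn_td_error(q_chosen, q_next, r_ext=1.0, r_int=2.0, beta=0.5, gamma=0.5)
grads = individual_gradients(q_chosen, delta)
```

With a non-additive mixer, each agent would instead receive a state-dependent share of that gradient; the additive case is simply the easiest one to read.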

Code & Models

Videos

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Open Source Software Innovations