Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration
Lulu Zheng, Jiarui Chen, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

TL;DR
This paper introduces EMC, a novel multi-agent reinforcement learning method that uses curiosity-driven exploration based on prediction errors of individual Q-values, enhancing coordinated exploration and outperforming existing methods in complex tasks.
Contribution
The paper proposes a new intrinsic reward mechanism using individual Q-value prediction errors and episodic memory to improve exploration in multi-agent reinforcement learning.
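As a rough illustration of the intrinsic reward described above, the sketch below scores a joint transition by how badly a predictor reproduces each agent's individual Q-values. The function name `intrinsic_reward`, the use of an L2 distance, and the averaging over agents are assumptions made for this example; this summary does not specify the exact form the paper uses.

```python
import math


def intrinsic_reward(predicted_q, actual_q):
    """Average, over agents, of the L2 distance between predicted and
    actual individual Q-value vectors (one vector per agent).

    Assumption: the distance metric and the mean over agents are
    illustrative choices, not taken from the paper's text.
    """
    if not predicted_q or len(predicted_q) != len(actual_q):
        raise ValueError("need one prediction per agent, for at least one agent")

    total_error = 0.0
    for pred, actual in zip(predicted_q, actual_q):
        if len(pred) != len(actual):
            raise ValueError("Q-value vectors must cover the same actions")
        # Prediction error for one agent across its action set.
        total_error += math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)))

    return total_error / len(predicted_q)


# Two agents, two actions each: agent 0 is predicted perfectly,
# agent 1 is off by the vector (3, 4), whose L2 norm is 5.
reward = intrinsic_reward(
    predicted_q=[[1.0, 2.0], [0.0, 0.0]],
    actual_q=[[1.0, 2.0], [3.0, 4.0]],
)
```

A larger value means the predictor has not yet learned the Q-values for that situation, so the transition is treated as more novel and worth exploring.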
Findings
EMC outperforms state-of-the-art MARL baselines in StarCraft II benchmarks.
Intrinsic rewards based on Q-value prediction errors promote coordinated exploration.
Episodic memory boosts policy training efficiency and effectiveness.
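One common way to realize an episodic memory of this kind is a table that remembers the best discounted return seen from each state, which training can later use as an extra target. The sketch below follows that generic pattern; the class `EpisodicMemory`, the helper `store_episode`, the hashable state keys, and the keep-the-maximum rule are all hypothetical choices for illustration, not details confirmed by this summary.

```python
class EpisodicMemory:
    """Table mapping a hashable state key to the best return observed so far.

    Assumption: keeping the maximum return per key is a generic episodic
    control pattern, used here only as an illustration.
    """

    def __init__(self):
        self._best_return = {}

    def update(self, state_key, episode_return):
        previous = self._best_return.get(state_key)
        if previous is None or episode_return > previous:
            self._best_return[state_key] = episode_return

    def lookup(self, state_key):
        # Returns None when the state has never been stored.
        return self._best_return.get(state_key)


def store_episode(memory, state_keys, rewards, gamma=0.99):
    """Walk one finished episode backwards and record discounted returns."""
    discounted_return = 0.0
    for key, reward in zip(reversed(state_keys), reversed(rewards)):
        discounted_return = reward + gamma * discounted_return
        memory.update(key, discounted_return)


memory = EpisodicMemory()
store_episode(memory, ["a", "b"], [1.0, 2.0], gamma=0.5)
store_episode(memory, ["a"], [0.5], gamma=0.5)  # worse return, so "a" is unchanged
```

Looking up a stored state then gives the learner a reference return to pull its value estimate toward, which is one way previously explored high-return experience can speed up training.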
Abstract
Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) still remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight of popular factorized MARL algorithms that the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are the embeddings of local action-observation histories, and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function captures the novelty of states and…
Taxonomy
Topics: Reinforcement Learning in Robotics · Open Source Software Innovations
