Off-Beat Multi-Agent Reinforcement Learning
Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu, Svetlana Obraztsova, Zinovi Rabinovich, Jianye Hao, Yingfeng Chen, Changjie Fan

TL;DR
This paper introduces LeGEM, a novel episodic memory framework for multi-agent reinforcement learning in environments with off-beat actions, improving coordination and sample efficiency.
Contribution
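The paper develops an algorithmic framework for MARL with off-beat actions and an episodic memory scheme, LeGEM, for the resulting temporal credit assignment problem. Most prior MARL methods assume actions execute immediately after inference and so do not handle this setting.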
Findings
LeGEM significantly improves multi-agent coordination.
LeGEM achieves leading performance in various scenarios.
LeGEM enhances sample efficiency in MARL.
Abstract
We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent, i.e., all actions have pre-set execution durations. During execution durations, the environment changes are influenced by, but not synchronised with, action execution. Such a setting is ubiquitous in many real-world problems. However, most MARL methods assume actions are executed immediately after inference, which is often unrealistic and can lead to catastrophic failure for multi-agent coordination with off-beat actions. In order to fill this gap, we develop an algorithmic framework for MARL with off-beat actions. We then propose a novel episodic memory, LeGEM, for model-free MARL algorithms. LeGEM builds agents' episodic memories by utilizing agents' individual experiences. It boosts multi-agent learning by addressing the challenging temporal credit assignment…
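To make the off-beat setting concrete, the following is a minimal Python sketch of actions with pre-set execution durations. It is not the paper's formalism or any part of LeGEM: the `OffBeatScheduler` class, the action names, and the convention that an action with duration d takes effect d steps after it is submitted are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PendingAction:
    agent_id: int
    action: str
    remaining: int  # environment steps left before the action takes effect


@dataclass
class OffBeatScheduler:
    """Toy scheduler (hypothetical): actions take effect after a pre-set duration."""
    durations: dict  # assumed mapping: action name -> duration in steps
    pending: list = field(default_factory=list)

    def submit(self, agent_id, action):
        # The agent commits to an action now; its effect is delayed.
        self.pending.append(
            PendingAction(agent_id, action, self.durations[action])
        )

    def step(self):
        # Advance one environment step and return the (agent_id, action)
        # pairs whose execution duration has just elapsed.
        completed, still_pending = [], []
        for p in self.pending:
            p.remaining -= 1
            if p.remaining <= 0:
                completed.append((p.agent_id, p.action))
            else:
                still_pending.append(p)
        self.pending = still_pending
        return completed


# Two agents act at the same time, but their effects land on different steps.
scheduler = OffBeatScheduler({"push": 1, "lift": 3})
scheduler.submit(0, "lift")
scheduler.submit(1, "push")
timeline = [scheduler.step() for _ in range(3)]
print(timeline)
```

In this toy run both agents choose at the same step, yet "push" takes effect two steps before "lift". A reward caused by the joint effect therefore arrives well after the decisions that produced it, which is the kind of temporal credit assignment difficulty the abstract describes.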
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies
