Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber, Daniel Wälchli, Mustafa Zeqiri, Petros Koumoutsakos

TL;DR
This paper extends the Remember and Forget for Experience Replay (ReF-ER) algorithm to multi-agent reinforcement learning (MARL). Using a single feed-forward neural network and value estimates based on individual rewards, it reports better performance than existing algorithms on the SISL benchmark environments.
Contribution
The paper introduces ReF-ER for MARL, showing its effectiveness and simplicity compared to complex neural network approaches in multi-agent settings.
Findings
ReF-ER MARL outperforms existing algorithms on SISL benchmarks.
In collaborative environments, estimating the value from individual rewards gives the best performance.
A single feed-forward neural network suffices for policy and value estimation.
Abstract
We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and we ignore the effects of other actions on the transition map. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state of the art…
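The role of the importance weights mentioned in the abstract can be illustrated with a minimal sketch. It assumes the near-policy test of the original single-agent ReF-ER formulation (a sample is used only if its importance weight lies between 1/c_max and c_max), and it contrasts per-agent weights with a joint weight formed as the product over agents as one possible reading of "ignoring the effects of other actions". The function names, the cutoff value, and the example log-probabilities are hypothetical and are not taken from the paper.

```python
import math

def importance_weight(logp_current, logp_behavior):
    """rho = pi(a|s) / mu(a|s), computed from log-probabilities."""
    return math.exp(logp_current - logp_behavior)

def is_near_policy(rho, c_max):
    """Assumed ReF-ER-style test: keep a sample only if 1/c_max < rho < c_max."""
    return 1.0 / c_max < rho < c_max

def individual_weights(logp_current, logp_behavior):
    """One weight per agent; each agent ignores the other agents' actions."""
    return [importance_weight(c, b) for c, b in zip(logp_current, logp_behavior)]

def joint_weight(logp_current, logp_behavior):
    """A single weight for the joint action (product of per-agent ratios)."""
    return math.exp(sum(logp_current) - sum(logp_behavior))

# Hypothetical log-probabilities of the stored actions for three agents.
logp_now = [-1.0, -1.2, -0.9]   # under the current policy
logp_old = [-1.1, -1.0, -2.5]   # under the policy that collected the data
C_MAX = 4.0                     # illustrative cutoff, not a value from the paper

per_agent = individual_weights(logp_now, logp_old)
per_agent_near = [is_near_policy(r, C_MAX) for r in per_agent]
joint = joint_weight(logp_now, logp_old)
joint_near = is_near_policy(joint, C_MAX)
```

With per-agent weights, only the third agent's sample is rejected in this example, whereas the joint weight rejects the transition for every agent. This is only meant to show why the choice of importance weight matters; it does not reproduce the paper's experiments.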
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques
Methods: Experience Replay
