Negative Update Intervals in Deep Multi-Agent Reinforcement Learning
Gregory Palmer, Rahul Savani, Karl Tuyls

TL;DR
This paper introduces NUI-DDQN, a new deep multi-agent reinforcement learning algorithm that effectively overcomes multiple learning pathologies in complex environments by discarding episodes with outlier rewards.
Contribution
The paper presents NUI-DDQN, a novel method that improves multi-agent learning by filtering out episodes with misleading reward signals, outperforming existing approaches in complex scenarios.
Findings
NUI-DDQN reliably learns near-optimal policies in complex environments.
Existing lenient and hysteretic methods often fail in complex multi-agent settings.
NUI-DDQN overcomes multiple learning pathologies simultaneously.
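The idea of discarding episodes with outlier rewards can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how NUI-DDQN decides which rewards are outliers, so the fixed `lower_bound` threshold, the `Transition` layout, and both function names below are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done).
Transition = Tuple[int, int, float, int, bool]


def keep_episode(episode: List[Transition], lower_bound: float) -> bool:
    """Return True if the episode's cumulative reward is at least lower_bound."""
    episode_return = sum(step[2] for step in episode)
    return episode_return >= lower_bound


def store_filtered(replay_memory: List[Transition],
                   episode: List[Transition],
                   lower_bound: float) -> bool:
    """Append the episode to replay memory only if it passes the filter.

    Assumption: a fixed threshold stands in for whatever rule the paper
    uses to identify misleading episodes.
    """
    if keep_episode(episode, lower_bound):
        replay_memory.extend(episode)
        return True
    return False


memory: List[Transition] = []
good = [(0, 1, 5.0, 1, False), (1, 0, 6.0, 2, True)]
bad = [(0, 2, -30.0, 1, True)]
stored_good = store_filtered(memory, good, lower_bound=0.0)
stored_bad = store_filtered(memory, bad, lower_bound=0.0)
```

Filtering at the episode level, rather than per transition, means a single penalised joint action removes the whole trajectory that led to it from the learner's training data.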
Abstract
In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable to others. For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable to misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously. However, leniency has predominantly been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose, we implement a temporally extended version of the Climb Game, within which agents must overcome multiple pathologies…
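The hysteretic Q-learning baseline mentioned in the abstract can be sketched in tabular form. The standard formulation uses two learning rates: a larger one when the temporal-difference error is non-negative and a smaller one when it is negative, so that penalties caused by teammates' exploration are discounted. The function name, the default values of `alpha`, `beta`, and `gamma`, and the list-of-lists Q-table are assumptions made for this sketch, not values taken from the paper.

```python
from typing import List


def hysteretic_update(q: List[List[float]], state: int, action: int,
                      reward: float, next_state: int, done: bool,
                      alpha: float = 0.1, beta: float = 0.01,
                      gamma: float = 0.95) -> float:
    """Apply one hysteretic Q-learning update in place and return the TD error.

    alpha is used for non-negative TD errors, beta (< alpha) for negative ones.
    """
    # Bootstrap from the best next-state value unless the episode has ended.
    target = reward if done else reward + gamma * max(q[next_state])
    delta = target - q[state][action]
    rate = alpha if delta >= 0 else beta
    q[state][action] += rate * delta
    return delta


# Two states, three actions, all estimates start at zero.
q_table = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
first_delta = hysteretic_update(q_table, 0, 1, 10.0, 1, True)
after_reward = q_table[0][1]
second_delta = hysteretic_update(q_table, 0, 1, -30.0, 1, True)
after_penalty = q_table[0][1]
```

In this usage the positive reward moves the estimate by the full `alpha` step, while the later penalty only shifts it by the smaller `beta` step. The same asymmetry is why a stochastic reward that is occasionally high can keep an estimate inflated, which is the vulnerability the abstract describes.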
Taxonomy
Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Game Theory and Applications
Methods: Q-Learning
