Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

Gregory Palmer; Rahul Savani; Karl Tuyls

arXiv:1809.05096 · cs.MA · May 8, 2019 · 5 cites

TL;DR

This paper introduces NUI-DDQN, a new deep multi-agent reinforcement learning algorithm that effectively overcomes multiple learning pathologies in complex environments by discarding episodes with outlier rewards.

Contribution

The paper presents NUI-DDQN, a novel method that improves multi-agent learning by filtering out episodes with misleading reward signals, outperforming existing approaches in complex scenarios.
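
The filtering idea can be sketched as follows. This is a hypothetical illustration only: the summary above does not say how NUI-DDQN decides which episodes are outliers, so the fixed `threshold` parameter, the `filter_episodes` name, and the `(state, action, reward)` tuple layout are all assumptions, not the authors' method.

```python
def filter_episodes(episodes, threshold):
    """Keep only episodes whose cumulative reward is at least `threshold`.

    `episodes` is a list of episodes; each episode is a list of
    (state, action, reward) tuples. Both the layout and the fixed
    threshold are assumptions made for this sketch.
    """
    kept = []
    for transitions in episodes:
        episode_return = sum(reward for (_, _, reward) in transitions)
        if episode_return >= threshold:
            kept.append(transitions)
    return kept


# Hypothetical data: one episode with a positive return, one with a large penalty.
episodes = [
    [("s0", "a", 1.0), ("s1", "a", 2.0)],
    [("s0", "b", -30.0)],
]
kept = filter_episodes(episodes, threshold=0.0)
```

Only the episodes that pass the filter would then be stored for training, so a penalty caused by a teammate's exploratory action does not drag down the value estimate of an otherwise good action.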

Findings

01. NUI-DDQN reliably learns near-optimal policies in complex environments.

02. Existing lenient and hysteretic methods often fail in complex multi-agent settings.

03. NUI-DDQN overcomes multiple learning pathologies simultaneously.

Abstract

In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable towards others. For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable towards misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously. However, leniency has predominantly been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extended version of the Climb Game, within which agents must overcome multiple pathologies…
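
As an illustration of the hysteretic Q-learning idea mentioned in the abstract, the sketch below shows a tabular update that applies a larger learning rate to positive temporal-difference errors than to negative ones. It is a minimal sketch and not the paper's deep variant; the function name `hysteretic_update`, the dictionary-based Q-table, and the values of `alpha`, `beta`, and `gamma` are assumptions chosen for illustration.

```python
def hysteretic_update(q, state, action, reward, next_state,
                      alpha=0.1, beta=0.01, gamma=0.9):
    """Apply one tabular hysteretic Q-learning step in place.

    `q` maps each state to a list of action values. Positive TD errors
    use the larger rate `alpha`; negative ones use the smaller `beta`.
    Returns the TD error that was applied.
    """
    td_error = reward + gamma * max(q[next_state]) - q[state][action]
    rate = alpha if td_error >= 0 else beta
    q[state][action] += rate * td_error
    return td_error


# Single-state example with gamma=0 so only the immediate reward matters.
q = {"s": [0.0, 0.0]}
hysteretic_update(q, "s", 0, reward=10.0, next_state="s", gamma=0.0)   # estimate rises to 1.0
hysteretic_update(q, "s", 0, reward=-10.0, next_state="s", gamma=0.0)  # estimate drops only to about 0.89
```

Because negative updates are damped, a low reward caused by a teammate's exploration barely lowers the estimate. The same damping applies when the low reward reflects genuine stochasticity, which is the vulnerability the abstract describes.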

Code & Models

Repositories

gjp1203/nui_in_madrl
none · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Game Theory and Applications

Methods: Q-Learning