Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination

Shauharda Khadka; Somdeb Majumdar; Santiago Miret; Stephen McAleer; Kagan Tumer

arXiv:1906.07315·cs.LG·October 13, 2020·26 cites

TL;DR

This paper introduces MERL, a split-level training framework combining evolutionary algorithms and gradient-based optimization to improve sample efficiency and coordination in multiagent reinforcement learning environments.

Contribution

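The paper proposes MERL, a multiagent reinforcement learning approach that assigns the sparse team-based objective to an evolutionary algorithm and the dense agent-specific objective to a gradient-based optimizer, then combines the two processes to improve coordination.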

Findings

01 MERL outperforms MADDPG on coordination benchmarks.

02 The split-level approach improves sample efficiency.

03 Information transfer between the two optimization processes improves performance on the global team objective.

Abstract

Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Furthermore, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams.…
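
The split-level idea in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation. Everything concrete here is an assumption made for illustration: each agent's policy is a single scalar, `TARGET` and `TOL` define a made-up task, the dense reward is a squared distance, and information transfer is modeled as periodically copying the gradient-trained team into the last slot of the population. The sketch only shows the structure: an evolutionary loop ranks whole teams by the sparse team reward, while a separate learner climbs the dense agent-specific rewards.

```python
import random

# Hypothetical toy task (not from the paper): each agent's "policy" is one float.
TARGET = [0.8, -0.5]   # assumed per-agent optimum
TOL = 0.1              # the sparse team reward fires only inside this tolerance


def agent_reward(i, theta):
    """Dense agent-specific reward: negative squared distance to the agent's target."""
    return -(theta - TARGET[i]) ** 2


def team_reward(team):
    """Sparse team reward: 1 only when every agent is close to its target."""
    return 1.0 if all(abs(t - g) < TOL for t, g in zip(team, TARGET)) else 0.0


def gradient_step(team, lr=0.1):
    """One gradient-ascent step per agent on its own dense reward."""
    # The derivative of -(theta - g)^2 with respect to theta is -2 * (theta - g).
    return [t + lr * (-2.0 * (t - g)) for t, g in zip(team, TARGET)]


def evolve(population, rng, sigma=0.05):
    """Rank teams by the sparse team reward, keep the top half, refill with mutants."""
    ranked = sorted(population, key=team_reward, reverse=True)
    elites = ranked[: len(ranked) // 2]
    children = [
        [t + rng.gauss(0.0, sigma) for t in rng.choice(elites)]
        for _ in range(len(ranked) - len(elites))
    ]
    return elites + children


def train(generations=60, pop_size=6, migrate_every=5, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in TARGET] for _ in range(pop_size)]
    learner = [rng.uniform(-1, 1) for _ in TARGET]
    for gen in range(1, generations + 1):
        learner = gradient_step(learner)        # dense, agent-specific objective
        population = evolve(population, rng)    # sparse, team-based objective
        if gen % migrate_every == 0:
            # Assumed form of information transfer: overwrite a non-elite slot.
            population[-1] = list(learner)
    return max(population, key=team_reward), learner


if __name__ == "__main__":
    best_team, learner = train()
    print("best team:", best_team, "team reward:", team_reward(best_team))
```

In this toy setting, random teams almost never receive the sparse reward, so evolution alone has little signal to rank them. The gradient learner supplies teams that already have basic skills, and selection on the team reward then decides which of them survive.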

Videos

Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Multi-Objective Optimization Algorithms

Methods: Model-Agnostic Meta-Learning · Meta Reward Learning · Weight Decay · Convolution · Adam · Experience Replay · Dense Connections · Batch Normalization · MADDPG