Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
Shauharda Khadka, Somdeb Majumdar, Santiago Miret, Stephen McAleer, Kagan Tumer

TL;DR
This paper introduces MERL, a split-level training framework combining evolutionary algorithms and gradient-based optimization to improve sample efficiency and coordination in multiagent reinforcement learning environments.
Contribution
The paper proposes MERL, a multiagent reinforcement learning approach that optimizes the sparse team objective with an evolutionary algorithm and the dense agent-specific objective with a gradient-based learner, and combines the two processes for better coordination.
Findings
MERL outperforms MADDPG on coordination benchmarks.
The split-level approach improves sample efficiency.
Transferring information between the two optimization processes improves performance on the global team objective.
Abstract
Many cooperative multiagent reinforcement learning environments provide agents with a sparse team-based reward, as well as a dense agent-specific reward that incentivizes learning basic skills. Training policies solely on the team-based reward is often difficult due to its sparsity. Furthermore, relying solely on the agent-specific reward is sub-optimal because it usually does not capture the team coordination objective. A common approach is to use reward shaping to construct a proxy reward by combining the individual rewards. However, this requires manual tuning for each environment. We introduce Multiagent Evolutionary Reinforcement Learning (MERL), a split-level training platform that handles the two objectives separately through two optimization processes. An evolutionary algorithm maximizes the sparse team-based objective through neuroevolution on a population of teams.…
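The split-level idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the reward functions, the parameter vectors standing in for policies, the hyperparameters, and the names `dense_reward`, `sparse_team_reward`, and `split_level_train` are all assumptions made for illustration. An evolutionary loop selects teams on a sparse team reward, a separate learner follows the gradient of a dense per-agent reward, and the learner's team is periodically copied into the population.

```python
import numpy as np

def dense_reward(theta, goal):
    # Hypothetical agent-specific reward: negative squared distance to a goal.
    return -np.sum((theta - goal) ** 2)

def sparse_team_reward(team, goal, radius=0.5):
    # Hypothetical team reward: 1.0 only if every agent is within `radius` of the goal.
    return float(all(np.linalg.norm(th - goal) < radius for th in team))

def split_level_train(n_agents=2, dim=3, pop_size=6, generations=30,
                      lr=0.1, sigma=0.1, migrate_every=5, seed=0):
    rng = np.random.default_rng(seed)
    goal = np.ones(dim)
    # Population of teams; each team is a list of per-agent parameter vectors.
    population = [[rng.normal(size=dim) for _ in range(n_agents)]
                  for _ in range(pop_size)]
    # Separate team trained by gradient ascent on the dense reward.
    learner = [rng.normal(size=dim) for _ in range(n_agents)]

    for gen in range(1, generations + 1):
        # Gradient step: d/dtheta of -||theta - goal||^2 is -2 (theta - goal).
        learner = [th + lr * (-2.0 * (th - goal)) for th in learner]

        # Evolutionary step: rank teams by the sparse team reward only.
        fitness = [sparse_team_reward(team, goal) for team in population]
        order = np.argsort(fitness)[::-1]
        elites = [population[i] for i in order[:pop_size // 2]]
        # Refill the population with Gaussian-mutated copies of random elites.
        children = [[th + sigma * rng.normal(size=dim)
                     for th in elites[rng.integers(len(elites))]]
                    for _ in range(pop_size - len(elites))]
        population = elites + children

        # Information transfer: copy the gradient-trained team into the population.
        if gen % migrate_every == 0:
            population[-1] = [th.copy() for th in learner]

    best = max(population, key=lambda t: sparse_team_reward(t, goal))
    return best, sparse_team_reward(best, goal), len(population)

best_team, best_reward, pop_size = split_level_train()
print(best_reward, pop_size)
```

In this toy the dense and sparse rewards point at the same goal, so the migrated learner quickly satisfies the team objective. Real environments are harder precisely because the two objectives are only partially aligned, which is where selection on the team reward matters.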
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Multi-Objective Optimization Algorithms
Methods: Model-Agnostic Meta-Learning · Meta Reward Learning · Weight Decay · Convolution · Adam · Experience Replay · Dense Connections · Batch Normalization · MADDPG
