Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

Max Jaderberg; Wojciech M. Czarnecki; Iain Dunning; Luke Marris; Guy Lever; Antonio Garcia Castaneda; Charles Beattie; Neil C. Rabinowitz; Ari S. Morcos; Avraham Ruderman; Nicolas Sonnerat; Tim Green; Louise Deason; Joel Z. Leibo; David Silver; Demis Hassabis; Koray Kavukcuoglu; Thore Graepel

arXiv:1807.01281·cs.LG·June 19, 2019

TL;DR

This paper presents a population-based deep reinforcement learning approach enabling agents to achieve human-level performance in a complex multiplayer 3D first-person game, demonstrating advanced behaviors and surpassing human players.

Contribution

Introduces a novel two-tier optimization process with population-based RL agents that learn multi-timescale reasoning and achieve human-level performance in multiplayer 3D environments.

Findings

01 Agents exceeded the win-rate of strong human players, both as teammates and as opponents

02 Agents displayed human-like behaviors such as navigating and defending

03 Agents outperformed existing state-of-the-art agents

Abstract

Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents is trained concurrently from thousands of parallel matches, with agents playing in teams together and against each other on randomly generated environments. Each agent…
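The two-tier idea in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the `Agent` fields, the one-on-one `play_match` model, the skill increment standing in for an RL update, and every constant are assumptions made for illustration only. The inner tier lets each agent learn from matches against other members of the population; the outer tier periodically copies and perturbs the settings of the better-performing agents, in the style of population-based training.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Agent:
    skill: float          # hypothetical stand-in for policy parameters (inner tier)
    learning_rate: float  # hypothetical stand-in for a tuned hyperparameter (outer tier)
    wins: int = 0
    games: int = 0

    def win_rate(self) -> float:
        return self.wins / self.games if self.games else 0.0


def play_match(a: Agent, b: Agent, rng: random.Random) -> Agent:
    """Toy 1v1 match (the paper uses team games): higher skill wins more often."""
    p_a = 0.5 * (1.0 + math.tanh(a.skill - b.skill))
    return a if rng.random() < p_a else b


def train_population(pop_size=8, rounds=20, matches_per_round=50, seed=0):
    """Run the toy two-tier loop; requires pop_size >= 2."""
    rng = random.Random(seed)
    population = [
        Agent(skill=0.0, learning_rate=rng.uniform(0.01, 0.2))
        for _ in range(pop_size)
    ]
    for _ in range(rounds):
        for agent in population:
            agent.wins = agent.games = 0

        # Inner tier: agents gain experience by playing each other.
        for _ in range(matches_per_round):
            a, b = rng.sample(population, 2)
            winner = play_match(a, b, rng)
            for agent in (a, b):
                agent.games += 1
                agent.skill += agent.learning_rate  # toy "RL update"
            winner.wins += 1

        # Outer tier: weak agents copy a strong agent, then perturb its setting.
        ranked = sorted(population, key=Agent.win_rate, reverse=True)
        cutoff = max(1, pop_size // 4)
        for weak in ranked[-cutoff:]:
            strong = rng.choice(ranked[:cutoff])
            weak.skill = strong.skill
            weak.learning_rate = strong.learning_rate * rng.choice([0.8, 1.2])
    return population
```

Calling `train_population()` returns the final list of agents. Because learning and selection both draw on matches inside the population, the opponents an agent faces get stronger as training proceeds, which is the property this sketch is meant to show.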

Videos

DeepMind Has A Superhuman Level Quake 3 AI Team! 🚀· youtube