Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

Deheng Ye; Zhao Liu; Mingfei Sun; Bei Shi; Peilin Zhao; Hao Wu; Hongsheng Yu; Shaojie Yang; Xipeng Wu; Qingwei Guo; Qiaobo Chen; Yinyuting Yin; Hao Zhang; Tengfei Shi; Liang Wang; Qiang Fu; Wei Yang; Lanxiao Huang

arXiv:1912.09729 · cs.AI · December 16, 2020 · 25 cites

TL;DR

This paper introduces a deep reinforcement learning framework tailored for complex MOBA game control, achieving superhuman performance by handling large state-action spaces with novel algorithms and scalable system design.

Contribution

It presents a scalable RL system and novel algorithms like control dependency decoupling and dual-clip PPO for effective training in complex MOBA environments.
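
The dual-clip PPO named above extends the standard clipped PPO surrogate with a second bound: when the advantage is negative, the objective is floored at c times the advantage (with c > 1), so a very large probability ratio cannot drive the update without limit. The following is a minimal per-sample sketch in Python, not the authors' implementation; the function name and the default values `eps=0.2` and `c=3.0` are assumptions chosen for illustration.

```python
def dual_clip_ppo_objective(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample dual-clip PPO surrogate, to be maximized.

    ratio     : pi_new(a|s) / pi_old(a|s)
    advantage : advantage estimate for the sampled action
    eps, c    : illustrative defaults (assumed), with c > 1
    """
    # Standard PPO clipping of the probability ratio.
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    standard = min(ratio * advantage, clipped_ratio * advantage)

    # Second clip: only active for negative advantages, where the
    # standard term is unbounded below as the ratio grows.
    if advantage < 0:
        return max(standard, c * advantage)
    return standard


# With a large ratio and a negative advantage, the value is bounded by c * advantage.
bounded = dual_clip_ppo_objective(ratio=10.0, advantage=-1.0)
```

For positive advantages the function reduces to ordinary PPO clipping, so the extra bound changes behavior only in the negative-advantage, large-ratio case.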

Findings

01  Tencent Solo defeats top professional players in 1v1 MOBA matches.

02  The proposed methods outperform existing RL approaches in complex game settings.

03  System design enables efficient large-scale exploration and training.

Abstract

We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari series, which makes it very difficult to search any policies with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, our AI agent, called Tencent Solo, can…
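
An action mask, one of the strategies listed in the abstract, is commonly realized by removing illegal actions from the policy distribution before sampling. The sketch below shows one generic way to do this with a masked softmax in plain Python; the function name, the boolean `legal` list, and the choice to raise on an all-illegal mask are assumptions for illustration and are not taken from the paper.

```python
import math


def masked_softmax(logits, legal):
    """Softmax over logits that assigns zero probability to illegal actions.

    logits : list of floats, one per action
    legal  : list of bools, True where the action is currently allowed
    """
    if not any(legal):
        raise ValueError("at least one action must be legal")

    # Illegal actions get -inf so that exp(...) evaluates to exactly 0.0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, legal)]

    # Subtract the maximum for numerical stability.
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# The second action is masked out and receives probability 0.0.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

Masking at the distribution level means the agent never samples an unavailable action, which prunes exploration that could not succeed anyway.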

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Digital Games and Media

Methods: Entropy Regularization · Proximal Policy Optimization