Combinational Q-Learning for Dou Di Zhu

Yang You; Liangwei Li; Baisong Guo; Weiming Wang; Cewu Lu

arXiv:1901.08925 · cs.LG · February 20, 2019 · 5 citations

TL;DR

This paper introduces combinational Q-learning (CQL), a novel approach for handling the large combinatorial action space of Dou Di Zhu, which lets agents learn strategies comparable to those of human players.

Contribution

The paper proposes a two-stage network with order-invariant pooling to manage combinatorial actions efficiently in complex card games, outperforming baselines such as naive Q-learning and A3C.
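
The order-invariant pooling mentioned above can be illustrated with a small sketch. It assumes a 15-slot count encoding for each primitive action (ranks 3 through 2 plus the two jokers), a hypothetical shared linear+ReLU encoder with toy fixed weights standing in for learned ones, and element-wise max-pooling as the set aggregator. None of these details are taken from the authors' code. Because max ignores ordering, listing the same primitive actions in a different order yields the same combination embedding.

```python
from typing import List, Sequence

Vector = List[float]

# Assumed encoding (not from the paper): 15 slots for ranks 3..A, 2, black joker, red joker.
NUM_RANKS = 15


def counts(ranks: Sequence[int]) -> Vector:
    """Turn rank indices (0 = '3', ..., 14 = red joker) into a card-count vector."""
    vec = [0.0] * NUM_RANKS
    for r in ranks:
        vec[r] += 1.0
    return vec


def encode_primitive(action: Vector, weights: List[Vector]) -> Vector:
    """Hypothetical shared encoder: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * a for w, a in zip(row, action))) for row in weights]


def max_pool(embeddings: List[Vector]) -> Vector:
    """Element-wise max over a set of vectors; the result ignores input order."""
    return [max(column) for column in zip(*embeddings)]


def encode_combination(combination: List[Vector], weights: List[Vector]) -> Vector:
    """Encode every primitive action with the same weights, then max-pool over the set."""
    return max_pool([encode_primitive(a, weights) for a in combination])


# Toy fixed weights standing in for learned parameters (4 output features).
WEIGHTS = [[float((r + 1) * (i + 1) % 5 - 2) for i in range(NUM_RANKS)] for r in range(4)]

# One way to split a hand: a pair of 3s, a single 7, and the straight 3-4-5-6-7.
combo = [counts([0, 0]), counts([4]), counts([0, 1, 2, 3, 4])]
print(encode_combination(combo, WEIGHTS))
print(encode_combination(combo[::-1], WEIGHTS))  # same vector: order does not matter
```

Sharing one encoder across all primitive actions and pooling afterwards lets a single network handle combinations of any size, which is the property the set-style pooling is meant to provide.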

Findings

01. CQL outperforms naive Q-learning and A3C in Dou Di Zhu.
02. Agents trained with CQL achieve human-level performance.
03. The method effectively reduces the complexity of the action space.

Abstract

Deep reinforcement learning (DRL) has gained a lot of attention in recent years and has been shown to play Atari games and Go at or above human level. However, those games are assumed to have a small, fixed number of actions and can be trained with a simple CNN. In this paper, we study Dou Di Zhu, a popular Asian card game in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to a huge number of actions. We propose a novel method to handle combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and also leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method prevails over state-of-the-art methods such as naive Q-learning and A3C. We develop an…
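
The two-stage idea described in the abstract can be sketched as follows, under one plausible reading: first choose among candidate ways of splitting the hand into primitive actions, then choose a single primitive to play from inside the chosen split. The names `two_stage_select`, `q_combination`, and `q_primitive` are hypothetical placeholders, the scoring lambdas are toy heuristics rather than learned Q-networks, and the candidate splits are written by hand instead of coming from a real rules engine.

```python
from typing import Callable, List, Sequence, Tuple

Primitive = Tuple[str, ...]    # assumed encoding: ("3", "3") is a pair of threes
Combination = List[Primitive]  # one way of splitting the hand into primitive actions


def two_stage_select(
    candidates: Sequence[Combination],
    q_combination: Callable[[Combination], float],
    q_primitive: Callable[[Primitive, Combination], float],
) -> Tuple[Combination, Primitive]:
    """Pick a combination first, then a primitive action from inside it."""
    # Stage 1: score each candidate decomposition of the hand as a whole.
    best_combo = max(candidates, key=q_combination)
    # Stage 2: rank only the primitives in the chosen combination,
    # instead of every legal move at once.
    best_action = max(best_combo, key=lambda p: q_primitive(p, best_combo))
    return best_combo, best_action


# Hand 3,3,4,5,6,7 split three ways (hand-written for illustration).
candidates = [
    [("3", "3"), ("4",), ("5",), ("6",), ("7",)],
    [("3",), ("3", "4", "5", "6", "7")],
    [("3",), ("3",), ("4",), ("5",), ("6",), ("7",)],
]

# Toy stand-ins for learned Q-functions (hypothetical heuristics):
# prefer splits with fewer groups, then play the largest group first.
combo, action = two_stage_select(
    candidates,
    q_combination=lambda c: -float(len(c)),
    q_primitive=lambda p, c: float(len(p)),
)
print(combo)   # [('3',), ('3', '4', '5', '6', '7')]
print(action)  # ('3', '4', '5', '6', '7')
```

The point of the split is that the second stage only ranks a handful of primitives from one combination, rather than the full set of legal card plays.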

Code & Models

Repositories

qq456cvb/doudizhu-C (TensorFlow, official implementation)

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance

Methods: Entropy Regularization · Convolution · Dense Connections · Softmax · A3C · Q-Learning