Combinational Q-Learning for Dou Di Zhu
Yang You, Liangwei Li, Baisong Guo, Weiming Wang, Cewu Lu

TL;DR
This paper introduces combinational Q-learning (CQL), a novel approach for handling the large action space in Dou Di Zhu, enabling agents to learn effective strategies comparable to humans.
Contribution
The paper proposes a two-stage network with order-invariant max-pooling to manage combinatorial actions in complex card games, outperforming naive Q-learning and A3C.
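To illustrate why a two-stage choice can shrink the number of actions that must be scored, here is a minimal sketch. It assumes, purely for illustration, that candidate moves are grouped by combination type and that each stage has its own scoring function. The grouping, the names `two_stage_greedy`, `toy_group_score`, `toy_action_score`, and the scores themselves are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, ...]  # a move is a tuple of card ranks; () means "pass"

# Rank order used in Dou Di Zhu, lowest to highest (jokers omitted for brevity).
RANK = {r: i for i, r in enumerate("3456789TJQKA2")}


def two_stage_greedy(
    state: object,
    groups: Dict[str, List[Action]],
    score_group: Callable[[object, str], float],
    score_action: Callable[[object, Action], float],
) -> Tuple[str, Action, int]:
    """Pick the best group first, then the best action inside that group.

    Returns the chosen group, the chosen action, and how many candidates
    were scored in total across both stages.
    """
    best_group = max(groups, key=lambda g: score_group(state, g))
    best_action = max(groups[best_group], key=lambda a: score_action(state, a))
    evaluations = len(groups) + len(groups[best_group])
    return best_group, best_action, evaluations


# Hypothetical candidate moves, grouped by combination type (an assumption).
groups: Dict[str, List[Action]] = {
    "pass": [()],
    "single": [("3",), ("9",), ("K",)],
    "pair": [("5", "5"), ("J", "J")],
}


def toy_group_score(state: object, group: str) -> float:
    # Stand-in for a learned stage-one value; fixed numbers for the demo.
    return {"pass": 0.0, "single": 1.0, "pair": 2.0}[group]


def toy_action_score(state: object, action: Action) -> float:
    # Stand-in for a learned stage-two value: prefer shedding low ranks.
    return -float(sum(RANK[card] for card in action))


group, action, evaluations = two_stage_greedy(
    None, groups, toy_group_score, toy_action_score
)
flat_evaluations = sum(len(actions) for actions in groups.values())
print(group, action, evaluations, flat_evaluations)
```

In this toy setup the two-stage choice scores one candidate fewer than a flat search over every move. The gap grows with the number of moves per group, which is the regime the summary describes for Dou Di Zhu.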
Findings
CQL outperforms naive Q-learning and A3C in Dou Di Zhu.
Agents trained with CQL achieve human-level performance.
The method effectively reduces the action space complexity.
Abstract
Deep reinforcement learning (DRL) has gained a lot of attention in recent years, and has been shown to play Atari games and Go at or above human levels. However, those games are assumed to have a small fixed number of actions and could be trained with a simple CNN. In this paper, we study a special class of popular Asian card games called Dou Di Zhu, in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to a huge number of actions. We propose a novel method to handle combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and also leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method prevails over state-of-the-art methods like naive Q-learning and A3C. We develop an…
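The order-invariant max-pooling idea mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's network: each card rank gets a random embedding vector, a move is summarized by the element-wise maximum over its cards' embeddings, and a toy Q-value is the dot product of that summary with a state vector. The names `max_pool`, `q_value`, `card_embedding`, and `state_vec`, as well as the embedding size, are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def max_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over a collection of equal-length vectors.

    The result does not depend on the order of the input vectors.
    """
    return [max(column) for column in zip(*vectors)]


def q_value(
    state_vec: Sequence[float],
    action_cards: Sequence[str],
    card_embedding: Dict[str, List[float]],
) -> float:
    """Toy Q-value: dot product of the state with the pooled card features."""
    pooled = max_pool([card_embedding[card] for card in action_cards])
    return sum(s * p for s, p in zip(state_vec, pooled))


rng = random.Random(0)  # fixed seed so the demo is reproducible
RANKS = "3456789TJQKA2"
DIM = 4  # embedding size chosen arbitrarily for the sketch

# Random stand-ins for learned embeddings and a learned state encoding.
card_embedding = {r: [rng.uniform(-1, 1) for _ in range(DIM)] for r in RANKS}
state_vec = [rng.uniform(-1, 1) for _ in range(DIM)]

straight = ["3", "4", "5", "6", "7"]
shuffled = straight[:]
rng.shuffle(shuffled)

q_a = q_value(state_vec, straight, card_embedding)
q_b = q_value(state_vec, shuffled, card_embedding)
print(q_a == q_b)  # the same cards in any order give the same value
```

Because the maximum is taken per feature dimension across the cards, listing the same cards in a different order leaves the pooled vector, and therefore the toy Q-value, unchanged.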
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Sports Analytics and Performance
Methods: Entropy Regularization · Convolution · Dense Connections · Softmax · A3C · Q-Learning
