AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

Chang Lei; Huan Lei

arXiv:2407.10279·cs.AI·September 16, 2024

TL;DR

This paper introduces AlphaDou, an AI that integrates bidding and cardplay for Doudizhu, achieving state-of-the-art performance through a modified reinforcement learning framework that handles the game's complexity.

Contribution

It presents a novel reinforcement learning approach that combines win rate and expectation estimation, enabling comprehensive Doudizhu gameplay including bidding and cardplay.

Findings

01. Achieved state-of-the-art performance in Doudizhu
02. Successfully integrated bidding and cardplay in AI
03. Demonstrated effectiveness of modified RL framework

Abstract

Artificial intelligence for card games has long been a popular topic in AI research. In recent years, complex card games like Mahjong and Texas Hold'em have been solved, with corresponding AI programs reaching the level of human experts. However, the game of Doudizhu presents significant challenges due to its vast state/action space and unique characteristics involving reasoning about competition and cooperation, making the game extremely difficult to solve. The RL model DouZero, trained using the Deep Monte Carlo algorithm framework, has shown excellent performance in Doudizhu. However, there are differences between its simplified game environment and the actual Doudizhu environment, and its performance is still a considerable distance from that of human experts. This paper modifies the Deep Monte Carlo algorithm framework by using reinforcement learning to obtain a neural network that…
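
For readers unfamiliar with the Monte Carlo idea that the abstract refers to, the following is a minimal tabular sketch, not the authors' implementation. It assumes an undiscounted return with a single reward at the end of the game, and it replaces the neural network of Deep Monte Carlo with a plain lookup table. The function names, state labels, and action labels are all hypothetical.

```python
from collections import defaultdict


def monte_carlo_update(q, counts, episode, final_reward):
    """Every-visit Monte Carlo: move Q(s, a) toward the episode's return.

    Assumption: the only reward arrives when the game ends, so every
    (state, action) pair in the episode shares the same target.
    """
    for state, action in episode:
        key = (state, action)
        counts[key] += 1
        # Incremental mean of the returns observed for this pair.
        q[key] += (final_reward - q[key]) / counts[key]


def greedy_action(q, state, legal_actions):
    """Pick the legal action with the highest estimated return."""
    return max(legal_actions, key=lambda a: q[(state, a)])


# Hypothetical toy data: two finished games, one won (+1) and one lost (-1).
q = defaultdict(float)
counts = defaultdict(int)
monte_carlo_update(q, counts, [("s0", "pass"), ("s1", "play_pair")], final_reward=1.0)
monte_carlo_update(q, counts, [("s0", "pass"), ("s1", "play_single")], final_reward=-1.0)

best = greedy_action(q, "s1", ["play_pair", "play_single"])
```

In a deep variant, the table would be replaced by a network trained to regress the same targets, typically with a mean-squared-error loss over sampled (state, action, return) tuples.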

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 3

Strengths

- The effects of bidding were incorporated, and improvements were made to Deep Monte Carlo.

Weaknesses

- This can be accomplished with existing methods, and this paper represents only a partial enhancement. Additionally, it has not undergone theoretical evaluation. There is little development of the architecture beyond the previous study, DouZero.

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- Stronger performance is achieved against DouZero and its variants.

Weaknesses

- Weak experiments. More recent SOTA Doudizhu AIs should be included as the baseline methods.
- The reason why the proposed method AlphaDou is better than DouZero is unclear.
- The improvements (both in terms of methodology and experimental results) of the proposed method AlphaDou over DouZero seem marginal.

Minor:

- Section 2, The game of Doudizhu, is suggested to be moved to the appendix.
- The end of Introduction. The training code for AlphaDou is available. Please attach the code link if

Reviewer 03 · Rating 3 · Confidence 4

Strengths

- The AlphaDou framework is straightforward and likely easy to implement.
- The bid model is a novel artifact responsible for improving the overall playing strength of AlphaDou.
- AlphaDou appears to be a more powerful agent than the previous DouZero agent in terms of winning rate and point difference.

Weaknesses

- The contribution of AlphaDou requires more explanation. Except for the bid model, it directly combines two ideas already explored -- Deep Monte Carlo and value factorization, leaving the contribution of this work unclear.
- The paper does not have a background section, leaving several notations undefined. The authors should not assume that all readers have the necessary background in RL or understand the notations without proper definitions. In addition, the equation numbers are missing.
- Som

Reviewer 04 · Rating 3 · Confidence 5

Strengths

1. This paper provides a comprehensive summary of the related work on AI for DouDiZhu.
2. This paper is a good AI project implementation in the application of DouDiZhu.

Weaknesses

1. The clarity of the paper's writing is insufficient; please refer to the Question.
2. The experimental validation is not sufficiently robust.
3. The input of the model includes the number of bombs played in the game, and I think this point involves the use of expert knowledge. An algorithm is end-to-end if only the direct observation is used as the input with post-processing.

Code & Models

Repositories

RuBP17/AlphaDou · PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Games

Methods: Sigmoid Activation · Convolution · Tanh Activation · Dense Connections · Long Short-Term Memory · Feedforward Network · Q-Learning · Deep Q-Network · DouZero