AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding
Chang Lei, Huan Lei

TL;DR
This paper introduces AlphaDou, an AI that integrates bidding and cardplay for Doudizhu, achieving state-of-the-art performance through a modified reinforcement learning framework that handles the game's complexity.
Contribution
It presents a novel reinforcement learning approach that combines win rate and expectation estimation, enabling comprehensive Doudizhu gameplay including bidding and cardplay.
Findings
Achieved state-of-the-art performance in Doudizhu
Successfully integrated bidding and cardplay in AI
Demonstrated effectiveness of modified RL framework
Abstract
Artificial intelligence for card games has long been a popular topic in AI research. In recent years, complex card games like Mahjong and Texas Hold'em have been solved, with corresponding AI programs reaching the level of human experts. However, the game of Doudizhu presents significant challenges due to its vast state/action space and unique characteristics involving reasoning about competition and cooperation, making the game extremely difficult to solve. The RL model DouZero, trained using the Deep Monte Carlo algorithm framework, has shown excellent performance in Doudizhu. However, there are differences between its simplified game environment and the actual Doudizhu environment, and its performance is still a considerable distance from that of human experts. This paper modifies the Deep Monte Carlo algorithm framework by using reinforcement learning to obtain a neural network that…
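The abstract names Deep Monte Carlo as the framework behind DouZero. As a rough illustration only, the sketch below shows the core idea as it is generally understood: sample episodes with an epsilon-greedy policy, then regress every visited Q(s, a) toward the episode's final return, without bootstrapping. The toy three-step game, the tabular Q-table, and the names `play_episode` and `train` are assumptions made for illustration; they are not taken from the paper, which applies neural networks to Doudizhu.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)


def play_episode(q, epsilon, rng):
    """Play one episode of a hypothetical toy game (not Doudizhu).

    The state is (step, number of 1-actions chosen so far). The only reward
    arrives at the end: +1 if action 1 was chosen at least twice, else -1.
    """
    trajectory = []
    ones = 0
    for step in range(3):
        state = (step, ones)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        trajectory.append((state, action))
        ones += action
    reward = 1.0 if ones >= 2 else -1.0
    return trajectory, reward


def train(episodes=5000, epsilon=0.2, lr=0.05, seed=0):
    """Monte Carlo control: move each visited Q(s, a) toward the final return."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        trajectory, reward = play_episode(q, epsilon, rng)
        for state, action in trajectory:
            # The target is the sampled episode return itself (no bootstrapping),
            # which is what distinguishes Monte Carlo targets from TD targets.
            q[(state, action)] += lr * (reward - q[(state, action)])
    return q


if __name__ == "__main__":
    q = train()
    # At the last step with one 1-action so far, only action 1 can still win.
    print(q[((2, 1), 1)], q[((2, 1), 0)])
```

In a deep variant, the table would be replaced by a network trained with a mean-squared-error loss against the same sampled returns; the tabular update above is the simplest stand-in for that regression step.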
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The effects of bidding were incorporated, and improvements were made to Deep Monte Carlo.
- This can be accomplished with existing methods, and this paper represents only a partial enhancement. Additionally, it has not undergone theoretical evaluation. There is little architectural development beyond the previous study, DouZero.
Stronger performance is achieved against DouZero and its variants.
- Weak experiments. More recent SOTA Doudizhu AIs should be included as the baseline methods. - The reason why the proposed method AlphaDou is better than DouZero is unclear. - The improvements (both in terms of methodology and experimental results) of the proposed method AlphaDou over DouZero seem marginal. Minor: - Section 2, The game of Doudizhu, is suggested to be moved to the appendix. - The end of Introduction. The training code for AlphaDou is available. Please attach the code link if
- The AlphaDou framework is straightforward and likely easy to implement. - The bid model is a novel artifact responsible for improving the overall playing strength of AlphaDou. - AlphaDou appears to be a more powerful agent than the previous DouZero agent in terms of winning rate and point difference.
- The contribution of AlphaDou requires more explanation. Except for the bid model, it directly combines two ideas already explored -- Deep Monte Carlo and value factorization, leaving the contribution of this work unclear. - The paper does not have a background section, leaving several notations undefined. The authors should not assume that all readers have the necessary background in RL or understand the notations without proper definitions. In addition, the equation numbers are missing. - Som
1. This paper provides a comprehensive summary of the related work on AI for DouDiZhu. 2. This paper is a good AI project implementation in the application of DouDiZhu.
1. The clarity of the paper's writing is insufficient; please refer to the Question. 2. The experimental validation is not sufficiently robust. 3. The input of the model includes the number of bombs played in the game, and I think this point involves the use of expert knowledge. An algorithm is end-to-end if only the direct observation is used as the input with post-processing.
Taxonomy
Topics: Artificial Intelligence in Games
Methods: Sigmoid Activation · Convolution · Tanh Activation · Dense Connections · Long Short-Term Memory · Feedforward Network · Q-Learning · Deep Q-Network · DouZero
