A Novel Reward Shaping Function for Single-Player Mahjong
Kai Jun Chen, Lok Him Lai, Zi Iun Lai

TL;DR
This paper introduces a new reward shaping function for Single-player Mahjong that enhances agent performance by emphasizing synergistic hands, outperforming previous reward functions in simulated battles.
Contribution
The paper proposes a novel bonus reward shaping function that improves Mahjong agent performance by better valuing synergistic hands compared to existing methods.
Findings
The new reward function outperforms the ShangTing function in simulated battles.
Using the ShangTing function with forward search, the agent completes a winning hand in Single-player Mahjong in an average of 35 actions over 10,000 games.
In a simulated 1-v-1 battle, the new reward function wins an average of $1.37 over 1,000 games.
Abstract
Mahjong is a complex game with an intractably large state space and extremely sparse rewards, which makes it challenging to develop an agent that plays it. To overcome this, the ShangTing function was adopted as a reward shaping function. This was combined with a forward-search algorithm to create an agent capable of completing a winning hand in Single-player Mahjong (in an average of 35 actions over 10,000 games). To increase performance, we propose a novel bonus reward shaping function, which assigns higher relative values to synergistic Mahjong hands. In a simulated 1-v-1 battle, the agent using the new reward function outperformed the one using the default ShangTing function, winning an average of $1.37 over 1,000 games.
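To make the idea of distance-based reward shaping concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a toy game with one suit and five-tile hands that win with one meld plus one pair, a brute-force `toy_distance` standing in for the ShangTing function, a potential-based form for the shaping term, and a one-step greedy lookahead standing in for forward search. All names (`toy_distance`, `shaped_reward`, `best_discard`) are hypothetical.

```python
from collections import Counter
from itertools import product

RANKS = range(1, 10)  # assumption: a single suit with ranks 1..9


def winning_hands():
    """Enumerate toy winning hands: one meld (triplet or run) plus one pair."""
    melds = [(r, r, r) for r in RANKS] + [(r, r + 1, r + 2) for r in range(1, 8)]
    pairs = [(r, r) for r in RANKS]
    hands = []
    for meld, pair in product(melds, pairs):
        counts = Counter(meld + pair)
        if max(counts.values()) <= 4:  # at most four copies of any tile
            hands.append(counts)
    return hands


WINNING = winning_hands()


def toy_distance(hand):
    """Number of tiles that must be swapped to reach some toy winning hand."""
    have = Counter(hand)
    best_overlap = max(sum((have & target).values()) for target in WINNING)
    return len(hand) - best_overlap


def shaped_reward(hand_before, hand_after, gamma=1.0):
    """Assumed potential-based shaping: gamma * phi(s') - phi(s), phi = -distance."""
    phi_before = -toy_distance(hand_before)
    phi_after = -toy_distance(hand_after)
    return gamma * phi_after - phi_before


def best_discard(hand):
    """One-step lookahead: discard the tile that leaves the smallest distance."""
    def without(i):
        return hand[:i] + hand[i + 1:]

    i = min(range(len(hand)), key=lambda i: toy_distance(without(i)))
    return hand[i]
```

With these definitions, moving one tile closer to a winning hand earns a positive shaped reward even though the sparse game reward is still zero, which is the motivation the abstract gives for reward shaping. A bonus term for synergistic hands, as proposed in the paper, would be added on top of such a baseline; its exact form is not given in this summary.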
Taxonomy
Topics: Artificial Intelligence in Games · Digital Games and Media · Gambling Behavior and Treatments
