AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play
Vlad Murgoci, Matthijs Spaan, Yaniv Oren

TL;DR
AlphaExploitem is a novel reinforcement learning agent for poker that learns to exploit suboptimal opponents by reasoning over past hands and training against diverse exploitative strategies.
Contribution
It extends AlphaHoldem with a hierarchical transformer and a new training regimen to effectively exploit weak opponents in imperfect-information games.
Findings
Exploits weak play by both in-distribution and out-of-distribution opponents
Maintains performance against Nash equilibrium opponents
Is trained and evaluated on two standard benchmarks for imperfect-information games
Abstract
Poker is an imperfect-information game that has served as a long-standing benchmark for decision-making under uncertainty. To maximize utility beyond the Nash equilibrium (NE), an agent can deviate from NE policies to exploit suboptimal play. We introduce AlphaExploitem, which extends the competitive RL poker agent AlphaHoldem by using a hierarchical transformer encoder that enables reasoning over previously played hands and modifying the training procedure with the inclusion of a diverse pool of exploitable opponents to facilitate learning to exploit. We train and evaluate AlphaExploitem on two standard benchmarks for imperfect-information games. Empirically, AlphaExploitem successfully exploits weak play by both in- and out-of-distribution opponents, without losing performance against NE opponents.
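The idea of gaining utility by deviating from an equilibrium policy can be illustrated with a toy game. The sketch below uses rock-paper-scissors rather than poker, and it is not the paper's method: the payoff matrix, the `biased_opponent` distribution, and the helper names are all assumptions chosen for illustration. It compares the expected value of the uniform equilibrium policy with that of a best response against an opponent who plays rock too often.

```python
# Toy illustration in rock-paper-scissors (an assumption; the paper studies poker).
# PAYOFF[i][j] is the payoff to our action i against opponent action j.
# Action order: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],   # rock ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper, ties scissors
]


def expected_value(policy, opponent):
    """Expected payoff of a mixed policy against a mixed opponent policy."""
    return sum(
        policy[i] * opponent[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def best_response(opponent):
    """Pure policy that maximizes expected payoff against a known opponent."""
    action_values = [
        sum(opponent[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)
    ]
    best = max(range(3), key=lambda i: action_values[i])
    return [1.0 if i == best else 0.0 for i in range(3)]


nash = [1 / 3, 1 / 3, 1 / 3]            # equilibrium policy of this toy game
biased_opponent = [0.5, 0.25, 0.25]     # hypothetical opponent: too much rock

ev_nash = expected_value(nash, biased_opponent)
exploit = best_response(biased_opponent)
ev_exploit = expected_value(exploit, biased_opponent)

print(f"EV of equilibrium policy: {ev_nash:.2f}")
print(f"EV of best response:      {ev_exploit:.2f}")
```

In this toy setting the equilibrium policy only breaks even against the biased opponent, while the best response (always paper) earns a positive expected payoff. The best response is itself exploitable, for example by an opponent who always plays scissors, which is why an exploitative agent also needs to hold up against equilibrium opponents.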
