AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play

Vlad Murgoci; Matthijs Spaan; Yaniv Oren

arXiv:2605.09150·cs.LG·May 12, 2026

TL;DR

AlphaExploitem is a novel reinforcement learning agent for poker that learns to exploit suboptimal opponents by reasoning over past hands and training against diverse exploitative strategies.

Contribution

It extends AlphaHoldem with a hierarchical transformer and a new training regimen to effectively exploit weak opponents in imperfect-information games.

Findings

01. Exploits weak play by both in-distribution and out-of-distribution opponents

02. Maintains performance against Nash-equilibrium (NE) opponents

03. Trained and evaluated on two standard benchmarks for imperfect-information games

Abstract

Poker is an imperfect-information game that has long served as a benchmark for decision-making under uncertainty. To earn more utility than Nash-equilibrium (NE) play yields against suboptimal opponents, an agent can deviate from the NE policy to exploit their mistakes. We introduce AlphaExploitem, which extends the competitive RL poker agent AlphaHoldem in two ways: a hierarchical transformer encoder enables reasoning over previously played hands, and the training procedure adds a diverse pool of exploitable opponents so that the agent learns to exploit. We train and evaluate AlphaExploitem on two standard benchmarks for imperfect-information games. Empirically, AlphaExploitem exploits weak play by both in-distribution and out-of-distribution opponents without losing performance against NE opponents.
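The gap between equilibrium play and exploitative play can be illustrated on a much smaller game than poker. The sketch below uses rock-paper-scissors, not either of the paper's benchmarks, and is not the authors' method: the payoff matrix is the standard one, while `weak_opponent` (an opponent that over-plays rock) and the helper names are assumptions made for illustration.

```python
# Illustrative only: rock-paper-scissors stands in for poker here.
# Payoff to the row player: +1 for a win, 0 for a tie, -1 for a loss.
ACTIONS = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def expected_value(policy, opponent):
    """Expected payoff of a mixed `policy` against a mixed `opponent`."""
    return sum(
        policy[i] * opponent[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def best_response(opponent):
    """Pure strategy with the highest expected payoff against `opponent`."""
    action_values = [
        sum(opponent[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)
    ]
    best = max(range(3), key=lambda i: action_values[i])
    return [1.0 if i == best else 0.0 for i in range(3)]


nash = [1 / 3, 1 / 3, 1 / 3]          # the NE policy of rock-paper-scissors
weak_opponent = [0.5, 0.25, 0.25]     # hypothetical opponent that over-plays rock

nash_value = expected_value(nash, weak_opponent)
exploit_policy = best_response(weak_opponent)
exploit_value = expected_value(exploit_policy, weak_opponent)

print("NE policy value:", round(nash_value, 6))
print("best response:", ACTIONS[exploit_policy.index(1.0)], "value:", exploit_value)
```

Against this opponent the uniform NE policy breaks even, while always playing paper earns 0.25 per round on average. In rock-paper-scissors every policy earns 0 against the uniform NE opponent, so the deviation costs nothing there; whether that carries over to poker is exactly what the abstract's claim about NE opponents addresses, and this toy example does not establish it.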
