StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games
Andy Caen, Mark H.M. Winands, Dennis J.N.J. Soemers

TL;DR
StratFormer is a transformer-based meta-agent that learns to model and exploit opponents in imperfect-information games through a two-phase curriculum, achieving an average exploitation gain of +0.106 BB per hand on Leduc Hold'em while maintaining near-equilibrium safety.
Contribution
The paper introduces a novel transformer architecture with dual-turn tokens and a two-phase training curriculum for opponent modeling and exploitation in imperfect-information games.
Findings
Achieves an average exploitation gain of +0.106 BB (big blinds) per hand on Leduc Hold'em.
Reaches a peak gain of +0.821 BB per hand against highly exploitable opponents.
Maintains near-equilibrium safety during exploitation.
Abstract
We present StratFormer, a transformer-based meta-agent that learns to simultaneously model and exploit opponents in imperfect-information games through a two-phase curriculum. The first phase trains an opponent modeling head to identify behavioral patterns from action histories while the agent plays a game-theoretic optimal (GTO) policy. The second phase progressively shifts the policy toward best-response (BR) exploitation, guided by a per-opponent regularization schedule tied to exploitability. Our architecture introduces dual-turn tokens -- feature vectors constructed at both agent and opponent decision points -- coupled with bucket-rate features that encode opponent tendencies across five strategic contexts. On Leduc Hold'em, a small poker variant with six cards and two betting rounds, we test against six opponent archetypes at two strength levels each, with exploitability ranging…
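The second-phase idea in the abstract, shifting from a GTO policy toward a best response under a per-opponent schedule tied to exploitability, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not give the schedule's functional form, so the linear rule, the function names (`regularization_weight`, `mix_policies`), and the parameter `max_exploitability` are hypothetical, and plain policy interpolation stands in for whatever regularizer the paper actually uses.

```python
from typing import List


def regularization_weight(exploitability: float,
                          progress: float,
                          max_exploitability: float = 1.0) -> float:
    """Weight kept on the GTO policy, in [0, 1].

    Hypothetical linear schedule (not taken from the paper): the weight
    starts at 1.0 (pure GTO) and decreases as training progresses, and
    it decreases further for opponents that are more exploitable.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    if max_exploitability <= 0.0:
        raise ValueError("max_exploitability must be positive")
    # Clamp the normalized exploitability to [0, 1].
    ratio = min(max(exploitability / max_exploitability, 0.0), 1.0)
    return 1.0 - progress * ratio


def mix_policies(gto: List[float], br: List[float], lam: float) -> List[float]:
    """Interpolate two action distributions: lam * GTO + (1 - lam) * BR."""
    if len(gto) != len(br):
        raise ValueError("policies must cover the same actions")
    mixed = [lam * g + (1.0 - lam) * b for g, b in zip(gto, br)]
    total = sum(mixed)
    return [p / total for p in mixed]  # renormalize against rounding drift


if __name__ == "__main__":
    gto_policy = [0.5, 0.5]  # illustrative two-action GTO distribution
    br_policy = [1.0, 0.0]   # illustrative pure best response
    lam = regularization_weight(exploitability=0.5, progress=1.0)
    print(lam, mix_policies(gto_policy, br_policy, lam))
```

Under this sketch, an opponent with zero exploitability always keeps the agent at the GTO policy, while a maximally exploitable opponent lets the agent reach the pure best response by the end of training. That mirrors the safety-versus-exploitation trade-off the abstract describes, without claiming to reproduce the paper's schedule.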
