How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use

Minhua Lin; Enyan Dai; Hui Liu; Xianfeng Tang; Yuliang Yan; Zhenwei Dai; Jingying Zeng; Zhiwei Zhang; Fali Wang; Hongcheng Gao; Chen Luo; Xiang Zhang; Qi He; Suhang Wang

arXiv:2602.00528·cs.AI·February 3, 2026

Open Access · 3 Reviews

TL;DR

This paper evaluates LLMs in poker, revealing their limitations in strategic reasoning and proposing ToolPoker, a framework integrating external solvers to enhance game-theoretic play and reasoning transparency.

Contribution

Introduces ToolPoker, a tool-integrated framework combining external solvers with LLMs to improve strategic reasoning and gameplay in poker.

Findings

1. LLMs underperform against traditional algorithms in poker.
2. Three common flaws are identified: reliance on heuristics, factual misunderstandings, and a reasoning-action gap.
3. ToolPoker achieves state-of-the-art gameplay and better reasoning traces.

Abstract

As Large Language Models (LLMs) are increasingly applied in high-stakes domains, their ability to reason strategically under uncertainty becomes critical. Poker provides a rigorous testbed, requiring not only strong actions but also principled, game-theoretic reasoning. In this paper, we conduct a systematic study of LLMs in multiple realistic poker tasks, evaluating both gameplay outcomes and reasoning traces. Our analysis reveals LLMs fail to compete against traditional algorithms and identifies three recurring flaws: reliance on heuristics, factual misunderstandings, and a "knowing-doing" gap where actions diverge from reasoning. An initial attempt with behavior cloning and step-level reinforcement learning improves reasoning style but remains insufficient for accurate game-theoretic play. Motivated by these limitations, we propose ToolPoker, a tool-integrated reasoning framework…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

The paper is very clear in its presentation and generally high in quality: each newly introduced approach built on the last in a way that made the paper particularly easy to follow and made the motivation for the different components of the ToolPoker system apparent. The main points of significance and originality for the paper are that (1) it evaluates how LLMs reason about poker and presents qualitative and quantitative analyses of particular types of shortcomings in the reasoning process of v

Weaknesses

The primary weakness of this paper lies in the novelty of the approach: while the paper does a very good job analyzing LLM performance on poker and explaining why the ToolPoker approach was developed, it is not clear if there is anything that sets ToolPoker apart from other tool-use frameworks, other than the task setting. In particular, explicitly comparing the strengths of ToolPoker with other approaches like ReTool (mentioned in the paper) would be helpful in evaluating the approach.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The first systematic study analyzing LLM reasoning and action alignment in poker, identifying fundamental weaknesses in heuristic dependence, factual errors, and knowing–doing gaps.
- A detailed investigation of whether behavior cloning and step-level RL can internally mitigate these flaws, revealing their limited capacity to achieve GTO-consistent reasoning.
- ToolPoker integrates external solvers into LLM reasoning for imperfect-information games, with a unified API and solver-augmented tr

Weaknesses

1. The composite reward (Eq. 4) combines R_answer, R_format, and R_tool with tunable weights, but it is unclear how each component quantitatively contributes to tool-learning behavior. Providing ablation or sensitivity analyses (e.g., varying α_f and α_t, or visualizing reward trajectories) would improve transparency and reproducibility. Moreover, discussing how the model avoids reward hacking, e.g., overusing the solver or formatting cues without deeper reasoning, would strengthen the credibility of the RL setup.
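
The kind of sensitivity check Reviewer 02 asks for can be illustrated with a small sketch. Eq. 4 is not reproduced on this page, so the code assumes a linear form, R = R_answer + α_f · R_format + α_t · R_tool; the paper's actual equation may differ. The function names, the two example rollouts, and the weight grid are hypothetical and chosen only for illustration.

```python
from itertools import product


def composite_reward(r_answer, r_format, r_tool, alpha_f, alpha_t):
    """Assumed linear combination; the paper's Eq. 4 may be defined differently."""
    return r_answer + alpha_f * r_format + alpha_t * r_tool


def hackable_settings(alphas):
    """Return (alpha_f, alpha_t) pairs where a wrong but well-formatted,
    tool-calling rollout scores at least as high as a correct plain one."""
    # Hypothetical rollouts: component rewards are illustrative, not from the paper.
    correct_plain = {"r_answer": 1.0, "r_format": 0.0, "r_tool": 0.0}
    wrong_polished = {"r_answer": 0.0, "r_format": 1.0, "r_tool": 1.0}

    flagged = []
    for alpha_f, alpha_t in product(alphas, repeat=2):
        r_correct = composite_reward(**correct_plain, alpha_f=alpha_f, alpha_t=alpha_t)
        r_wrong = composite_reward(**wrong_polished, alpha_f=alpha_f, alpha_t=alpha_t)
        if r_wrong >= r_correct:
            flagged.append((alpha_f, alpha_t))
    return flagged


if __name__ == "__main__":
    for pair in hackable_settings([0.1, 0.5, 1.0]):
        print("potential reward-hacking regime:", pair)
```

Under this assumed form, any weight pair with α_f + α_t ≥ 1 lets formatting and tool calls alone match the reward of a correct answer, which is the regime the reviewer's reward-hacking concern points at.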

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. This paper is well-written and well-organized. The proposed method is simple and easy to follow.
2. This paper presents extensive experimental results with detailed analysis on the ablation study as well as limitations.

Weaknesses

1. The novelty of the paper is limited. It mainly applies reinforcement learning to a large language model using a classic game-theoretic solver (e.g., CFR+) as the reward signal or direct PPO, without introducing any fundamentally new algorithmic contributions or insights. Essentially, the work repackages standard solver outputs within an RL fine-tuning framework, resulting in incremental rather than conceptual advancement. Also, while the related work section is decent, it omits several works

Taxonomy

Topics: Artificial Intelligence in Games · Topic Modeling · Explainable Artificial Intelligence (XAI)