LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
Jingru Jia, Zehua Yuan, Junhao Pan, Paul E. McNamara, Deming Chen

TL;DR
This paper evaluates large language models' strategic reasoning using behavioral game theory, revealing that model size alone doesn't determine performance and highlighting biases and the nuanced effects of prompting methods.
Contribution
Introduces a behavioral game theory framework for assessing LLM strategic reasoning, analyzing the effects of prompting and demographic biases on decision-making.
Findings
Certain models outperform others regardless of size
Chain-of-Thought prompting helps only some models and is not universally beneficial
Demographic features influence model decision patterns
Abstract
Strategic decision-making involves interactive reasoning in which agents adapt their choices in response to others, yet existing evaluations of large language models (LLMs) often emphasize Nash Equilibrium (NE) approximation, overlooking the mechanisms driving their strategic choices. To bridge this gap, we introduce an evaluation framework grounded in behavioral game theory that disentangles reasoning capability from contextual effects. Testing 22 state-of-the-art LLMs, we find that GPT-o3-mini, GPT-o1, and DeepSeek-R1 dominate most games, yet we also show that model scale alone does not determine performance. As for prompting, Chain-of-Thought (CoT) prompting is not universally effective: it improves strategic reasoning only for models at certain levels while providing limited gains elsewhere. Additionally, we investigate the impact of encoded demographic…
Taxonomy
Topics: Language and cultural evolution · Language, Metaphor, and Cognition · AI in Service Interactions
