Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making

Kehan Zheng; Jinfeng Zhou; Hongning Wang

arXiv:2506.09390·cs.AI·June 12, 2025




TL;DR

This paper compares large language models and humans in strategic games, revealing that LLMs mimic human heuristics but lack flexibility and context sensitivity, highlighting the need for improved training for adaptive strategic reasoning.

Contribution

It provides an empirical comparison of LLMs and humans in strategic decision-making, revealing the limitations of LLMs in capturing human-like bounded rationality.

Findings

01

LLMs reproduce human heuristics like outcome-based switching and cooperation.

02

LLMs are more rigid and less sensitive to environmental changes.

03

Architectural signatures influence strategic behavior in LLMs.

Abstract

Large language models are increasingly used in strategic decision-making settings, yet evidence shows that, like humans, they often deviate from full rationality. In this study, we compare LLMs and humans using experimental paradigms directly adapted from behavioral game-theory research. We focus on two well-studied strategic games, Rock-Paper-Scissors and the Prisoner's Dilemma, which are well known for revealing systematic departures from rational play in human subjects. By placing LLMs in identical experimental conditions, we evaluate whether their behaviors exhibit the bounded rationality characteristic of humans. Our findings show that LLMs reproduce familiar human heuristics, such as outcome-based strategy switching and increased cooperation when future interaction is possible, but they apply these rules more rigidly and demonstrate weaker sensitivity to the dynamic changes in the…
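The "outcome-based strategy switching" mentioned in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the authors' analysis pipeline: the function names (`outcome`, `conditional_stay_rates`) and the example move sequences are hypothetical. It measures how often a player repeats the previous move after a win, a draw, or a loss in Rock-Paper-Scissors. Under the mixed-strategy Nash equilibrium, each move is chosen uniformly and independently of history, so the expected stay rate is 1/3 after every outcome; systematic departures from 1/3 indicate an outcome-conditioned heuristic.

```python
from collections import Counter

# Key beats value: Rock beats Scissors, Paper beats Rock, Scissors beats Paper.
BEATS = {"R": "S", "P": "R", "S": "P"}


def outcome(mine, theirs):
    """Return 'win', 'draw', or 'lose' from the first player's point of view."""
    if mine == theirs:
        return "draw"
    return "win" if BEATS[mine] == theirs else "lose"


def conditional_stay_rates(my_moves, opp_moves):
    """Fraction of rounds in which the player repeats the previous move,
    split by the outcome of that previous round."""
    stays = Counter()
    totals = Counter()
    for t in range(len(my_moves) - 1):
        prev = outcome(my_moves[t], opp_moves[t])
        totals[prev] += 1
        if my_moves[t + 1] == my_moves[t]:
            stays[prev] += 1
    return {o: stays[o] / totals[o] for o in totals}


# Hypothetical six-round history, made up for illustration (not data from the paper).
my_moves = list("RRPSSR")
opp_moves = list("SPPRSS")
rates = conditional_stay_rates(my_moves, opp_moves)
print(rates)  # {'win': 1.0, 'lose': 0.5, 'draw': 0.0}
```

With only a handful of rounds the rates are noisy; a real comparison against the 1/3 benchmark would need many rounds and a formal test, which this sketch does not attempt.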

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 5

Strengths

The paper addresses an important and timely research question at the intersection of behavioral game theory and large language model behavior, making it relevant to AI decision-making capabilities. The empirical focus on repeated games and heuristic strategies is well-motivated, given the extensive literature on human departures from equilibrium play, and the research has clear potential to contribute meaningfully to our understanding of how LLMs behave in strategic settings. The comparative app

Weaknesses

The paper suffers from several significant methodological and presentation issues that undermine its conclusions. Most critically, the analysis lacks proper statistical rigor, with claims about "strategic signatures" and systematic patterns supported only by qualitative evidence rather than formal statistical testing with appropriate null models. The operationalization and measurement of key concepts like "win-lose-stay" patterns are not clearly defined or transparently reported, making it diffi

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. **Clear and Logical Writing:** The paper is well-written, with a coherent and easy-to-follow structure.
2. **Principled Experiment Design:** Replicating established human behavioral studies provides a robust basis for comparing LLM and human behavior.
3. **Insightful Findings:** The observation that LLMs amplify human heuristics with greater rigidity, and show weaker environmental sensitivity, is a meaningful and nuanced contribution beyond simple claims of rationality.

Weaknesses

1. The paper notes “distinct strategic signatures” across models but does not explore the underlying causes of these differences.

Reviewer 03 · Rating 0 · Confidence 4

Strengths

1. Evaluation across two complementary games (RPS and PD), covering both zero-sum and cooperative scenarios.
2. Replicates human-subject protocols and payoff structures from the behavioral economics literature.

Weaknesses

1. **Poor Organization.** The presentation is disorganized because the authors put the methodology, setup, strategy description, results for both games, and overall analysis within the "Experiment" section. As a result, it is unclear where the experiment design ends and the analysis begins, forcing the reader to jump back and forth to follow the logic. The authors analyze RPS and PD together, even though they have different theoretical foundations, which should be discussed separately before syn

Reviewer 04 · Rating 2 · Confidence 3

Strengths

- Directly targets bounded rationality in simple, well-studied games with closed-form Nash benchmarks.
- Multiple model families, including reasoning-tuned variants, were tested and compared. The per-model “strategic signatures” are interesting.

Weaknesses

- Prompt-only simulations. Despite claims of aligning with human protocols, this remains a prompt-only setup. There is no real incentive, no real-time human interaction, and no direct head-to-head human play.
- Insufficient linkage to prior work. Many closely related studies are cited, but there is no quantitative comparison with these works.
- Rigidity and weaker environmental sensitivity are likely due to the internal biases of the model, but this is not further investigated.
- WSLU/WDLS analy


Taxonomy

Topics: Language and cultural evolution · Experimental Behavioral Economics Studies · Explainable Artificial Intelligence (XAI)
