Large language models replicate and predict human cooperation across experiments in game theory
Andrea Cera Palatsi, Samuel Martin-Gutierrez, Ana S. Cardenal, Max Pellert

TL;DR
This paper evaluates how well large language models can replicate human decision-making in game theory experiments, revealing that some models closely mimic human cooperation while others align with theoretical predictions, thus supporting their use in social simulations.
Contribution
It introduces a systematic framework for evaluating LLMs against human behavior in game theory, demonstrating their potential to replicate and predict social decision-making patterns.
Findings
Llama reproduces human cooperation patterns with high fidelity.
Qwen aligns closely with Nash equilibrium predictions.
Population-level behavioral replication is achieved without persona prompting.
Abstract
Large language models (LLMs) are increasingly used both to make decisions in domains such as health, education and law, and to simulate human behavior. Yet how closely LLMs mirror actual human decision-making remains poorly understood. This gap is critical: misalignment could produce harmful outcomes in practical applications, while failure to replicate human behavior renders LLMs ineffective for social simulations. Here, we address this gap by developing a digital twin of game-theoretic experiments and introducing a systematic prompting and probing framework for machine-behavioral evaluation. Testing three open-source models (Llama, Mistral and Qwen), we find that Llama reproduces human cooperation patterns with high fidelity, capturing human deviations from rational choice theory, while Qwen aligns closely with Nash equilibrium predictions. Notably, we achieved population-level…
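To make the contrast between Nash equilibrium predictions and human-like cooperation concrete, here is a minimal sketch for a one-shot Prisoner's Dilemma. The payoff values, the function names `pure_nash_equilibria` and `cooperation_rate`, and the choice of game are all illustrative assumptions; none of them come from the paper, whose actual games and evaluation framework are not described in the text above.

```python
from itertools import product

# Hypothetical Prisoner's Dilemma payoffs as (row player, column player).
# These numbers are illustrative and are not taken from the paper.
ACTIONS = ("C", "D")  # C = cooperate, D = defect
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def pure_nash_equilibria(payoffs, actions):
    """Return action profiles where neither player gains by deviating alone."""
    equilibria = []
    for row, col in product(actions, repeat=2):
        row_payoff, col_payoff = payoffs[(row, col)]
        # Row player: no alternative action beats the current payoff.
        row_ok = all(payoffs[(alt, col)][0] <= row_payoff for alt in actions)
        # Column player: same check on the second payoff entry.
        col_ok = all(payoffs[(row, alt)][1] <= col_payoff for alt in actions)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


def cooperation_rate(choices):
    """Fraction of choices equal to 'C'."""
    return sum(choice == "C" for choice in choices) / len(choices)


equilibria = pure_nash_equilibria(PAYOFFS, ACTIONS)
# Cooperation rate implied by the equilibrium row-player actions.
nash_cooperation = cooperation_rate([row for row, _ in equilibria])
```

Under these assumed payoffs, mutual defection is the only pure-strategy equilibrium, so the equilibrium-implied cooperation rate is zero. An observed cooperation rate, whether from human participants or from sampled model responses, could be passed through the same `cooperation_rate` helper and compared against that baseline.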
Taxonomy
Topics: Language and cultural evolution · Mobile Crowdsensing and Crowdsourcing · Artificial Intelligence in Healthcare and Education
