Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games
Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, and Zhibin Li

TL;DR
This study evaluates large language models in strategic game scenarios, revealing biases that impair their decision-making and performance, and shows that current methods like chain-of-thought prompting have mixed effects on these biases.
Contribution
The paper provides a structured evaluation of LLMs in game-theoretic settings, identifying biases that affect their strategic reasoning and the performance drops that occur under misaligned game configurations.
Findings
LLMs exhibit positional, payoff, and behavioral biases in game decisions.
Performance drops of 16-34% are observed when game configurations are misaligned.
Chain-of-thought prompting reduces biases in some models but worsens them in others.
Abstract
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic decision-making abilities remain largely unexplored. To fully benefit from the potential of LLMs, it's essential to understand their ability to function in complex social scenarios. Game theory, which is already used to understand real-world interactions, provides a good framework for assessing these abilities. This work investigates the performance and merits of LLMs in two canonical game-theoretic two-player non-zero-sum games: Stag Hunt and Prisoner's Dilemma. Our structured evaluation of GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B shows that these models, when making decisions in these games, are affected by at least one of the following systematic biases: positional bias, payoff bias, or behavioural bias. This indicates that LLMs do not fully rely on logical reasoning when making these…
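For readers unfamiliar with the two games named in the abstract, the following is a minimal sketch of their structure. The payoff numbers, action labels, and the helper `pure_nash_equilibria` are illustrative assumptions and are not taken from the paper; only the ordering of the payoffs follows the textbook definitions of Stag Hunt and Prisoner's Dilemma.

```python
from itertools import product

# Illustrative payoffs (assumed, not the paper's values).
# Each entry maps (row_action, col_action) -> (row_payoff, col_payoff).
STAG_HUNT = {
    ("stag", "stag"): (5, 5),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}
PRISONERS_DILEMMA = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def pure_nash_equilibria(game):
    """Return action pairs where neither player gains by deviating alone."""
    rows = sorted({r for r, _ in game})
    cols = sorted({c for _, c in game})
    equilibria = []
    for r, c in product(rows, cols):
        row_pay, col_pay = game[(r, c)]
        # Row player cannot do better by switching rows against column c.
        row_ok = all(game[(alt, c)][0] <= row_pay for alt in rows)
        # Column player cannot do better by switching columns against row r.
        col_ok = all(game[(r, alt)][1] <= col_pay for alt in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


def is_zero_sum(game):
    """True only if the two payoffs cancel out in every outcome."""
    return all(a + b == 0 for a, b in game.values())
```

Under these assumed payoffs, Stag Hunt has two pure-strategy equilibria (both hunt stag, or both hunt hare), while Prisoner's Dilemma has only mutual defection, and neither game is zero-sum.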
Taxonomy
Topics: Sports Analytics and Performance · Software Engineering Research · Big Data and Business Intelligence
Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Dropout · Layer Normalization · Linear Warmup With Cosine Annealing · Adam · Byte Pair Encoding
