Playing repeated games with Large Language Models

Elif Akata; Lion Schulz; Julian Coda-Forno; Seong Joon Oh; Matthias Bethge; Eric Schulz

arXiv:2305.16867 · cs.CL · May 13, 2025 · 37 cites

Open Access 5 Reviews

TL;DR

This paper investigates how large language models behave in repeated game scenarios, revealing their strengths in self-interest and limitations in coordination, and explores ways to improve their social interaction strategies.

Contribution

It introduces a behavioral game theory framework for analyzing LLMs in repeated games and demonstrates methods to modulate their social behaviors for better cooperation.

Findings

01

LLMs excel in self-interested games like Prisoner's Dilemma

02

LLMs perform poorly in coordination games such as Battle of the Sexes

03

Providing additional information about the opponent and using "social chain-of-thought" (SCoT) prompting improves LLM coordination with humans

Abstract

LLMs are increasingly used in applications where they interact with humans and other agents. We propose to use behavioural game theory to study LLMs' cooperation and coordination behaviour. We let different LLMs play finitely repeated $2 \times 2$ games with each other, with human-like strategies, and actual human players. Our results show that LLMs perform particularly well at self-interested games like the iterated Prisoner's Dilemma family. However, they behave sub-optimally in games that require coordination, like the Battle of the Sexes. We verify that these behavioural signatures are stable across robustness checks. We additionally show how GPT-4's behaviour can be modulated by providing additional information about its opponent and by using a "social chain-of-thought" (SCoT) strategy. This also leads to better scores and more successful coordination when interacting with human…
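To make the setup in the abstract concrete, the following is a minimal sketch of a finitely repeated $2 \times 2$ game loop. It is not the authors' code. The payoff values are textbook Prisoner's Dilemma numbers chosen for illustration (an assumption, not the paper's), and the names `play_repeated`, `tit_for_tat`, `grim_trigger`, and `always_defect` are hypothetical. In the study an LLM would be prompted for each move; here hand-written rules stand in for the players.

```python
from typing import Callable, Dict, List, Tuple

Action = str  # "C" (cooperate) or "D" (defect)
History = List[Tuple[Action, Action]]  # (own move, opponent move) per round
Strategy = Callable[[History], Action]

# Illustrative Prisoner's Dilemma payoffs (assumed textbook values,
# not taken from the paper): (row player's payoff, column player's payoff).
PAYOFFS: Dict[Tuple[Action, Action], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(history: History) -> Action:
    """Defect in every round, regardless of what happened before."""
    return "D"


def tit_for_tat(history: History) -> Action:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not history else history[-1][1]


def grim_trigger(history: History) -> Action:
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if any(opp == "D" for _, opp in history) else "C"


def play_repeated(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play a fixed number of rounds and return the two cumulative scores."""
    hist_a: History = []
    hist_b: History = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both players choose simultaneously from their own view of the history.
        move_a = a(hist_a)
        move_b = b(hist_b)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b


if __name__ == "__main__":
    print(play_repeated(tit_for_tat, tit_for_tat))
    print(play_repeated(always_defect, tit_for_tat))
    print(play_repeated(grim_trigger, always_defect))
```

Swapping the `PAYOFFS` table for a coordination game such as Battle of the Sexes (with relabelled actions) would reuse the same loop, which is one reason a shared harness suits a family of $2 \times 2$ games.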

Peer Reviews

Decision: ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

* The paper employs a well-defined evaluation framework and experimental setup, providing a solid foundation for its investigations.
* The conceptual insights gained from observing LLM behavior across various game-theoretic scenarios are intriguing and represent a contribution to our understanding of LLM interactions.

Weaknesses

* The study's exploration of prompt sensitivity could be expanded. Beyond altering the order of options, relabeling them, and changing the representation of utilities, there may be additional dimensions of prompt design that could significantly influence LLM behavior.
* The assumption that LLMs can serve as strategic agents is somewhat discordant with the primary design of LLMs, which is document completion rather than strategic decision-making. This disparity may lead to LLMs not fully grasping

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

The paper is written with purpose, clear, and well presented. The authors study a timely and relevant topic by investigating the behavior of Large Language Models (LLMs) in classic repeated games. As LLMs are increasingly integrated into various applications, understanding their cooperation and coordination behavior is important. In general, this paper proposed an innovative research topic.

Weaknesses

It seems a similar problem has been studied by Brookins, Philip, and Jason Matthew DeBacker. "Playing games with GPT: What can we learn about a large language model from canonical strategic games?." Available at SSRN 4493398 (2023). I want to clarify that my mention of these works is not meant as criticism. I acknowledge that they were published within six months of this paper's submission. Also, it's not apparent that this paper investigates the exact same results, there are noticeable s

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

This paper proposed to use behavioral game theory to study LLMs' cooperation and coordination behavior, and showed some behavioral styles of LLMs. These results enrich the understanding of LLMs' social behavior.

Weaknesses

This is not a technical paper and not an application to neuroscience & cognitive science, so I think it is not related to ICLR. It would be great if the authors could use these findings to develop better algorithms.

Reviewer 04 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. As far as I know, this is the first paper that studies the behavior of LLMs in games. The paper systematically considers a wide range of powerful LLMs and explores all possible 2x2 game scenarios.
2. The paper includes overall numerical statistics and detailed analyses for Prisoner's Dilemma and Battle of the Sexes games.
3. The paper is well-written, with a clear structure.
4. The phenomena like "unforgiveness" and "noncooperation" are impressive. These findings help us understand the behavi

Weaknesses

1. Generally speaking, this paper may be more suitable for AAMAS, AAAI, and IJCAI, rather than ICLR, which focuses more on machine learning technology. The paper primarily focuses on the observation of LLM behavior, lacking machine learning techniques. Incorporating more robust methodologies could enhance its scientific rigor.
2. Although the article is very comprehensive in studying 2x2 games, I still question whether the experimental results in this area are representative. For example, whethe

Reviewer 05 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

The problem setup is well-motivated: using games to simulate LLM behaviors and as a tool to assess their intelligence level is definitely a very promising direction. The presentation is clear.

Weaknesses

The way the paper constructs a strategy from an LLM is questionable. The conclusions of the evaluation do not provide much insight to me. Please see the questions section below.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Cosine Annealing · Dense Connections