Playing repeated games with Large Language Models
Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, Eric Schulz

TL;DR
This paper investigates how large language models behave in repeated games, finding that they do well in games that reward self-interest but struggle in games that require coordination, and explores ways to improve their social interaction strategies.
Contribution
It introduces a behavioral game theory framework for analyzing LLMs in repeated games and demonstrates methods to modulate their social behaviors for better cooperation.
Findings
LLMs excel in self-interested games like Prisoner's Dilemma
LLMs perform poorly in coordination games such as Battle of the Sexes
Providing additional information about the opponent and using a "social chain-of-thought" (SCoT) strategy improves GPT-4's coordination with humans
Abstract
LLMs are increasingly used in applications where they interact with humans and other agents. We propose to use behavioural game theory to study LLMs' cooperation and coordination behaviour. We let different LLMs play finitely repeated games with each other, with human-like strategies, and actual human players. Our results show that LLMs perform particularly well at self-interested games like the iterated Prisoner's Dilemma family. However, they behave sub-optimally in games that require coordination, like the Battle of the Sexes. We verify that these behavioural signatures are stable across robustness checks. We additionally show how GPT-4's behaviour can be modulated by providing additional information about its opponent and by using a "social chain-of-thought" (SCoT) strategy. This also leads to better scores and more successful coordination when interacting with human…
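To make the setup in the abstract concrete, here is a minimal sketch of a finitely repeated 2x2 game in Python. It is not the authors' code: the payoff values, the 10-round horizon, and the hand-written strategies (`grim_trigger`, `defect_once_then_cooperate`, `always_first`, `alternate`) are illustrative assumptions standing in for LLM or human players.

```python
from typing import Callable, Dict, List, Tuple

Action = int                              # each player picks option 0 or 1
History = List[Tuple[Action, Action]]     # (own action, opponent action) per round
Strategy = Callable[[History], Action]
Payoffs = Dict[Tuple[Action, Action], Tuple[int, int]]

# Assumed payoff tables: (row action, column action) -> (row payoff, column payoff).
# Prisoner's Dilemma: 0 = cooperate, 1 = defect.
PRISONERS_DILEMMA: Payoffs = {
    (0, 0): (8, 8), (0, 1): (0, 10),
    (1, 0): (10, 0), (1, 1): (5, 5),
}
# Battle of the Sexes: players score only when they match, but each prefers a different match.
BATTLE_OF_SEXES: Payoffs = {
    (0, 0): (10, 7), (0, 1): (0, 0),
    (1, 0): (0, 0), (1, 1): (7, 10),
}

def play(payoffs: Payoffs, row: Strategy, col: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play a finitely repeated game and return the total (row, column) scores."""
    row_hist: History = []
    col_hist: History = []
    row_score = col_score = 0
    for _ in range(rounds):
        a, b = row(row_hist), col(col_hist)
        pr, pc = payoffs[(a, b)]
        row_score += pr
        col_score += pc
        row_hist.append((a, b))   # each player sees the history from its own side
        col_hist.append((b, a))
    return row_score, col_score

def grim_trigger(history: History) -> Action:
    """Cooperate until the opponent defects once, then defect forever (unforgiving)."""
    return 1 if any(opp == 1 for _, opp in history) else 0

def defect_once_then_cooperate(history: History) -> Action:
    """Defect in the first round only, then cooperate."""
    return 1 if not history else 0

def always_first(history: History) -> Action:
    """Always pick option 0, the row player's preferred option in Battle of the Sexes."""
    return 0

def alternate(history: History) -> Action:
    """Alternate between the two options, starting with option 0."""
    return len(history) % 2

if __name__ == "__main__":
    print(play(PRISONERS_DILEMMA, grim_trigger, defect_once_then_cooperate))  # (90, 10)
    print(play(BATTLE_OF_SEXES, always_first, alternate))                      # (50, 35)
    print(play(BATTLE_OF_SEXES, alternate, alternate))                         # (85, 85)
```

Under these assumed payoffs, the unforgiving player scores well for itself (90) while joint welfare falls short of mutual cooperation (80 each), and a player that insists on its preferred option in Battle of the Sexes loses half the rounds against an alternating partner, whereas two alternating players coordinate every round.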
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
* The paper employs a well-defined evaluation framework and experimental setup, providing a solid foundation for its investigations.
* The conceptual insights gained from observing LLM behavior across various game-theoretic scenarios are intriguing and represent a contribution to our understanding of LLM interactions.

* The study's exploration of prompt sensitivity could be expanded. Beyond altering the order of options, relabeling them, and changing the representation of utilities, there may be additional dimensions of prompt design that could significantly influence LLM behavior.
* The assumption that LLMs can serve as strategic agents is somewhat discordant with the primary design of LLMs, which is document completion rather than strategic decision-making. This disparity may lead to LLMs not fully grasping
The paper is written with purpose, clear, and well presented. The authors study a timely and relevant topic by investigating the behavior of Large Language Models (LLMs) in classic repeated games. As LLMs are increasingly integrated into various applications, understanding their cooperation and coordination behavior is important. In general, this paper proposed an innovative research topic.
It seems a similar problem has been studied by Brookins, Philip, and Jason Matthew DeBacker. "Playing games with GPT: What can we learn about a large language model from canonical strategic games?." Available at SSRN 4493398 (2023). I want to clarify that my mention of these works is not meant as criticism. I acknowledge that they were published within six months of this paper's submission. Also, it's not apparent that this paper investigates the exact same results, there are noticeable s
This paper proposed to use behavioral game theory to study LLM's cooperation and coordination behavior, and showed some behavioral styles of LLMs. These results enrich the understanding of LLM's social behavior.
This is not a technical paper, nor an application to neuroscience & cognitive science, so I do not think it is related to ICLR. It would be great if the authors could use these findings to develop better algorithms.
1. As far as I know, this is the first paper that studies the behavior of LLMs in games. The paper systematically considers a wide range of powerful LLMs and explores all possible 2x2 game scenarios.
2. The paper includes overall numerical statistics and detailed analyses for Prisoner's Dilemma and Battle of the Sexes games.
3. The paper is well-written, with a clear structure.
4. The phenomena like "unforgiveness" and "noncooperation" are impressive. These findings help us understand the behavi

1. Generally speaking, this paper may be more suitable for AAMAS, AAAI, and IJCAI, rather than ICLR, which focuses more on machine learning technology. The paper primarily focuses on the observation of LLM behavior, lacking machine learning techniques. Incorporating more robust methodologies could enhance its scientific rigor.
2. Although the article is very comprehensive in studying 2x2 games, I still question whether the experimental results in this area are representative. For example, whethe
The problem setup is well-motivated: using games to simulate LLM behavior and as a tool to assess their intelligence level is definitely a very promising direction. The presentation is clear.
The way that it constructs a strategy from an LLM is questionable. The conclusion of the evaluation does not provide much insight to me. Please see the questions section below.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Cosine Annealing · Dense Connections
