TL;DR
This paper investigates whether populations of large language model (LLM) agents can develop cooperative social norms over repeated interactions, and finds that the outcome depends strongly on the underlying model.
Contribution
The paper introduces an evaluation framework for cooperation among LLM agents and uses it to show how different models evolve social norms and how mechanisms such as costly punishment affect the result.
Findings
Claude 3.5 Sonnet outperforms Gemini 1.5 Flash and GPT-4o in cooperative scores.
Claude 3.5 Sonnet benefits from costly punishment mechanisms.
Behavior varies significantly across models and initial conditions.
Abstract
Large language models (LLMs) provide a compelling foundation for building generally-capable AI agents. These agents may soon be deployed at scale in the real world, representing the interests of individual humans (e.g., AI assistants) or groups of humans (e.g., AI-accelerated corporations). At present, relatively little is known about the dynamics of multiple LLM agents interacting over many generations of iterative deployment. In this paper, we examine whether a "society" of LLM agents can learn mutually beneficial social norms in the face of incentives to defect, a distinctive feature of human sociality that is arguably crucial to the success of civilization. In particular, we study the evolution of indirect reciprocity across generations of LLM agents playing a classic iterated Donor Game in which agents can observe the recent behavior of their peers. We find that the evolution of…
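As a rough illustration of the Donor Game mentioned in the abstract, the Python sketch below models one donation and a simple reputation-based rule. It is not the paper's implementation: the `Agent` class, the starting resources, the `multiplier` value, the `window` size, and the fixed `reciprocal_fraction` rule are all assumptions made for this example. In the paper the agents are LLMs, so the fixed rule here only stands in for whatever strategy an agent would actually choose.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent record; field names are illustrative only."""
    name: str
    resources: float = 10.0  # assumed starting endowment
    history: list = field(default_factory=list)  # fractions donated so far


def donate(donor, recipient, fraction, multiplier=2.0):
    """Move `fraction` of the donor's resources to the recipient.

    The recipient gains `multiplier` times what the donor gives up, which is
    what makes mutual donation beneficial while still tempting each donor to
    keep everything. The multiplier value is an assumption.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    amount = donor.resources * fraction
    donor.resources -= amount
    recipient.resources += multiplier * amount
    donor.history.append(fraction)  # visible to future partners
    return amount


def reciprocal_fraction(recipient, window=3, default=0.5):
    """Stand-in indirect-reciprocity rule: give what the recipient
    has recently given to others, averaged over the last `window` rounds."""
    recent = recipient.history[-window:]
    return sum(recent) / len(recent) if recent else default
```

A donor that calls `reciprocal_fraction` on its partner before calling `donate` rewards agents with a generous track record and withholds from those who kept their resources, which is the basic logic of indirect reciprocity.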
Taxonomy
Methods: Balanced Selection
