Evaluating Collective Behaviour of Hundreds of LLM Agents

Richard Willis; Jianing Zhao; Yali Du; Joel Z. Leibo

arXiv:2602.16662·cs.MA·February 19, 2026

Open Access

TL;DR

This paper presents an evaluation framework for analysing the collective behaviour of hundreds of LLM-based agents in social dilemmas. Its simulations reveal a risk of poor societal outcomes and of convergence to suboptimal equilibria.

Contribution

It introduces a scalable evaluation method for large populations of LLM agents and demonstrates the societal risks associated with their collective behavior.

Findings

01

More recent models tend to produce worse societal outcomes than older models when agents prioritise individual gain over collective benefit.

02

Under cultural evolution, the risk of converging to a poor societal equilibrium grows as population size increases.

03

A diminishing relative benefit of cooperation raises the risk of convergence to poor societal equilibria.

Abstract

As autonomous agents powered by LLMs are increasingly deployed in society, understanding their collective behaviour in social dilemmas becomes critical. We introduce an evaluation framework where LLMs generate strategies encoded as algorithms, enabling inspection prior to deployment and scaling to populations of hundreds of agents, substantially larger than in previous work. We find that more recent models tend to produce worse societal outcomes compared to older models when agents prioritise individual gain over collective benefits. Using cultural evolution to model user selection of agents, our simulations reveal a significant risk of convergence to poor societal equilibria, particularly when the relative benefit of cooperation diminishes and population sizes increase. We release our code as an evaluation suite for developers to assess the emergent collective behaviour of their…
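To make the setup in the abstract concrete, the following is a minimal sketch rather than the authors' released code. It assumes a repeated public goods game as the social dilemma, three hand-written strategies standing in for LLM-generated ones, and a pairwise imitation rule (the Fermi rule) as the cultural-evolution step. All function names, parameters, and the choice of game are hypothetical illustrations.

```python
import math
import random
from typing import Callable, List

# A strategy is a plain function: given last round's cooperation rate,
# return True to contribute or False to withhold. (Hypothetical interface.)
Strategy = Callable[[float], bool]

def always_cooperate(prev_coop_rate: float) -> bool:
    return True

def always_defect(prev_coop_rate: float) -> bool:
    return False

def conditional_cooperator(prev_coop_rate: float) -> bool:
    # Contribute only if at least half the population did so last round.
    return prev_coop_rate >= 0.5

def play_public_goods(strategies: List[Strategy], multiplier: float,
                      rounds: int) -> List[float]:
    """Total payoff per agent; each agent has an endowment of 1 per round."""
    n = len(strategies)
    payoffs = [0.0] * n
    coop_rate = 1.0  # assumption: agents start from an optimistic prior
    for _ in range(rounds):
        choices = [s(coop_rate) for s in strategies]
        contributions = sum(choices)
        share = multiplier * contributions / n  # pot is split equally
        for i, contributed in enumerate(choices):
            payoffs[i] += share + (0.0 if contributed else 1.0)
        coop_rate = contributions / n
    return payoffs

def imitation_step(strategies: List[Strategy], payoffs: List[float],
                   rng: random.Random, beta: float = 1.0) -> List[Strategy]:
    """Each agent compares itself to a random agent and may copy its strategy."""
    n = len(strategies)
    updated = list(strategies)
    for i in range(n):
        j = rng.randrange(n)
        # Fermi rule: copying is more likely the more agent j out-earned agent i.
        p_copy = 1.0 / (1.0 + math.exp(-beta * (payoffs[j] - payoffs[i])))
        if rng.random() < p_copy:
            updated[i] = strategies[j]
    return updated

def simulate(n_agents: int, multiplier: float, generations: int,
             seed: int = 0, rounds: int = 10) -> float:
    """Return the mean payoff per agent per round after evolution."""
    rng = random.Random(seed)
    pool = [always_cooperate, always_defect, conditional_cooperator]
    strategies = [rng.choice(pool) for _ in range(n_agents)]
    for _ in range(generations):
        payoffs = play_public_goods(strategies, multiplier, rounds)
        strategies = imitation_step(strategies, payoffs, rng)
    final = play_public_goods(strategies, multiplier, rounds)
    return sum(final) / (n_agents * rounds)
```

In this sketch, calling `simulate(200, 1.5, 50)` returns a mean payoff between 1.0 (universal defection) and the multiplier (universal cooperation). Varying `n_agents` and `multiplier` is one way to explore, in a toy setting, the dependence on population size and on the benefit of cooperation that the abstract describes. Because strategies are ordinary functions, they can be read and tested before any simulation is run.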

Taxonomy

Topics: Language and cultural evolution · Mobile Crowdsensing and Crowdsourcing · Multi-Agent Systems and Negotiation