Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction
Suma Bailis, Jane Friedhoff, and Feiyang Chen

TL;DR
Werewolf Arena is a new framework that evaluates large language models by having them play a social deduction game, revealing their strategic reasoning and communication skills in a competitive setting.
Contribution
The paper introduces a novel framework for benchmarking LLMs that is built on a social deduction game and combines dynamic, bid-based turn-taking with competitive gameplay.
Findings
Gemini and GPT models show different strategic reasoning strengths.
The framework effectively differentiates LLM capabilities in deception and persuasion.
Werewolf Arena serves as a scalable, challenging benchmark for LLM evaluation.
Abstract
This paper introduces Werewolf Arena, a novel framework for evaluating large language models (LLMs) through the lens of the classic social deduction game, Werewolf. In Werewolf Arena, LLMs compete against each other, navigating the game's complex dynamics of deception, deduction, and persuasion. The framework introduces a dynamic turn-taking system based on bidding, mirroring real-world discussions where individuals strategically choose when to speak. We demonstrate the framework's utility through an arena-style tournament featuring Gemini and GPT models. Our results reveal distinct strengths and weaknesses in the models' strategic reasoning and communication. These findings highlight Werewolf Arena's potential as a challenging and scalable LLM benchmark.
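The bidding-based turn-taking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `choose_next_speaker`, the integer bid scale, and the uniform random tie-break are all assumptions made for illustration. The idea is simply that each player states how strongly it wants to speak and the highest bidder takes the next turn.

```python
import random
from typing import Dict, Optional


def choose_next_speaker(
    bids: Dict[str, int], rng: Optional[random.Random] = None
) -> str:
    """Pick the next speaker from a mapping of player name to bid.

    Assumed rules (hypothetical, not taken from the paper):
    the highest bid wins, and ties are broken uniformly at random.
    """
    if not bids:
        raise ValueError("at least one bid is required")
    rng = rng or random.Random()
    top_bid = max(bids.values())
    # Collect every player who placed the top bid; sort for reproducibility
    # when a seeded random generator is supplied.
    candidates = sorted(name for name, bid in bids.items() if bid == top_bid)
    return rng.choice(candidates)


# Hypothetical round: Bob is the most eager to speak, so he gets the turn.
speaker = choose_next_speaker({"Alice": 1, "Bob": 4, "Cara": 2})
```

In a full game loop, each model would be prompted for a fresh bid before every turn, so a player who has just been accused can raise its bid to respond immediately.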
Taxonomy
Topics: Artificial Intelligence in Law
Methods: Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Weight Decay · Softmax · Discriminative Fine-Tuning · Attention Dropout
