WereWolf-Plus: An Update of Werewolf Game setting Based on DSGBench
Xinyuan Xia, Yuanyi Song, Haomin Ma, Jinyu Cai

TL;DR
WereWolf-Plus is a comprehensive benchmarking platform for evaluating multi-agent strategic reasoning in the Werewolf game. It addresses the limitations of earlier platforms through extensibility, diverse roles, and detailed metrics.
Contribution
It introduces a multi-model, multi-dimensional platform with customizable roles and new evaluation metrics, enhancing the assessment of social reasoning in multi-agent systems.
Findings
Supports customizable roles like Seer, Witch, Hunter, Guard, Sheriff
Provides extensive quantitative metrics for all special roles, werewolves, and the Sheriff
Enables evaluation of reasoning, cooperation, and social influence
Abstract
With the rapid development of LLM-based agents, increasing attention has been given to their social interaction and strategic reasoning capabilities. However, existing Werewolf-based benchmarking platforms suffer from overly simplified game settings, incomplete evaluation metrics, and poor scalability. To address these limitations, we propose WereWolf-Plus, a multi-model, multi-dimensional, and multi-method benchmarking platform for evaluating multi-agent strategic reasoning in the Werewolf game. The platform offers strong extensibility, supporting customizable configurations for roles such as Seer, Witch, Hunter, Guard, and Sheriff, along with flexible model assignment and reasoning enhancement strategies for different roles. In addition, we introduce a comprehensive set of quantitative evaluation metrics for all special roles, werewolves, and the sheriff, and enrich the assessment…
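The abstract describes per-role configuration with flexible model assignment and role-specific reasoning enhancement strategies. As a minimal sketch of what such a configuration could look like, the Python below uses hypothetical names (`RoleConfig`, `GameConfig`, model ids such as `model-a`, the strategy label `cot`). It is not the platform's actual API. It also assumes, as in common Werewolf variants, that the Sheriff is elected during play rather than dealt as a role, and its validation rules are illustrative sanity checks only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch: RoleConfig, GameConfig, and the model/strategy strings
# are illustrative assumptions, not the WereWolf-Plus configuration API.


@dataclass
class RoleConfig:
    role: str               # e.g. "Werewolf", "Seer", "Witch", "Villager"
    count: int              # number of players dealt this role
    model: str              # id of the LLM driving these agents (hypothetical)
    strategy: str = "none"  # reasoning-enhancement strategy label (hypothetical)


@dataclass
class GameConfig:
    roles: list[RoleConfig]
    # Assumption: the Sheriff is elected during play, so it is modeled as a
    # switch here rather than as a dealt role.
    sheriff_election: bool = True

    def num_players(self) -> int:
        return sum(r.count for r in self.roles)

    def validate(self) -> None:
        names = [r.role for r in self.roles]
        if len(names) != len(set(names)):
            raise ValueError("duplicate role entry; use 'count' instead")
        wolves = sum(r.count for r in self.roles if r.role == "Werewolf")
        if wolves == 0:
            raise ValueError("at least one Werewolf is required")
        # Assumed sanity check: werewolves start as a minority.
        if wolves * 2 >= self.num_players():
            raise ValueError("werewolves must start as a minority")

    def model_for(self, role: str) -> str:
        # Look up which model plays a given role.
        for r in self.roles:
            if r.role == role:
                return r.model
        raise KeyError(role)


config = GameConfig(roles=[
    RoleConfig("Werewolf", 3, model="model-a", strategy="cot"),
    RoleConfig("Seer", 1, model="model-b", strategy="cot"),
    RoleConfig("Witch", 1, model="model-b"),
    RoleConfig("Hunter", 1, model="model-c"),
    RoleConfig("Guard", 1, model="model-c"),
    RoleConfig("Villager", 2, model="model-a"),
])
config.validate()
print(config.num_players())      # 9
print(config.model_for("Seer"))  # model-b
```

Keeping the model and strategy on each role entry is one way to let different roles be played by different models, which is what makes cross-model comparisons per role possible.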
Taxonomy
Topics: Digital Games and Media · Artificial Intelligence in Games · Video Analysis and Summarization
