Fine-Grained and Thematic Evaluation of LLMs in Social Deduction Game

Byungjun Kim; Dayeon Seo; Minju Kim; Bugeun Kim

arXiv:2408.09946 · cs.AI · October 8, 2025

Open Access

TL;DR

This paper introduces a detailed evaluation framework for large language models in social deduction games, highlighting their reasoning failures and providing more nuanced insights than previous coarse metrics.

Contribution

It presents six fine-grained metrics and a thematic analysis approach to better evaluate LLMs' performance in social deduction tasks.

Findings

01 Identified four major reasoning failures in LLMs

02 Developed six detailed evaluation metrics

03 Highlighted limitations of previous coarse-grained assessments

Abstract

Recent studies have investigated whether large language models (LLMs) can support obscured communication, which is characterized by core aspects such as inferring subtext and evading suspicions. To conduct the investigation, researchers have used social deduction games (SDGs) as their experimental environment, in which players conceal and infer specific information. However, prior work has often overlooked how LLMs should be evaluated in such settings. Specifically, we point out two limitations of the evaluation methods employed in prior work. First, the metrics used in prior studies are coarse-grained, as they are based on overall game outcomes that often fail to capture event-level behaviors. Second, error analyses have lacked structured methodologies capable of producing insights that meaningfully support evaluation outcomes. To address these limitations, we propose a microscopic and systematic…

Taxonomy

Topics: Digital Games and Media