Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games

Christopher Kao; Vanshika Vats; James Davis

arXiv:2601.13709 · cs.AI · January 21, 2026


TL;DR

This study evaluates the deception capabilities of large language models in social deduction games. It finds that LLMs deceive effectively and are harder to detect than human players, which highlights both their sophistication and their potential risks.

Contribution

Introduces an asynchronous multi-agent framework for social deduction games and a Mafia Detector to assess LLM deception, providing new insights into LLM social deception in natural language contexts.
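The page does not describe how the asynchronous framework is built. The sketch below only illustrates what "asynchronous" can mean in contrast to fixed round-robin turns: each agent runs as its own coroutine and decides when to read the shared chat log and when to post. The function names (`agent_loop`, `run_day_phase`), the random "thinking" delay, and the coin-flip stand-in for an LLM call are all hypothetical assumptions, not the authors' implementation.

```python
import asyncio
import random


async def agent_loop(name: str, log: list[tuple[str, str]], rng: random.Random,
                     phase_end: float, max_delay: float = 0.05,
                     speak_prob: float = 0.5) -> None:
    """Hypothetical agent: wait a random delay, read the log, maybe post,
    until the day phase ends. No agent waits for an assigned turn."""
    loop = asyncio.get_running_loop()
    while True:
        # Simulated thinking/typing time; agents interleave unpredictably.
        await asyncio.sleep(rng.uniform(0, max_delay))
        if loop.time() >= phase_end:
            return
        seen = len(log)  # number of messages this agent has read so far
        # Placeholder for an LLM call: here the agent speaks with a fixed probability.
        if rng.random() < speak_prob:
            # A plain append is safe: coroutines only switch at `await` points.
            log.append((name, f"{name} responds after reading {seen} messages"))


async def run_day_phase(players: list[str], duration: float = 0.3,
                        seed: int = 0) -> list[tuple[str, str]]:
    """Run one asynchronous discussion phase and return the ordered chat log."""
    log: list[tuple[str, str]] = []
    rng = random.Random(seed)
    phase_end = asyncio.get_running_loop().time() + duration
    await asyncio.gather(*(agent_loop(p, log, rng, phase_end) for p in players))
    return log


if __name__ == "__main__":
    for speaker, text in asyncio.run(run_day_phase(["Alice", "Bob", "Carol", "Dave"])):
        print(f"{speaker}: {text}")
```

Because message order depends on timing rather than a fixed schedule, an agent can speak several times in a row or stay silent, which is closer to a real group chat than strict turn-taking.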

Findings

1. LLMs deceive more effectively than a random baseline.
2. LLMs are less detectable than humans in social deduction games.
3. A dataset of LLM Mafia game transcripts is released for future research.

Abstract

Large Language Model (LLM) agents are increasingly used in many applications, raising concerns about their safety. While previous work has shown that LLMs can deceive in controlled tasks, less is known about their ability to deceive using natural language in social contexts. In this paper, we study deception in the Social Deduction Game (SDG) Mafia, where success depends on deceiving others through conversation. Unlike previous SDG studies, we use an asynchronous multi-agent framework that better simulates realistic social contexts. We simulate 35 Mafia games with GPT-4o LLM agents. We then build a Mafia Detector using GPT-4-Turbo that predicts the mafia players from game transcripts stripped of player role information. We use prediction accuracy as a surrogate marker for deception quality, and we compare it with the detector's accuracy on 28 human games and with a random baseline.…
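The abstract uses the detector's prediction accuracy as a proxy for deception quality: the lower the accuracy, the better the mafia blended in. The excerpt does not define the metric, so the sketch below assumes accuracy is the fraction of true mafia members the detector names, and that the random baseline picks the same number of players uniformly at random. Under that assumption the baseline equals the hypergeometric expectation of hits divided by the mafia count, which simplifies to `n_guesses / n_players`. The function names and the toy game data are hypothetical illustrations, not results from the paper.

```python
from fractions import Fraction
from math import comb


def detector_accuracy(predicted: set[str], mafia: set[str]) -> float:
    """Assumed metric: fraction of true mafia members the detector named."""
    return len(predicted & mafia) / len(mafia)


def random_baseline_accuracy(n_players: int, n_mafia: int, n_guesses: int) -> Fraction:
    """Expected accuracy when n_guesses players are named uniformly at random.
    Hits follow a hypergeometric distribution; E[hits] / n_mafia == n_guesses / n_players."""
    total = comb(n_players, n_guesses)
    expected_hits = sum(
        Fraction(k * comb(n_mafia, k) * comb(n_players - n_mafia, n_guesses - k), total)
        for k in range(min(n_mafia, n_guesses) + 1)
    )
    return expected_hits / n_mafia


def mean_accuracy(games: list[tuple[set[str], set[str]]]) -> float:
    """Average per-game accuracy over (predicted, true_mafia) pairs."""
    return sum(detector_accuracy(p, m) for p, m in games) / len(games)


# Toy, made-up games: (detector's prediction, true mafia).
toy_games = [
    ({"P3", "P7"}, {"P3", "P5"}),  # one of two mafia found -> 0.5
    ({"P1", "P2"}, {"P1", "P2"}),  # both found -> 1.0
    ({"P4", "P6"}, {"P8", "P9"}),  # none found -> 0.0
]
print(mean_accuracy(toy_games))             # 0.5
print(random_baseline_accuracy(10, 2, 2))   # 1/5
```

Comparing a set of games' mean accuracy against this baseline (and against the same detector's accuracy on human games) is what lets lower detectability be read as stronger deception.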

