Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games
Christopher Kao, Vanshika Vats, James Davis

TL;DR
This study evaluates the deception capabilities of large language models in the social deduction game Mafia. A GPT-4-Turbo detector identifies mafia players less reliably in LLM games than in human games, which suggests that LLMs blend in more effectively than humans and highlights both their sophistication and potential risks.
Contribution
Introduces an asynchronous multi-agent framework for social deduction games and a Mafia Detector to assess LLM deception, providing new insights into LLM social deception in natural language contexts.
Findings
LLMs deceive more effectively than a random baseline.
LLMs are less detectable than humans in social deduction games.
Dataset of LLM Mafia transcripts released for future research.
Abstract
Large Language Model (LLM) agents are increasingly used in many applications, raising concerns about their safety. While previous work has shown that LLMs can deceive in controlled tasks, less is known about their ability to deceive using natural language in social contexts. In this paper, we study deception in the Social Deduction Game (SDG) Mafia, where success is dependent on deceiving others through conversation. Unlike previous SDG studies, we use an asynchronous multi-agent framework which better simulates realistic social contexts. We simulate 35 Mafia games with GPT-4o LLM agents. We then create a Mafia Detector using GPT-4-Turbo to analyze game transcripts without player role information to predict the mafia players. We use prediction accuracy as a surrogate marker for deception quality. We compare this prediction accuracy to that of 28 human games and a random baseline.…
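The abstract uses the detector's prediction accuracy as a surrogate for deception quality and compares it with a random baseline. The sketch below shows one way such a comparison could be computed. It is not the authors' implementation: the function names, the player labels, and the scoring rule (the fraction of true mafia players that appear in the predicted set) are assumptions made for illustration.

```python
from fractions import Fraction


def detection_accuracy(predicted, actual):
    """Fraction of true mafia players that appear in the predicted set.

    Hypothetical scoring rule; the paper may define accuracy differently.
    """
    actual = set(actual)
    if not actual:
        raise ValueError("actual mafia set must be non-empty")
    return Fraction(len(set(predicted) & actual), len(actual))


def random_baseline(num_players, num_guesses):
    """Expected detection_accuracy for uniformly random distinct guesses.

    Each guessed player is mafia with probability m / n, so the expected
    number of hits is k * m / n; dividing by m leaves k / n.
    """
    if num_players <= 0 or not 0 <= num_guesses <= num_players:
        raise ValueError("need 0 <= num_guesses <= num_players, num_players > 0")
    return Fraction(num_guesses, num_players)


def mean_accuracy(games):
    """Average detection_accuracy over (predicted, actual) pairs."""
    if not games:
        raise ValueError("games must be non-empty")
    scores = [detection_accuracy(pred, act) for pred, act in games]
    return sum(scores, Fraction(0)) / len(scores)


# Made-up example data, not taken from the released transcripts.
example_games = [
    ({"p1", "p2"}, {"p1", "p3"}),  # one of two mafia players found
    ({"p4", "p5"}, {"p4", "p5"}),  # both mafia players found
]
example_mean = mean_accuracy(example_games)
example_baseline = random_baseline(num_players=10, num_guesses=2)
```

Under this reading, a lower mean accuracy for one group of games means its mafia players were harder to identify, and the random baseline marks the accuracy a detector would reach with no information at all.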
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
