The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
Pedro M. P. Curvo

TL;DR
This paper introduces The Traitors, a multi-agent simulation framework using large language models to study deception, trust, and social dynamics, aiming to improve understanding of AI behavior in socially complex interactions.
Contribution
It presents a formal, configurable simulation environment with evaluation metrics and experiments demonstrating deception and trust dynamics among LLM agents.
Findings
Advanced models like GPT-4o show stronger deception skills.
Deception abilities scale faster than detection capabilities.
GPT-4o is also disproportionately vulnerable to falsehoods from other agents.
Abstract
As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games, designed to probe deception, trust formation, and strategic communication among large language model (LLM) agents under asymmetric information. A minority of agents, the traitors, seek to mislead the majority, while the faithful must infer hidden identities through dialogue and reasoning. Our contributions are: (1) we ground the environment in formal frameworks from game theory, behavioral economics, and social cognition; (2) we develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality; (3) we implement a fully autonomous simulation platform where…
Taxonomy
Topics: Opinion Dynamics and Social Influence · Ethics and Social Impacts of AI · Language and cultural evolution
