Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models

Aidan O'Gara

arXiv:2308.01404·cs.CL·August 7, 2023·6 cites

TL;DR

This paper introduces a text-based game called Hoodwinked to evaluate deception and lie detection in language models, demonstrating that advanced models like GPT-4 exhibit stronger persuasive and deceptive skills during gameplay.

Contribution

The paper presents a novel game environment for testing deception in language models and provides empirical evidence of their capabilities across different model sizes.

Findings

01. More advanced models are more effective killers, outperforming smaller models in 18 of 24 pairwise comparisons

02. The killer's denials and accusations have measurable effects on voting outcomes

03. Stronger persuasive skills correlate with better deception performance

Abstract

Are current language models capable of deception and lie detection? We study this question by introducing a text-based game called Hoodwinked, inspired by Mafia and Among Us. Players are locked in a house and must find a key to escape, but one player is tasked with killing the others. Each time a murder is committed, the surviving players have a natural language discussion then vote to banish one player from the game. We conduct experiments with agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception and lie detection capabilities. The killer often denies their crime and accuses others, leading to measurable effects on voting outcomes. More advanced models are more effective killers, outperforming smaller models in 18 of 24 pairwise comparisons. Secondary metrics provide evidence that this improvement is not mediated by different actions, but rather by…
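The banishment vote described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `tally_banishment`, the player names, and the tie rule (nobody is banished when the top vote counts are equal) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def tally_banishment(votes: dict[str, str]) -> Optional[str]:
    """Return the player with the most votes, or None on a tie or empty vote.

    `votes` maps each voter to the player they want banished.
    The tie rule here is an assumption, not taken from the paper.
    """
    if not votes:
        return None
    ranked = Counter(votes.values()).most_common(2)
    # A tie between the top two candidates banishes nobody (assumed rule).
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical round: two players accuse Bob, Bob accuses Alice.
votes = {"Alice": "Bob", "Carol": "Bob", "Bob": "Alice"}
print(tally_banishment(votes))  # prints "Bob"
```

In a setup like this, the killer's statements during the discussion matter only through how they shift the entries in `votes`, which is the kind of effect on voting outcomes that the abstract reports measuring.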

Code & Models

Repositories

aogara-ds/hoodwinked (Official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Deception detection and forensic psychology

Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Absolute Position Encodings · Byte Pair Encoding · Multi-Head Attention · Weight Decay · Attention Dropout · Position-Wise Feed-Forward Layer