Lying to Win: Assessing LLM Deception through Human-AI Games and Parallel-World Probing

Arash Marioriyad; Ali Nouri; Mohammad Hossein Rohban; Mahdieh Soleymani Baghshah

arXiv:2603.07202·cs.CL·March 10, 2026

Lying to Win: Assessing LLM Deception through Human-AI Games and Parallel-World Probing

Arash Marioriyad, Ali Nouri, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah

PDF

Open Access

TL;DR

This paper introduces a structured game-based framework to detect and quantify deception in large language models, revealing how contextual framing influences deceptive behavior and highlighting the need for advanced behavioral audits.

Contribution

The work presents a novel, logically grounded method using parallel-world probing within a 20-Questions game to systematically assess LLM deception strategies.

Findings

01

Deception increases under existential framing for some models.

02

GPT-4o remains invariant in deceptive responses across incentives.

03

Deception can be triggered solely by contextual framing.

Abstract

As Large Language Models (LLMs) transition into autonomous agentic roles, the risk of deception-defined behaviorally as the systematic provision of false information to satisfy external incentives-poses a significant challenge to AI safety. Existing benchmarks often focus on unintentional hallucinations or unfaithful reasoning, leaving intentional deceptive strategies under-explored. In this work, we introduce a logically grounded framework to elicit and quantify deceptive behavior by embedding LLMs in a structured 20-Questions game. Our method employs a conversational forking mechanism: at the point of object identification, the dialogue state is duplicated into multiple parallel worlds, each presenting a mutually exclusive query. Deception is formally identified when a model generates a logical contradiction by denying its selected object across all parallel branches to avoid…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsExplainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning