Intentional Deception as Controllable Capability in LLM Agents

Jason Starace; Terence Soule

arXiv:2603.07848 · cs.AI · March 10, 2026

TL;DR

This paper systematically studies intentional deception in LLM agents within a text-based RPG, revealing how deception varies across profiles and highlighting the limitations of fact-checking defenses against strategic framing.

Contribution

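The paper introduces a controlled experimental framework for analyzing deception in LLM agents, built on 36 parameterized behavioral profiles with explicit ethical ground truth. It characterizes which deception strategies succeed and which target profiles are most vulnerable.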

Findings

1. Deception effects are concentrated in specific behavioral profiles rather than distributed uniformly.
2. 88.5% of successful deceptions use misdirection rather than fabrication.
3. The target's motivation is the primary attack vector, and it is inferred with over 98% accuracy.

Abstract

As LLM-based agents increasingly operate in multi-agent systems, understanding adversarial manipulation becomes critical for defensive design. We present a systematic study of intentional deception as an engineered capability, using LLM-to-LLM interactions within a text-based RPG where parameterized behavioral profiles (9 alignments × 4 motivations, yielding 36 profiles with explicit ethical ground truth) serve as our experimental testbed. Rather than studying accidental deception arising from misalignment, we investigate a two-stage system that infers target agent characteristics and generates deceptive responses steering targets toward actions counter to their beliefs and motivations. We find that deceptive intervention produces differential effects concentrated in specific behavioral profiles rather than distributed uniformly, and that 88.5% of successful deceptions employ misdirection (true statements…
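The testbed is a grid of behavioral profiles, and a short sketch makes the 9 × 4 = 36 count concrete. It assumes the nine alignments follow the classic tabletop-RPG 3×3 grid (lawful/neutral/chaotic × good/neutral/evil), since the abstract says only "9 alignments". The four motivation labels are hypothetical placeholders because the abstract does not name them. This is an illustration, not the authors' implementation.

```python
from itertools import product

# Assumption: the nine alignments follow the classic 3x3 RPG grid
# (ethical axis x moral axis); the abstract states only "9 alignments".
ETHICAL_AXIS = ["lawful", "neutral", "chaotic"]
MORAL_AXIS = ["good", "neutral", "evil"]

# Hypothetical placeholders: the abstract does not name the four motivations.
MOTIVATIONS = ["motivation_1", "motivation_2", "motivation_3", "motivation_4"]


def alignment_name(ethic: str, moral: str) -> str:
    """Return the conventional label, e.g. 'lawful good' or 'true neutral'."""
    if ethic == moral == "neutral":
        return "true neutral"
    return f"{ethic} {moral}"


def build_profiles() -> list[tuple[str, str]]:
    """Enumerate every (alignment, motivation) pair as one behavioral profile."""
    alignments = [alignment_name(e, m) for e, m in product(ETHICAL_AXIS, MORAL_AXIS)]
    return [(a, mot) for a, mot in product(alignments, MOTIVATIONS)]


profiles = build_profiles()
print(len(profiles))  # 9 alignments x 4 motivations = 36
```

Because each profile is a fixed (alignment, motivation) label, it can be logged with every interaction. This is one plausible way to read the abstract's "explicit ethical ground truth".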

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Multi-Agent Systems and Negotiation