Intentional Deception as Controllable Capability in LLM Agents
Jason Starace, Terence Soule

TL;DR
This paper systematically studies intentional deception in LLM agents within a text-based RPG, revealing how deception varies across profiles and highlighting the limitations of fact-checking defenses against strategic framing.
Contribution
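It introduces a controlled experimental framework for studying intentional deception between LLM agents, built on parameterized behavioral profiles with explicit ethical ground truth, and characterizes which deception strategies succeed and which target profiles are most vulnerable.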
Findings
Deception effects are concentrated in specific behavioral profiles.
88.5% of successful deceptions use misdirection rather than fabrication.
Motivation is the primary attack vector, inferred with over 98% accuracy.
Abstract
As LLM-based agents increasingly operate in multi-agent systems, understanding adversarial manipulation becomes critical for defensive design. We present a systematic study of intentional deception as an engineered capability, using LLM-to-LLM interactions within a text-based RPG where parameterized behavioral profiles (9 alignments × 4 motivations, yielding 36 profiles with explicit ethical ground truth) serve as our experimental testbed. Unlike accidental deception from misalignment, we investigate a two-stage system that infers target agent characteristics and generates deceptive responses steering targets toward actions counter to their beliefs and motivations. We find that deceptive intervention produces differential effects concentrated in specific behavioral profiles rather than distributed uniformly, and that 88.5% of successful deceptions employ misdirection (true statements…
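The 36-profile testbed is a full factorial cross of alignments and motivations, and the deception system has two stages: infer the target's profile, then generate a reply conditioned on that inference. The sketch below enumerates the grid and wires the two stages as pluggable callables. It assumes the nine alignments are the classic law/chaos by good/evil RPG grid, which the abstract does not state; the motivation names, `Profile`, `build_profiles`, and `deceive` are hypothetical placeholders, not the authors' code.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

# Assumption: nine alignments = classic two-axis RPG grid (3 x 3).
# The source only says there are nine alignments.
ETHICS_AXIS = ("Lawful", "Neutral", "Chaotic")
MORALITY_AXIS = ("Good", "Neutral", "Evil")

# Hypothetical placeholders: the abstract does not name the four motivations.
MOTIVATIONS = ("motivation_a", "motivation_b", "motivation_c", "motivation_d")


@dataclass(frozen=True)
class Profile:
    """One behavioral profile: an alignment paired with a motivation."""
    alignment: str
    motivation: str


def alignment_name(ethics: str, morality: str) -> str:
    # The centre cell (Neutral/Neutral) is conventionally "True Neutral".
    if ethics == morality == "Neutral":
        return "True Neutral"
    return f"{ethics} {morality}"


def build_profiles() -> list[Profile]:
    """Enumerate the full alignment x motivation grid (9 x 4 = 36 profiles)."""
    alignments = [alignment_name(e, m) for e, m in product(ETHICS_AXIS, MORALITY_AXIS)]
    return [Profile(a, mot) for a, mot in product(alignments, MOTIVATIONS)]


def deceive(
    transcript: str,
    infer: Callable[[str], Profile],
    generate: Callable[[Profile, str], str],
) -> str:
    """Hypothetical two-stage pipeline.

    Stage 1 (`infer`) estimates the target's profile from its dialogue.
    Stage 2 (`generate`) produces a reply conditioned on that estimate.
    Both stages are supplied by the caller, e.g. as LLM-backed functions.
    """
    guessed = infer(transcript)
    return generate(guessed, transcript)
```

For example, `len(build_profiles())` evaluates to 36, matching the 9 × 4 design. Keeping inference and generation as separate callables mirrors the abstract's two-stage description and makes it easy to measure stage 1 (profile inference accuracy) independently of stage 2.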
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Multi-Agent Systems and Negotiation
