DECOR: Auditing LLM Deception via Information Manipulation Theory

Linyue Cai; Samuel Yeh; Jwala Dhamala; Rahul Gupta; Sharon Li

arXiv:2605.19270·cs.CL·May 20, 2026

TL;DR

DECOR is a multi-agent framework grounded in Information Manipulation Theory that audits LLM responses for strategic deception in a fine-grained, interpretable way, evaluated on both single-turn and multi-turn deception detection benchmarks.

Contribution

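The paper introduces a theory-grounded auditing method that decomposes the input context into atomic informational units, scores each unit against the response along four dimensions of manipulation, and aggregates the resulting profiles into a global deception index. The authors report state-of-the-art results.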

Findings

01  DECOR outperforms existing baselines on deception detection benchmarks.

02  The framework generalizes across 15 different frontier models.

03  Ablation studies confirm the importance of each key component.

Abstract

Large language models can deceive by subtly manipulating truthful information -- omitting key facts, shifting focus, or obscuring meaning -- making such behavior difficult to detect. Existing black-box methods rely on coarse-grained judgments, offering limited interpretability and failing to pinpoint which facts were distorted and how. We introduce DECOR, a multi-agent framework grounded in Information Manipulation Theory for fine-grained auditing of strategic deception in LLM responses. DECOR decomposes input contexts into atomic informational units and scores each unit against the response across four dimensions of manipulation, producing interpretable manipulation profiles that are aggregated into a global deception index. We comprehensively evaluate DECOR on both single-turn and multi-turn deception detection benchmarks spanning real-world domains, and show that DECOR achieves…
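
The pipeline described in the abstract (atomic units, per-unit scores on four dimensions, a global index) can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the authors' method: the dimension names are taken from classical Information Manipulation Theory (quantity, quality, relation, manner), the scores are hand-written stand-ins for what scoring agents would produce, the [0, 1] scale is invented, and `deception_index` uses a hypothetical aggregation rule (the mean of each unit's worst score). The abstract does not specify how DECOR aggregates.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed dimension names, borrowed from classical Information
# Manipulation Theory; the paper's own labels and scales may differ.
DIMENSIONS = ("quantity", "quality", "relation", "manner")


@dataclass(frozen=True)
class UnitScore:
    """Manipulation scores for one atomic informational unit."""
    unit: str     # the atomic fact extracted from the input context
    scores: dict  # dimension name -> score in [0, 1] (assumed scale)

    def worst(self) -> float:
        # Highest manipulation score across the four dimensions.
        return max(self.scores[d] for d in DIMENSIONS)


def deception_index(profile: list) -> float:
    """Hypothetical aggregation: mean of each unit's worst score."""
    if not profile:
        return 0.0
    return mean(u.worst() for u in profile)


# Illustrative, made-up profile: the first fact is fully omitted from
# the response (quantity), the second is stated vaguely (manner).
profile = [
    UnitScore("The trial enrolled 40 patients.",
              {"quantity": 1.0, "quality": 0.0, "relation": 0.0, "manner": 0.0}),
    UnitScore("Two patients reported side effects.",
              {"quantity": 0.0, "quality": 0.0, "relation": 0.0, "manner": 0.5}),
]
index = deception_index(profile)
```

Keeping the per-unit scores alongside the index is what makes the result interpretable: a high index can be traced back to the specific units and dimensions that drove it.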
