DECOR: Auditing LLM Deception via Information Manipulation Theory
Linyue Cai, Samuel Yeh, Jwala Dhamala, Rahul Gupta, Sharon Li

TL;DR
DECOR is a multi-agent framework grounded in Information Manipulation Theory that enables fine-grained, interpretable detection of deception in large language model responses on both single-turn and multi-turn benchmarks.
Contribution
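DECOR introduces a theory-grounded auditing method that decomposes the input context into atomic informational units, scores each unit against the response along four dimensions of manipulation, and aggregates the scores into a global deception index, achieving state-of-the-art results.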
Findings
DECOR outperforms existing baselines on deception detection benchmarks.
The framework generalizes across 15 different frontier models.
Ablation studies confirm the importance of each key component.
Abstract
Large language models can deceive by subtly manipulating truthful information -- omitting key facts, shifting focus, or obscuring meaning -- making such behavior difficult to detect. Existing black-box methods rely on coarse-grained judgments, offering limited interpretability and failing to pinpoint which facts were distorted and how. We introduce DECOR, a multi-agent framework grounded in Information Manipulation Theory for fine-grained auditing of strategic deception in LLM responses. DECOR decomposes input contexts into atomic informational units and scores each unit against the response across four dimensions of manipulation, producing interpretable manipulation profiles that are aggregated into a global deception index. We comprehensively evaluate DECOR on both single-turn and multi-turn deception detection benchmarks spanning real-world domains, and show that DECOR achieves…
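The scoring-and-aggregation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dimension names are an assumption borrowed from the classic formulation of Information Manipulation Theory (quantity, quality, relation, manner), the scores are assumed to lie in [0, 1], and the weighted-mean aggregation in `deception_index` is hypothetical, since the abstract does not state how profiles are combined. In DECOR the per-unit scores would come from LLM agents; here they are supplied by hand.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed dimension names from classic IMT; the paper's own labels may differ.
DIMENSIONS = ("quantity", "quality", "relation", "manner")


@dataclass
class UnitProfile:
    """Manipulation profile for one atomic informational unit (hypothetical type)."""
    unit: str     # the atomic fact extracted from the input context
    scores: dict  # dimension name -> manipulation score in [0, 1]


def deception_index(profiles, weights=None):
    """Hypothetical aggregation: weighted mean over dimensions, then mean over units."""
    if not profiles:
        return 0.0
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in DIMENSIONS)
    per_unit = [
        sum(weights[d] * p.scores[d] for d in DIMENSIONS) / total_weight
        for p in profiles
    ]
    return mean(per_unit)


# Example: one unit is reported faithfully, one is omitted entirely.
profiles = [
    UnitProfile("The drug reduced symptoms in trials.",
                {"quantity": 0.0, "quality": 0.0, "relation": 0.0, "manner": 0.0}),
    UnitProfile("The drug caused liver damage in some patients.",
                {"quantity": 1.0, "quality": 0.0, "relation": 0.0, "manner": 0.0}),
]
index = deception_index(profiles)
```

Keeping the per-unit profiles alongside the global index is what makes the audit interpretable: the index flags a response, while the profiles show which fact was distorted and along which dimension.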
