DeceptGuard: A Constitutional Oversight Framework for Detecting Deception in LLM Agents

Snehasis Mukhopadhyay

arXiv:2603.13791 · cs.CL · March 17, 2026

TL;DR

This paper introduces DECEPTGUARD, a unified framework for detecting deception in Large Language Model (LLM) agents. It compares black-box, CoT-aware, and activation-probe monitoring regimes and draws on synthetic trajectory data, significantly improving detection accuracy, especially for subtle deception.

Contribution

The paper presents DECEPTGUARD, a unified framework that combines multiple monitoring strategies and a synthetic data pipeline to enhance deception detection in LLM agents.

Findings

01

CoT-aware and activation-probe monitors outperform black-box monitors.

02

Hybrid ensembles achieve high detection performance, with a pAUROC of 0.934.

03

Detection effectiveness decreases as agents suppress overt behavioral signals.
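
The findings report detection quality as pAUROC (partial area under the ROC curve). This excerpt does not state the false-positive-rate range or the normalization the paper uses, so the following is only a minimal sketch under stated assumptions: the area is taken over FPR in [0, `max_fpr`] and divided by `max_fpr`, so that a perfect monitor scores 1.0. The function names and the default `max_fpr=0.1` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return the ROC curve as (fpr, tpr) points; tied scores share one point."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auroc(labels: Sequence[int], scores: Sequence[float], max_fpr: float = 0.1) -> float:
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr.

    ASSUMPTION: this normalization is illustrative; the paper's exact
    pAUROC definition is not given in this excerpt.
    """
    if not 0.0 < max_fpr <= 1.0:
        raise ValueError("max_fpr must be in (0, 1]")
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area / max_fpr
```

Restricting the area to a low-FPR band reflects the usual motivation for partial AUROC in monitoring settings, where only operating points with few false alarms are of practical interest.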

Abstract

Reliable detection of deceptive behavior in Large Language Model (LLM) agents is an essential prerequisite for safe deployment in high-stakes agentic contexts. Prior work on scheming detection has focused exclusively on black-box monitors that observe only externally visible tool calls and outputs, discarding potentially rich internal reasoning signals. We introduce DECEPTGUARD, a unified framework that systematically compares three monitoring regimes: black-box monitors (actions and outputs only), CoT-aware monitors (additionally observing the agent's chain-of-thought reasoning trace), and activation-probe monitors (additionally reading hidden-state representations from a frozen open-weights encoder). We introduce DECEPTSYNTH, a scalable synthetic pipeline for generating deception-positive and deception-negative agent trajectories across a novel 12-category taxonomy spanning verbal,…
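
The three monitoring regimes named in the abstract differ in which parts of an agent trajectory the monitor may observe. The sketch below illustrates that access pattern only. All names (`Regime`, `Trajectory`, `monitor_view`) are hypothetical, and it assumes each richer regime adds one signal on top of the black-box view; whether the activation-probe monitor also sees the chain of thought is not specified in this excerpt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Regime(Enum):
    BLACK_BOX = "black_box"                # actions and outputs only
    COT_AWARE = "cot_aware"                # plus the chain-of-thought trace
    ACTIVATION_PROBE = "activation_probe"  # plus hidden-state representations


@dataclass
class Trajectory:
    """One agent trajectory (hypothetical container, not the paper's schema)."""
    actions: List[str]
    outputs: List[str]
    chain_of_thought: Optional[str] = None
    hidden_states: Optional[List[List[float]]] = None


def monitor_view(traj: Trajectory, regime: Regime) -> Dict[str, object]:
    """Return only the fields a monitor in the given regime may observe."""
    view: Dict[str, object] = {"actions": traj.actions, "outputs": traj.outputs}
    if regime is Regime.COT_AWARE:
        view["chain_of_thought"] = traj.chain_of_thought
    elif regime is Regime.ACTIVATION_PROBE:
        # ASSUMPTION: hidden states are added to the black-box view only.
        view["hidden_states"] = traj.hidden_states
    return view
```

Filtering the trajectory before it reaches the monitor keeps the comparison between regimes fair, since each monitor is evaluated on exactly the signals its regime permits.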

Taxonomy

Topics: Deception detection and forensic psychology · Topic Modeling · Explainable Artificial Intelligence (XAI)