AI Deception: Risks, Dynamics, and Controls

Boyuan Chen; Sitong Fang; Jiaming Ji; Yanxu Zhu; Pengcheng Wen; Jinzhou Wu; Yingshui Tan; Boren Zheng; Mengying Yuan; Wenqi Chen; Donghai Hong; Alex Qiu; Xin Chen; Jiayi Zhou; Kaile Wang; Juntao Dai; Borong Zhang; Tianzhuo Yang; Saad Siddiqui; Isabella Duan; Yawen Duan; Brian Tse; Jen-Tse (Jay) Huang; Kun Wang; Baihui Zheng; Jiaheng Liu; Jian Yang; Yiming Li; Wenting Chen; Dongrui Liu; Lukas Vierling; Zhiheng Xi; Haobo Fu; Wenxuan Wang; Jitao Sang; Zhengyan Shi; Chi-Min Chan; Eugenie Shi; Simin Li; Juncheng Li; Jian Yang; Wei Ji; Dong Li; Jinglin Yang; Jun Song; Yinpeng Dong; Jie Fu; Bo Zheng; Min Yang; Yike Guo; Philip Torr; Robert Trager; Yi Zeng; Zhongyuan Wang; Yaodong Yang; Tiejun Huang; Ya-Qin Zhang; Hongjiang Zhang; Andrew Yao

arXiv:2511.22619 · cs.AI · December 4, 2025

TL;DR

This paper offers a comprehensive overview of AI deception, defining its core concepts, exploring its emergence mechanisms, and discussing detection and mitigation strategies to address associated risks across various AI systems.

Contribution

It introduces a formal definition of AI deception, organizes research into a deception cycle, and proposes integrated mitigation and auditing approaches for sociotechnical safety.

Findings

01. Deception emerges in capable AI systems with specific incentives.

02. Detection methods include benchmarks and evaluation protocols.

03. Mitigation strategies involve technical, community, and governance efforts.

Abstract

As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk across language models, AI agents, and emerging frontier systems. This project provides a comprehensive and up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential mitigations. First, we identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We then review existing empirical studies and associated risks, highlighting deception as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment. Deception emergence reveals the mechanisms underlying AI…

Taxonomy

Topics: Deception detection and forensic psychology · Ethics and Social Impacts of AI · Embodied and Extended Cognition