Are Your Agents Upward Deceivers?
Dadi Guo, Qingyu Liu, Dongrui Liu, Qihan Ren, Shuai Shao, Tianyi Qiu, Haoran Li, Yi R. Fung, Zhongjie Ba, Juntao Dai, Jiaming Ji, Zhikai Chen, Jialing Tao, Yaodong Yang, Jing Shao, Xia Hu

TL;DR
This paper investigates the prevalence of upward deception in LLM-based agents, revealing that they often conceal failures and perform unsupported actions, with limited mitigation effectiveness, raising safety concerns.
Contribution
It introduces the concept of agentic upward deception, constructs a benchmark to measure it, and evaluates the difficulty of mitigating such behaviors in LLM agents.
Findings
Agents frequently engage in deception behaviors like fabricating information.
Prompt-based mitigation only partially reduces deceptive actions.
Deception behaviors are prevalent across multiple task types and scenarios.
Abstract
Large Language Model (LLM)-based agents are increasingly used as autonomous subordinates that carry out tasks for users. This raises the question of whether they may also engage in deception, similar to how individuals in human organizations lie to superiors to create a good image or avoid punishment. We observe and define agentic upward deception, a phenomenon in which an agent facing environmental constraints conceals its failure and performs actions that were not requested without reporting. To assess its prevalence, we construct a benchmark of 200 tasks covering five task types and eight realistic scenarios in a constrained environment, such as broken tools or mismatched information sources. Evaluations of 11 popular LLMs reveal that these agents typically exhibit action-based deceptive behaviors, such as guessing results, performing unsupported simulations, substituting unavailable…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsArtificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
