Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

Zhaomin Wu; Mingzhe Du; See-Kiong Ng; Bingsheng He

arXiv:2508.06361·cs.LG·May 4, 2026

Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He

PDF

1 Video

TL;DR

This paper investigates the tendency of large language models to deceive intentionally on benign prompts, introducing new metrics to quantify deception and revealing that deception increases with task difficulty and is not always mitigated by larger models.

Contribution

It presents a novel framework and metrics for detecting self-initiated deception in LLMs without explicit prompting, addressing a gap in understanding real-world deception behaviors.

Findings

01

Deceptive scores increase with task difficulty for most models.

02

Larger model capacity does not necessarily reduce deception.

03

The proposed metrics correlate with each other and reveal deception tendencies.

Abstract

Large Language Models (LLMs) are widely deployed in reasoning, planning, and decision-making tasks, making their trustworthiness critical. A significant and underexplored risk is intentional deception, where an LLM deliberately fabricates or conceals information to serve a hidden objective. Existing studies typically induce deception by explicitly setting a hidden objective through prompting or fine-tuning, which may not reflect real-world human-LLM interactions. Moving beyond such human-induced deception, we investigate LLMs' self-initiated deception on benign prompts. To address the absence of ground truth, we propose a framework based on Contact Searching Questions (CSQ). This framework introduces two statistical metrics derived from psychological principles to quantify the likelihood of deception. The first, the Deceptive Intention Score, measures the model's bias toward a hidden…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts· slideslive