Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation
Jiahao Cheng, Tiancheng Su, Jia Yuan, Guoxiu He, Jiawei Liu, Xinqi Tao, Jingwen Xie, Huaxia Li

TL;DR
This paper empirically evaluates how Chain-of-Thought prompting affects hallucination detection in large language models, revealing that while it reduces hallucinations, it also obscures signals critical for detection, highlighting a key trade-off.
Contribution
It provides a systematic empirical analysis of CoT prompting's impact on hallucination detection, uncovering that CoT can obscure detection signals despite reducing hallucinations.
Findings
CoT prompting reduces hallucination frequency.
CoT obscures signals used for hallucination detection.
Detection accuracy is impaired by CoT prompting.
Abstract
Large Language Models (LLMs) often exhibit hallucinations, generating factually incorrect or semantically irrelevant content in response to prompts. Chain-of-Thought (CoT) prompting can mitigate hallucinations by encouraging step-by-step reasoning, but its impact on hallucination detection remains underexplored. To bridge this gap, we conduct a systematic empirical evaluation. We begin with a pilot experiment, revealing that CoT reasoning significantly affects the LLM's internal states and token probability distributions. Building on this, we evaluate the impact of various CoT prompting methods on mainstream hallucination detection methods across both instruction-tuned and reasoning-oriented LLMs. Specifically, we examine three key dimensions: changes in hallucination score distributions, variations in detection accuracy, and shifts in detection confidence. Our findings show…
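The abstract mentions token probability distributions, hallucination scores, and detection accuracy without defining them on this page. As a rough illustration only, and not the paper's own method, the sketch below shows one common way such a pipeline can be set up: a confidence-based score (mean negative log-likelihood of the generated tokens) and AUROC as the detection metric. The function names `mean_nll` and `auroc` and all numbers in the toy data are hypothetical and invented for the example.

```python
import math
from typing import Sequence


def mean_nll(token_probs: Sequence[float]) -> float:
    """Average negative log-likelihood of the generated tokens.

    Higher values mean the model was less confident, which a
    probability-based detector treats as a hallucination cue.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUROC: the chance that a hallucinated answer (label 1)
    receives a higher score than a faithful one (label 0); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical per-token probabilities for four answers (toy data, not from the paper).
toy_answers = [
    ([0.90, 0.80, 0.95], 0),  # faithful, confident tokens
    ([0.40, 0.30, 0.50], 1),  # hallucinated, uncertain tokens
    ([0.85, 0.90, 0.70], 0),
    ([0.60, 0.20, 0.50], 1),
]
toy_scores = [mean_nll(probs) for probs, _ in toy_answers]
toy_labels = [label for _, label in toy_answers]
print(auroc(toy_scores, toy_labels))
```

Under this framing, a prompting strategy that makes token probabilities for hallucinated and faithful answers look more alike would pull the AUROC toward 0.5, which is one way to read the "obscured cues" described in the findings.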
Taxonomy
Topics: Mental Health Research Topics · Mental Health via Writing · Machine Learning in Healthcare
