CoT is Not the Chain of Truth: An Empirical Internal Analysis of Reasoning LLMs for Fake News Generation

Zhao Tong; Chunlin Gong; Yiping Zhang; Haichao Shi; Qiang Liu; Xingcheng Xu; Shu Wu; Xiao-Yu Zhang

arXiv:2602.04856 · cs.CL · February 17, 2026

Open Access

TL;DR

This paper reveals that LLMs can internally generate unsafe narratives during fake news creation even when they refuse harmful requests, highlighting the need for deeper safety analysis beyond final outputs.

Contribution

It introduces a unified safety-analysis framework that examines internal reasoning layers and attention heads to identify unsafe patterns in LLMs during fake news generation.

Findings

01 Unsafe reasoning can persist internally despite refusal responses.

02 Critical attention heads responsible for unsafe divergence are concentrated in mid-depth layers.

03 Activation of reasoning mode increases generation risk significantly.

Abstract

From generating headlines to fabricating news, Large Language Models (LLMs) are typically assessed by their final outputs, under the safety assumption that a refusal response signifies safe reasoning throughout the entire process. Challenging this assumption, our study reveals that during fake news generation, even when a model rejects a harmful request, its Chain-of-Thought (CoT) reasoning may still internally contain and propagate unsafe narratives. To analyze this phenomenon, we introduce a unified safety-analysis framework that systematically deconstructs CoT generation across model layers and evaluates the role of individual attention heads through Jacobian-based spectral metrics. Within this framework, we introduce three interpretable measures (stability, geometry, and energy) to quantify how specific attention heads respond to or embed deceptive reasoning patterns. Extensive…
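To make the phrase "Jacobian-based spectral metrics" concrete, the following is a minimal sketch on a toy single-head attention function. It is not the paper's implementation: the abstract does not define the three measures, so the mapping used here is an assumption made purely for illustration (the largest singular value of the Jacobian as a stability proxy, the stable rank as a geometry proxy, and the squared Frobenius norm as an energy proxy). The function names `attention_head`, `jacobian`, and `head_metrics` are hypothetical, and the Jacobian is taken with respect to a single query vector using finite differences.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(query, keys, values):
    """Toy single-head attention output for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

def jacobian(fn, x, eps=1e-5):
    """Central finite-difference Jacobian: J[i][j] = d fn(x)[i] / d x[j]."""
    cols = []
    for j in range(len(x)):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(fn(hi), fn(lo))])
    return [[cols[j][i] for j in range(len(x))] for i in range(len(cols[0]))]

def spectral_norm(J, iters=200):
    """Largest singular value, estimated by power iteration on J^T J."""
    n = len(J[0])
    v = [1.0 + 0.1 * j for j in range(n)]  # deterministic, non-uniform start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        Jv = [sum(a * b for a, b in zip(row, v)) for row in J]
        JtJv = [sum(J[i][j] * Jv[i] for i in range(len(J))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in JtJv))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in JtJv]
    Jv = [sum(a * b for a, b in zip(row, v)) for row in J]
    return math.sqrt(sum(x * x for x in Jv))

def head_metrics(query, keys, values):
    """Illustrative proxies only; the paper's actual definitions may differ."""
    J = jacobian(lambda q: attention_head(q, keys, values), query)
    energy = sum(v * v for row in J for v in row)       # squared Frobenius norm
    sigma_max = spectral_norm(J)                        # assumed stability proxy
    stable_rank = energy / sigma_max ** 2 if sigma_max > 0 else 0.0  # assumed geometry proxy
    return {"stability": sigma_max, "geometry": stable_rank, "energy": energy}

# Made-up toy inputs, not data from the paper.
query = [0.5, -0.2, 0.1]
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
metrics = head_metrics(query, keys, values)
print(metrics)
```

In a real model one would presumably compute such Jacobians per head and per layer with automatic differentiation rather than finite differences, and compare the resulting spectra between safe and unsafe reasoning traces; that workflow is likewise an assumption here, not a description of the authors' pipeline.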

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · AI in Service Interactions