MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

Lionel Z. Wang; Ka Chung Ng; Yiming Ma; Wenqi Fan

arXiv:2408.11871 · cs.CL · April 14, 2026


TL;DR

This paper introduces MegaFake, a large dataset of AI-generated fake news created via a theory-driven prompt pipeline, to improve understanding and detection of machine-generated misinformation.

Contribution

The paper develops LLM-Fake Theory, a theoretical framework for machine-generated deception; automates fake news generation through a prompt engineering pipeline; and provides a new dataset for advancing fake news detection methods.

Findings

01 MegaFake enables better understanding of AI deception mechanisms.

02 The dataset supports improved fake news detection models.

03 The framework guides future research on AI-generated misinformation.

Abstract

Fake news significantly influences decision-making processes by misleading individuals, organizations, and even governments. Large language models (LLMs), as part of generative AI, can amplify this problem by generating highly convincing fake news at scale, posing a significant threat to online information integrity. Understanding the motivations and mechanisms behind LLM-generated fake news is therefore crucial for effective detection and governance. In this study, we develop the LLM-Fake Theory, a theoretical framework that integrates various social psychology theories to explain machine-generated deception. Guided by this framework, we design an innovative prompt engineering pipeline that automates fake news generation using LLMs, eliminating the need for manual annotation. Utilizing this pipeline, we create a theoretically informed…
