MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
Lionel Z. Wang, Ka Chung Ng, Yiming Ma, Wenqi Fan

TL;DR
This paper introduces MegaFake, a large dataset of AI-generated fake news created via a theory-driven prompt pipeline, to improve understanding and detection of machine-generated misinformation.
Contribution
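It develops the LLM-Fake Theory, a theoretical framework for machine-generated deception grounded in social psychology, automates fake news generation through a prompt engineering pipeline, and provides a new dataset for advancing fake news detection methods.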
Findings
MegaFake enables better understanding of AI deception mechanisms.
The dataset supports improved fake news detection models.
The framework guides future research on AI-generated misinformation.
Abstract
Fake news significantly influences decision-making processes by misleading individuals, organizations, and even governments. Large language models (LLMs), as part of generative AI, can amplify this problem by generating highly convincing fake news at scale, posing a significant threat to online information integrity. Therefore, understanding the motivations and mechanisms behind fake news generated by LLMs is crucial for effective detection and governance. In this study, we develop the LLM-Fake Theory, a theoretical framework that integrates various social psychology theories to explain machine-generated deception. Guided by this framework, we design an innovative prompt engineering pipeline that automates fake news generation using LLMs, eliminating manual annotation needs. Utilizing this pipeline, we create a theoretically informed…
