r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection
Kai Nakamura, Sharon Levy, William Yang Wang

TL;DR
This paper introduces Fakeddit, a multimodal dataset of over 1 million samples that supports fine-grained fake news detection using text and image data.
Contribution
It provides the first large-scale, multimodal fake news dataset with fine-grained 2-way, 3-way, and 6-way labels, supporting the development of improved detection models.
Findings
Multimodal models outperform text-only approaches.
Fine-grained classification improves detection accuracy.
Dataset enables diverse fake news research.
Abstract
Fake news has altered society in negative ways in politics and culture. It has adversely affected both online social network systems as well as offline communities and conversations. Using automatic machine learning classification models is an efficient way to combat the widespread dissemination of fake news. However, a lack of effective, comprehensive datasets has been a problem for fake news research and detection model development. Prior fake news datasets do not provide multimodal text and image data, metadata, comment data, and fine-grained fake news categorization at the scale and breadth of our dataset. We present Fakeddit, a novel multimodal dataset consisting of over 1 million samples from multiple categories of fake news. After being processed through several stages of review, the samples are labeled according to 2-way, 3-way, and 6-way classification categories through…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Advanced Malware Detection Techniques
