r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection

Kai Nakamura; Sharon Levy; William Yang Wang

arXiv:1911.03854 · cs.CL · March 13, 2020 · 58 cites

TL;DR

This paper introduces Fakeddit, a comprehensive multimodal dataset with over one million samples, enabling advanced fine-grained fake news detection using text and image data.

Contribution

It provides the first large-scale, multimodal, fine-grained fake news dataset with detailed labels, facilitating improved detection models.

Findings

01. Multimodal models outperform text-only approaches.

02. Fine-grained classification improves detection accuracy.

03. Dataset enables diverse fake news research.

Abstract

Fake news has altered society in negative ways in politics and culture. It has adversely affected both online social network systems as well as offline communities and conversations. Using automatic machine learning classification models is an efficient way to combat the widespread dissemination of fake news. However, a lack of effective, comprehensive datasets has been a problem for fake news research and detection model development. Prior fake news datasets do not provide multimodal text and image data, metadata, comment data, and fine-grained fake news categorization at the scale and breadth of our dataset. We present Fakeddit, a novel multimodal dataset consisting of over 1 million samples from multiple categories of fake news. After being processed through several stages of review, the samples are labeled according to 2-way, 3-way, and 6-way classification categories through…

Code & Models

Models

🤗 fabiszn/bert-base-fakeedit
model · 8 dl

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Advanced Malware Detection Techniques