Early Detection of Social Media Hoaxes at Scale

Arkaitz Zubiaga; Aiqi Jiang

arXiv:1801.07311 · cs.CL · June 16, 2020 · 1 citation

TL;DR

This paper presents a semi-automated approach that uses Wikidata to build a large-scale dataset of social media hoaxes, enabling more effective training and evaluation of models for detecting hoaxes early.

Contribution

It introduces a novel semi-automated method leveraging Wikidata to build large datasets for early hoax detection on social media, focusing on celebrity death reports.
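The Wikidata-based labelling idea can be illustrated with a minimal sketch. It assumes each report has already been linked to a named person and that a local dictionary stands in for Wikidata queries of the "date of death" property (P570). The `DeathReport` structure, the `label_report` function and the two-day `TOLERANCE` are hypothetical and are not the authors' implementation. Reports the rule cannot settle are returned as `None` for manual review, which is one way to read "semi-automated".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional

# Hypothetical tolerance between a recorded death and the first tweet of a report.
TOLERANCE = timedelta(days=2)


@dataclass
class DeathReport:
    """A cluster of tweets claiming that one named person has died (illustrative)."""
    person: str
    first_tweet_date: date


def label_report(report: DeathReport,
                 death_dates: Dict[str, Optional[date]]) -> Optional[str]:
    """Return 'real' or 'fake', or None to flag the report for manual review.

    `death_dates` stands in for Wikidata lookups: it maps a person to the value
    of their "date of death" property (P570), or None if no value is recorded.
    """
    if report.person not in death_dates:
        return None  # the name could not be linked to a Wikidata entity
    died = death_dates[report.person]
    if died is None:
        return "fake"  # no recorded death, so the report is treated as a hoax
    if abs(report.first_tweet_date - died) <= TOLERANCE:
        return "real"  # the recorded death matches the timing of the report
    return None  # a death far from the report date is ambiguous


# Placeholder entities, not real people.
lookup = {"Person A": date(2016, 4, 21), "Person B": None}
print(label_report(DeathReport("Person A", date(2016, 4, 22)), lookup))
print(label_report(DeathReport("Person B", date(2016, 5, 1)), lookup))
print(label_report(DeathReport("Person C", date(2016, 5, 1)), lookup))
```

In practice the dictionary would be replaced by queries against Wikidata, and the `None` cases are where human checking would come in.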

Findings

1. Achieved F1 scores nearing 72% within 10 minutes of the first tweet.
2. Created a dataset of 4,007 reports comprising over 13 million tweets.
3. Demonstrated that the size of the training data matters for early detection accuracy.
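Early detection restricts a classifier to the tweets available shortly after a report appears. The following minimal sketch of that cut-off assumes each tweet is a `(timestamp, text)` pair and uses a hypothetical `early_window` helper. It keeps only the tweets posted within the first N minutes of a report's earliest tweet:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Tweet = Tuple[datetime, str]  # (posting time, text)


def early_window(tweets: List[Tweet], minutes: int = 10) -> List[Tweet]:
    """Keep only the tweets posted within `minutes` of the report's earliest tweet."""
    if not tweets:
        return []
    ordered = sorted(tweets, key=lambda t: t[0])  # chronological order
    cutoff = ordered[0][0] + timedelta(minutes=minutes)
    return [t for t in ordered if t[0] <= cutoff]


start = datetime(2020, 1, 1, 12, 0)
report = [(start + timedelta(minutes=m), f"tweet at +{m} min") for m in (0, 4, 10, 11, 25)]
for posted, text in early_window(report, minutes=10):
    print(text)
```

A classifier evaluated at the 10-minute mark would then see only the tweets returned by this filter. The model itself is outside the scope of this sketch.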

Abstract

The unmoderated nature of social media enables the diffusion of hoaxes, which in turn jeopardises the credibility of information gathered from social media platforms. Existing research on automated detection of hoaxes has the limitation of using relatively small datasets, owing to the difficulty of getting labelled data. This in turn has limited research exploring early detection of hoaxes as well as exploring other factors such as the effect of the size of the training data or the use of sliding windows. To mitigate this problem, we introduce a semi-automated method that leverages the Wikidata knowledge base to build large-scale datasets for veracity classification, focusing on celebrity death reports. This enables us to create a dataset with 4,007 reports including over 13 million tweets, 15% of which are fake. Experiments using class-specific representations of word embeddings show…
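The abstract lists sliding windows among the factors studied. Under one plausible reading, which is an assumption here rather than the paper's stated definition, a model can use either every tweet posted since the first one (a cumulative view) or only the tweets from the most recent fixed-width window. The hypothetical helpers `cumulative_view` and `sliding_view` contrast the two views on a list of timestamps:

```python
from datetime import datetime, timedelta
from typing import List


def cumulative_view(times: List[datetime], now: datetime) -> List[datetime]:
    """All tweets posted up to `now`: the context keeps growing over time."""
    return [t for t in times if t <= now]


def sliding_view(times: List[datetime], now: datetime,
                 width: timedelta) -> List[datetime]:
    """Only tweets in the half-open interval (now - width, now]."""
    return [t for t in times if now - width < t <= now]


start = datetime(2020, 1, 1, 12, 0)
times = [start + timedelta(minutes=m) for m in (0, 3, 6, 9, 12)]
now = start + timedelta(minutes=12)
print(len(cumulative_view(times, now)))                       # every tweet so far
print(len(sliding_view(times, now, timedelta(minutes=5))))    # last five minutes only
```

The cumulative view accumulates evidence but grows without bound, while the sliding view keeps a fixed-size, more recent context.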

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Topic Modeling