Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset

Hsuvas Borkakoty; Luis Espinosa-Anke

arXiv:2405.02175·cs.CL·September 2, 2024


TL;DR

This paper introduces Hoaxpedia, a dataset of 311 Wikipedia hoax articles with legitimate counterparts, and analyzes language models and features for automated hoax detection, highlighting the difficulty and potential of content-based detection methods.

Contribution

The paper provides a systematic analysis of the similarities and discrepancies between legitimate and hoax Wikipedia articles, introduces the Hoaxpedia dataset, and evaluates language models and features for automated hoax detection.

Findings

01 Content-based detection is challenging but feasible.

02 Edit history features improve classification accuracy.

03 Full article analysis yields better results than just definitions.
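
To make the full-article versus definition contrast concrete, here is a minimal sketch of how the two input variants might be derived from one article. It assumes that the "definition" is simply the article's first sentence; that is an assumption for illustration, not something this page confirms. The helper name `definition_of` is hypothetical, and the naive regex split will mis-handle abbreviations such as "Dr.".

```python
import re


def definition_of(article_text: str) -> str:
    """Return the first sentence of an article (assumed to be its definition)."""
    stripped = article_text.strip()
    # Split once, at the first whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", stripped, maxsplit=1)
    return parts[0]


article = "Foo is a village in Bar. It was founded in 1900. It has a mill."
full_text_input = article                    # classifier sees the whole article
definition_input = definition_of(article)    # classifier sees one sentence only
```

The two variables stand in for the two settings a classifier could be trained and evaluated on.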

Abstract

Hoaxes are a recognised form of deliberately created disinformation, with potentially serious implications for the credibility of reference knowledge resources such as Wikipedia. What makes Wikipedia hoaxes hard to detect is that they are often written according to the official style guidelines. In this work, we first provide a systematic analysis of the similarities and discrepancies between legitimate and hoax Wikipedia articles, and then introduce Hoaxpedia, a collection of 311 hoax articles (from existing literature and official Wikipedia lists) together with semantically similar legitimate articles, which together form a binary text classification dataset aimed at fostering research in automated hoax detection. We report results from analyzing several language models, hoax-to-legit ratios, and the amount of text classifiers are exposed to (full article vs the article's…
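
The abstract mentions varying the hoax-to-legit ratio. A minimal sketch of how such a split could be assembled is shown below, assuming the hoax and legitimate articles are already available as lists of strings. The function name `build_split`, the parameter `legit_per_hoax`, and the label convention (1 = hoax, 0 = legitimate) are all hypothetical choices for illustration, not the authors' actual procedure.

```python
import random


def build_split(hoax, legit, legit_per_hoax, seed=0):
    """Pair every hoax article with up to `legit_per_hoax` sampled legit articles.

    Returns a shuffled list of (text, label) tuples, where label 1 marks a hoax
    and label 0 a legitimate article (an assumed convention).
    """
    rng = random.Random(seed)
    # Cap the sample size at the number of legit articles actually available.
    n_legit = min(len(legit), legit_per_hoax * len(hoax))
    sampled_legit = rng.sample(legit, n_legit)
    data = [(text, 1) for text in hoax] + [(text, 0) for text in sampled_legit]
    rng.shuffle(data)
    return data


# Toy stand-ins for real article texts.
hoax_articles = [f"hoax article {i}" for i in range(5)]
legit_articles = [f"legit article {i}" for i in range(200)]

split_1_to_10 = build_split(hoax_articles, legit_articles, legit_per_hoax=10)
```

Fixing the seed keeps the sampled legitimate articles reproducible, so runs at different ratios can be compared against each other.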


Code & Models

Repositories

hsuvas/hoaxpedia_dataset
PyTorch · Official

Datasets

hsuvaskakoty/hoaxpedia
dataset · 3.1k downloads

Videos

HOAXPEDIA: A Unified Wikipedia Hoax Articles Dataset · Underline

Taxonomy

Topics: Wikis in Education and Collaboration