Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia

KayYen Wong; Miriam Redi; Diego Saez-Trumper

arXiv:2105.04117·cs.IR·June 2, 2021

TL;DR

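This paper introduces Wiki-Reliability, a large-scale dataset of English Wikipedia articles labeled with content reliability issues. The labels come from the maintenance templates that editors attach to articles, which makes it possible to train machine learning models that predict content reliability at scale.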

Contribution

The paper presents the first large-scale dataset of English Wikipedia articles annotated with a wide set of content reliability issues, using editor-applied templates to label nearly 1 million revisions.

Findings

01

The dataset enables training of large-scale reliability prediction models.

02

Templates applied by expert Wikipedia editors serve as a strong signal for detecting reliability issues.

03

The public release of data and code facilitates further research.

Abstract

Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors' manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia "templates". Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of "non-neutral point of view" or "contradictory articles", and serve as a strong signal for detecting reliability…
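
As a rough illustration of the template-based labeling idea described in the abstract, the sketch below scans a revision's wikitext for maintenance templates and assigns a binary label. It is a minimal sketch under stated assumptions, not the authors' pipeline: the template names in `RELIABILITY_TEMPLATES` and the helper functions `reliability_templates` and `label_revision` are hypothetical and chosen only for illustration.

```python
import re

# Illustrative template names only (an assumption); the dataset's actual
# list of reliability templates may differ.
RELIABILITY_TEMPLATES = {"pov", "contradict"}

# Captures the name of a template call such as {{POV|date=May 2020}}
# or {{ Contradict }}: everything between "{{" and the first "|" or "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def reliability_templates(wikitext):
    """Return the tracked template names (lowercased) found in the wikitext."""
    found = {m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(wikitext)}
    return found & RELIABILITY_TEMPLATES


def label_revision(wikitext):
    """Return 1 if the revision carries any tracked template, else 0."""
    return 1 if reliability_templates(wikitext) else 0
```

For example, `label_revision("{{POV|date=May 2020}} Some text.")` flags the revision, while a revision containing only `{{cite web|url=x}}` does not. Matching on lowercased names is a simplification; real wikitext also uses template redirects and aliases that this sketch does not resolve.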

Code & Models

Repositories

kay-wong/Wiki-Reliability (Official)