TL;DR
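This paper introduces Wiki-Reliability, a large-scale dataset of English Wikipedia articles labeled with content reliability issues. The labels come from the templates that Wikipedia editors attach to articles, and the dataset is intended for training machine learning models that predict content reliability at scale.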
Contribution
The paper presents the first large-scale dataset of Wikipedia content annotated with reliability issues, using editor-applied templates to label nearly 1 million revisions.
Findings
The dataset enables training reliability prediction models at scale.
Templates applied by expert editors provide a strong labeling signal for reliability detection.
The public release of data and code facilitates further research.
Abstract
Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors' manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia "templates". Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of "non-neutral point of view" or "contradictory articles", and serve as a strong signal for detecting reliability…
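As a rough illustration of the template-based labeling idea described in the abstract, the sketch below scans a revision's wikitext for maintenance templates and turns their presence into a binary label. This is not the authors' pipeline: the template names in `RELIABILITY_TEMPLATES`, the regular expression, and the functions `find_reliability_templates` and `label_revision` are all hypothetical choices made for this example.

```python
import re

# Hypothetical set of tracked template names (lowercased), chosen for
# illustration only; the dataset's actual template list is not given here.
RELIABILITY_TEMPLATES = {"pov", "contradict"}

# Captures the name of a template invocation such as {{POV}} or
# {{POV|date=May 2020}}: the text between "{{" and the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def find_reliability_templates(wikitext):
    """Return the set of tracked template names present in the wikitext."""
    found = set()
    for match in TEMPLATE_NAME.finditer(wikitext):
        name = match.group(1).strip().lower()
        if name in RELIABILITY_TEMPLATES:
            found.add(name)
    return found


def label_revision(wikitext):
    """Return 1 if the revision carries any tracked template, else 0."""
    return int(bool(find_reliability_templates(wikitext)))
```

For example, `label_revision("{{POV|date=May 2020}}\nSome text.")` yields 1, while a revision containing only a citation template such as `{{cite web|url=x}}` yields 0. A regex match is a simplification; a wikitext parser would handle nested templates and redirects between template names more robustly.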
