# Building automated vandalism detection tools for Wikidata

**Authors:** Amir Sarabadani, Aaron Halfaker, Dario Taraborelli

arXiv: 1703.03861 · 2017-03-14

## TL;DR

This paper develops machine learning classifiers to detect vandalism in Wikidata. The reported strategy catches 89% of vandalism while reducing patrollers' workload by 98%, drawing heavily on characteristics of the editing user and lightly on contextual features of the edit.

## Contribution

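It extends prior work on Wikipedia vandalism detection to Wikidata, where identifying damaging changes in a structured knowledge base requires substantially different feature engineering than in a text-based wiki. It also discusses how useful the resulting classifiers are for reducing the workload of vandalism patrollers.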

## Key findings

- Detects 89% of vandalism in Wikidata
- Reduces patrollers' workload by 98%
- Relies heavily on characteristics of the user making the edit and lightly on contextual features of the edit itself

## Abstract

Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open collaboration model is powerful in that it reduces barriers to participation and allows a large number of people to contribute. However, it exposes the knowledge base to the risk of vandalism and low-quality contributions. In this work, we build on past work detecting vandalism in Wikipedia to detect vandalism in Wikidata. This work is novel in that identifying damaging changes in a structured knowledge-base requires substantially different feature engineering work than in a text-based wiki like Wikipedia. We also discuss the utility of these classifiers for reducing the overall workload of vandalism patrollers in Wikidata. We describe a machine classification strategy that is able to catch 89% of vandalism while reducing patrollers' workload by 98%, by drawing lightly from contextual features of an edit and heavily from the characteristics of the user making the edit.
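The trade-off between vandalism caught and workload saved can be illustrated with a minimal sketch. It assumes that a classifier assigns each edit a damage score, that patrollers review only edits scoring at or above a threshold, and that "workload reduction" means the fraction of edits they no longer need to review. The function name `review_tradeoff`, the threshold, and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def review_tradeoff(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (recall, workload_reduction) for a given score threshold.

    Assumption: only edits with score >= threshold are sent to patrollers.
    recall             = share of vandalism edits that get reviewed
    workload_reduction = share of all edits that are NOT reviewed
    """
    flagged = [s >= threshold for s in scores]
    total_vandalism = sum(labels)
    caught = sum(1 for f, is_vandalism in zip(flagged, labels) if f and is_vandalism)
    recall = caught / total_vandalism if total_vandalism else 0.0
    workload_reduction = 1.0 - sum(flagged) / len(scores)
    return recall, workload_reduction


# Hypothetical toy data: ten edits, two of which are vandalism.
scores = [0.02, 0.05, 0.10, 0.01, 0.95, 0.03, 0.60, 0.04, 0.07, 0.20]
labels = [False, False, False, False, True, False, True, False, False, False]

recall, reduction = review_tradeoff(scores, labels, threshold=0.5)
print(f"recall={recall:.2f}, workload reduction={reduction:.2f}")
```

Raising the threshold shrinks the review queue but risks missing vandalism; lowering it does the opposite. Under the assumptions above, the paper's headline figures correspond to an operating point where recall is 0.89 and workload reduction is 0.98.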

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.03861/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1703.03861/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1703.03861/full.md

---
Source: https://tomesphere.com/paper/1703.03861