VEWS: A Wikipedia Vandal Early Warning System

Srijan Kumar; Francesca Spezzano; V.S. Subrahmanian

arXiv:1507.01272·cs.SI·July 7, 2015

TL;DR

This paper introduces VEWS, an early warning system that uses novel behavioral features and classical machine learning to identify Wikipedia vandals before human editors or existing vandalism detection tools flag them.

Contribution

The paper develops three novel feature sets, used by the WVB, WTPM, and combined VEWS approaches. Each approach exceeds 85% classification accuracy without relying on information supplied by other users, such as reverts.

Findings

01. VEWS outperforms ClueBot NG and STiki in accuracy.

02. VEWS detects vandals on average 2.39 edits earlier than ClueBot NG.

03. Combining VEWS with ClueBot NG yields an even more accurate early warning system.

Abstract

We study the problem of detecting vandals on Wikipedia before any human or known vandalism detection system reports flagging potential vandals so that such users can be presented early to Wikipedia administrators. We leverage multiple classical ML approaches, but develop 3 novel sets of features. Our Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing patterns as features to classify some users as vandals. Our Wikipedia Transition Probability Matrix (WTPM) approach uses a set of features derived from a transition probability matrix and then reduces it via a neural net auto-encoder to classify some users as vandals. The VEWS approach merges the previous two approaches. Without using any information (e.g. reverts) provided by other users, these algorithms each have over 85% classification accuracy. Moreover, when temporal recency is considered, accuracy goes to…
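
To make the transition-probability idea behind WTPM concrete, here is a minimal Python sketch of turning one user's edit sequence into a row-normalized transition matrix and flattening it into a feature vector. The state names in `EDIT_STATES`, the sample sequence, and the helper names `transition_matrix` and `flatten` are hypothetical and chosen only for illustration; this excerpt does not specify the paper's actual state space, and the auto-encoder reduction step is not shown.

```python
# Illustrative sketch only: the states below are assumptions, not the paper's.
EDIT_STATES = ["new_page", "re_edit", "linked_page"]


def transition_matrix(sequence, states):
    """Return a row-normalized matrix of transition probabilities.

    Entry [i][j] estimates the probability that an edit in state i
    is immediately followed by an edit in state j.
    """
    index = {state: i for i, state in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]

    # Count each consecutive pair of edits in the user's history.
    for current, following in zip(sequence, sequence[1:]):
        counts[index[current]][index[following]] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def flatten(matrix):
    """Flatten the matrix row by row into a single feature vector."""
    return [p for row in matrix for p in row]


# Hypothetical edit history for a single user.
user_edits = ["new_page", "re_edit", "re_edit", "linked_page", "new_page"]
features = flatten(transition_matrix(user_edits, EDIT_STATES))
print(features)
```

A vector like `features` could then be passed to any standard classifier. With a larger state space the flattened matrix grows quadratically, which is one plausible motivation for the dimensionality reduction the abstract mentions.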
