Large-Scale Vandalism Detection with Linear Classifiers - The Conkerberry Vandalism Detector at WSDM Cup 2017

Alexey Grigorev (Searchmetrics GmbH)

arXiv:1712.06920 · cs.IR · December 20, 2017 · 1 citation

TL;DR

This paper presents a fast linear-classifier-based system for detecting vandalism in Wikidata. It achieved a ROC AUC of 0.938 on the test data and won second place in the WSDM Cup 2017 challenge.

Contribution

The paper demonstrates that simple linear classifiers can effectively detect vandalism in large-scale knowledge bases, offering a fast and competitive solution.

Findings

1. Achieved a ROC AUC of 0.938 on the test data.
2. Significantly faster than other approaches.
3. Provided an accessible implementation on GitHub.
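To make the headline number concrete: ROC AUC equals the probability that a randomly chosen vandalism edit receives a higher score than a randomly chosen regular edit, with ties counted as one half, so 0.938 means the classifier orders such a pair correctly about 93.8% of the time. The sketch below computes it with the rank-sum (Mann-Whitney U) formulation; the function name `roc_auc` is illustrative, and this is not necessarily how the WSDM Cup organizers computed the official score.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via average ranks: P(score of a positive > score of a negative),
    with ties counted as 0.5. Labels are 1 (vandalism) or 0 (regular edit)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # Mann-Whitney U statistic for the positives, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Sorting once instead of comparing every positive-negative pair keeps the cost at O(n log n), which matters when millions of edits are scored.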

Abstract

Nowadays many artificial intelligence systems rely on knowledge bases to enrich the information they process. Such knowledge bases are difficult to build, so they are usually crowdsourced: anyone on the internet can suggest edits and add new information. Unfortunately, they are sometimes targeted by vandals who insert inaccurate or offensive information. This is especially harmful to the systems that use these knowledge bases, because they need reliable information to make correct inferences. Wikidata is one such knowledge base, and to fight vandalism the organizers of the WSDM Cup 2017 challenged participants to build a model for detecting untrustworthy edits. In this paper we present the second-place solution to the cup: we show that it is possible to achieve competitive performance with simple linear classification. With our approach…

Taxonomy

Topics: Advanced Malware Detection Techniques · Digital Media Forensic Detection · Hate Speech and Cyberbullying Detection