Proceedings of the WSDM Cup 2017: Vandalism Detection and Triple Scoring
Martin Potthast (1), Stefan Heindorf (2), Hannah Bast (3) ((1) Leipzig University, (2) Paderborn University, (3) University of Freiburg)

TL;DR
The WSDM Cup 2017 challenged participants to develop methods for vandalism detection in Wikidata and triple scoring for entity relevance, promoting reproducibility and open-source sharing in knowledge base quality assurance.
Contribution
This paper presents the datasets, tasks, and evaluation framework of the WSDM Cup 2017 for vandalism detection and triple scoring, encouraging reproducible research in knowledge base quality.
Findings
Participants submitted diverse approaches to vandalism detection.
Open-source solutions were encouraged and shared.
The challenge facilitated advancements in knowledge base quality assessment.
Abstract
The WSDM Cup 2017 was a data mining challenge held in conjunction with the 10th International Conference on Web Search and Data Mining (WSDM). It addressed key challenges of knowledge bases today: quality assurance and entity search. For quality assurance, we tackle the task of vandalism detection, based on a dataset of more than 82 million user-contributed revisions of the Wikidata knowledge base, all of which are annotated with regard to whether or not they are vandalism. For entity search, we tackle the task of triple scoring, using a dataset that comprises relevance scores for triples from type-like relations, including occupation and country of citizenship, based on about 10,000 human relevance judgements. For the sake of reproducibility, participants were asked to submit their software on TIRA, a cloud-based evaluation platform, and they were incentivized to share their approaches as open source.
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
