Fair multilingual vandalism detection system for Wikipedia

Mykola Trokhymovych, Muniza Aslam, Ai-Jou Chou, Ricardo Baeza-Yates, and Diego Saez-Trumper

arXiv:2306.01650 · cs.LG · June 5, 2023 · 1 citation

TL;DR

This paper introduces a fair, multilingual vandalism detection system for Wikipedia that extends coverage to 47 languages, improves accuracy, and reduces bias against certain groups of contributors, supporting community moderation efforts.

Contribution

The paper presents a novel multilingual vandalism detection system built with advanced filtering and feature engineering, including multilingual masked language modeling. It significantly expands language coverage and outperforms ORES, the system used in production on Wikipedia.

Findings

01

Extends language coverage to 47 languages.

02

Outperforms ORES, the vandalism detection system used in production on Wikipedia.

03

Reduces bias against certain groups of contributors.

Abstract

This paper presents a novel design for a system aimed at supporting the Wikipedia community in addressing vandalism on the platform. To achieve this, we collected a massive dataset covering 47 languages and applied advanced filtering and feature engineering techniques, including multilingual masked language modeling, to build the training dataset from human-generated data. The performance of the system was evaluated through comparison with the one used in production on Wikipedia, known as ORES. Our research results in a significant increase in the number of languages covered, making Wikipedia patrolling more efficient for a wider range of communities. Furthermore, our model outperforms ORES, ensuring that the results provided are not only more accurate but also less biased against certain groups of contributors.
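The abstract's claim of being "less biased against certain groups of contributors" can be made concrete with a per-group error comparison. The sketch below is only an illustration, not the paper's evaluation procedure: the metric (false positive rate per group), the group names `anonymous` and `registered`, the 0.5 threshold, and the toy records are all assumptions introduced here.

```python
from collections import defaultdict


def false_positive_rate_by_group(records, threshold=0.5):
    """Compute the false positive rate of a classifier for each group.

    records: iterable of (group, is_vandalism, score) tuples, where
    score is the model's predicted probability of vandalism.
    The tuple layout and the threshold are assumptions of this sketch.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, is_vandalism, score in records:
        if not is_vandalism:
            # Only good-faith edits can produce false positives.
            negatives[group] += 1
            if score >= threshold:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


def fpr_gap(rates):
    """Largest difference in false positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: (contributor group, true label, model score).
toy_records = [
    ("anonymous", False, 0.7),
    ("anonymous", False, 0.2),
    ("anonymous", True, 0.9),
    ("registered", False, 0.1),
    ("registered", False, 0.3),
    ("registered", True, 0.8),
]

rates = false_positive_rate_by_group(toy_records)
gap = fpr_gap(rates)
print(rates, gap)
```

Under this assumed metric, a smaller gap means good-faith edits from different contributor groups are wrongly flagged at more similar rates. Comparing the gap for two models on the same records is one simple way to read a "less biased" claim.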

Code & Models

Repositories

trokhymovych/ki_multilingual_training
Official

Taxonomy

Topics: Wikis in Education and Collaboration · Cancer-related gene regulation · Protein Degradation and Inhibitors