Automatically Labeling Low Quality Content on Wikipedia by Leveraging Patterns in Editing Behaviors
Sumit Asthana, Sabrina Tobar Thommel, Aaron Lee Halfaker, Nikola Banovic

TL;DR
This paper introduces an automated method that labels low-quality Wikipedia sentences by analyzing patterns in historic edits. Training sentence quality classifiers on these behavior-based labels yields more accurate results than training on manually produced labels.
Contribution
It presents a novel automated labeling approach based on editing behaviors, enhancing machine learning models for Wikipedia content quality assessment.
Findings
Training on behavior-based labels improves classification accuracy.
Behavioral labels outperform crowdworker-generated labels.
Automated labeling reduces manual effort and noise.
Abstract
Wikipedia articles aim to be definitive sources of encyclopedic content. Yet, only 0.6% of Wikipedia articles have high quality according to its quality scale due to insufficient number of Wikipedia editors and enormous number of articles. Supervised Machine Learning (ML) quality improvement approaches that can automatically identify and fix content issues rely on manual labels of individual Wikipedia sentence quality. However, current labeling approaches are tedious and produce noisy labels. Here, we propose an automated labeling approach that identifies the semantic category (e.g., adding citations, clarifications) of historic Wikipedia edits and uses the modified sentences prior to the edit as examples that require that semantic improvement. Highest-rated article sentences are examples that no longer need semantic improvements. We show that training existing sentence quality…
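The labeling rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Edit` record, the `build_labels` function, and the category names are hypothetical, and the sketch assumes each historic edit already carries a semantic category (the abstract does not say how that category is identified).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Edit:
    """Hypothetical record of one historic edit to a single sentence."""
    before: str    # sentence text prior to the edit
    after: str     # sentence text after the edit
    category: str  # assumed semantic category, e.g. "citation"


def build_labels(
    edits: Iterable[Edit],
    top_rated_sentences: Iterable[str],
    category: str,
) -> List[Tuple[str, int]]:
    """Return (sentence, label) pairs for one improvement category.

    Label 1: the pre-edit sentence of an edit in this category,
             i.e. a sentence that needed this improvement.
    Label 0: a sentence from a highest-rated article, assumed to
             no longer need the improvement.
    """
    positives = [(e.before, 1) for e in edits if e.category == category]
    negatives = [(s, 0) for s in top_rated_sentences]
    return positives + negatives


# Toy data, invented purely for illustration.
edits = [
    Edit("The city is large.", "The city is large.[1]", "citation"),
    Edit("It was big then.", "It was the largest city in 1900.", "clarification"),
]
top_rated = ["The river is 120 km long.[2]"]

citation_labels = build_labels(edits, top_rated, "citation")
```

Here `citation_labels` holds one positive example (the sentence as it read before a citation was added) and one negative example (a sentence from a highest-rated article). Such pairs could then be fed to any sentence quality classifier.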
Taxonomy
Topics: Wikis in Education and Collaboration · Cancer-related gene regulation · Natural Language Processing Techniques
