The Detection and effect of social events on Wikipedia data-set for studying human preferences
Julien Assuied, Yérali Gandica

TL;DR
This paper investigates how social events influence Wikipedia data, proposing a method to detect outliers across multiple languages and categories, and finds that such outliers do not significantly bias analyses of human preferences.
Contribution
It introduces a language-dependent outlier detection method for Wikipedia data and evaluates the impact of social events on data bias.
Findings
Outliers do not significantly affect Wikipedia-based human preference analysis.
A new methodology for detecting social event outliers in multilingual Wikipedia data.
Analysis of cyclic human behavior in Wikipedia editing patterns.
Abstract
Several studies have used the Wikipedia (WP) data set to analyse worldwide human preferences by language. However, those studies could suffer from bias related to exceptional social circumstances. Any massive event promoting the exceptional editing of WP can be defined as a source of bias. In this article, we follow a procedure for detecting outliers. Our study is based on languages and different categories. Our methodology defines a parameter that is language-dependent instead of being externally fixed. We also study the presence of cyclic human behaviour to evaluate apparent outliers. After our analysis, we found that the outliers in our data set do not significantly affect the use of the whole Wikipedia data set as a digital footprint to analyse worldwide human preferences.
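To make the idea of a language-dependent parameter concrete, here is a minimal sketch. It is not the authors' procedure: the abstract does not specify how the parameter is computed, so this example assumes a threshold of median plus k times the median absolute deviation (MAD), derived separately from each language's own edit counts. The function names, the factor k, and the sample counts are all hypothetical.

```python
from statistics import median


def language_threshold(counts, k=3.0):
    """Assumed rule (not from the paper): median + k * MAD of this language's counts."""
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    return med + k * mad


def detect_outliers(counts_by_language, k=3.0):
    """Return, per language, the indices of periods whose count exceeds that language's own threshold."""
    result = {}
    for lang, counts in counts_by_language.items():
        limit = language_threshold(counts, k)
        result[lang] = [i for i, c in enumerate(counts) if c > limit]
    return result


# Hypothetical edit counts per period, for illustration only.
edits = {
    "en": [100, 105, 98, 102, 400, 101, 99],
    "fr": [10, 12, 11, 9, 10, 13, 11],
}
outliers = detect_outliers(edits)
```

Because each threshold is computed from the language's own distribution, a burst of 400 edits is flagged in the "en" series, while the much smaller "fr" series is judged on its own scale rather than against an externally fixed cutoff.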
Taxonomy
Topics: Wikis in Education and Collaboration · RNA and protein synthesis mechanisms
