TL;DR
This study investigates how linguistic differences across Wikipedia language editions influence the information available on various topics, revealing potential biases and the impact on online learning and knowledge access.
Contribution
It introduces a hybrid computational model to quantify similarities and differences across Wikipedia language editions, linking linguistic bias to educational and informational disparities.
Findings
Identifies significant linguistic biases in Wikipedia content across languages.
Develops a model to measure intra- and intertextual similarities between language editions.
Highlights implications for online learning and knowledge equity.
Abstract
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling for the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding…
