Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia
Farhan Samir, Chan Young Park, Anjalie Field, Vered Shwartz, Yulia Tsvetkov

TL;DR
This paper introduces the InfoGap method for detecting factual information gaps and inconsistencies across languages in Wikipedia articles, revealing significant discrepancies in LGBT portrayals across English, Russian, and French editions.
Contribution
The paper presents a novel, scalable approach for fact-level comparison of articles across languages, enabling nuanced analysis of systematic biases and information gaps.
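To make the idea of fact-level comparison concrete, here is a minimal toy sketch. It is not the InfoGap implementation. It assumes each article has already been split into short fact strings and translated into one shared language, and it uses plain token overlap (Jaccard similarity) as a stand-in for whatever matching the paper actually uses. The names `jaccard` and `find_gaps` and the 0.5 threshold are hypothetical choices made for illustration only.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two fact strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def find_gaps(source_facts: List[str], target_facts: List[str],
              threshold: float = 0.5) -> List[str]:
    """Return facts in source_facts with no close match in target_facts.

    Hypothetical helper: a fact counts as an information gap when its best
    similarity to any target fact falls below the (assumed) threshold.
    """
    gaps = []
    for fact in source_facts:
        best = max((jaccard(fact, t) for t in target_facts), default=0.0)
        if best < threshold:
            gaps.append(fact)
    return gaps


# Toy data, invented for illustration; not taken from the paper.
en_facts = ["she was born in paris in 1950", "she won a national poetry prize"]
ru_facts = ["she was born in paris in 1950"]

missing_in_ru = find_gaps(en_facts, ru_facts)
missing_in_en = find_gaps(ru_facts, en_facts)
```

Running the comparison in both directions, as in the last two lines, is what separates a fact that is absent from one edition from a fact that both editions share.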
Findings
Factual coverage differs substantially across the English, Russian, and French Wikipedias.
Biographical facts carrying negative connotations are more likely to be highlighted in Russian Wikipedia.
InfoGap enables large-scale, detailed cross-language analysis.
Abstract
To explain social phenomena and identify systematic biases, much research in computational social science focuses on comparative text analyses. These studies often rely on coarse corpus-level statistics or local word-level analyses, mainly in English. We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level, across languages. We evaluate InfoGap by analyzing LGBT people's portrayals across 2.7K biography pages on English, Russian, and French Wikipedias. We find large discrepancies in factual coverage across the languages. Moreover, our analysis reveals that biographical facts carrying negative connotations are more likely to be highlighted in Russian Wikipedia. Crucially, InfoGap both facilitates large-scale analyses and pinpoints local document- and fact-level information gaps, laying a new…
Taxonomy
Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
