In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs
Ndeye-Emilie Mbengue (WIMMICS)

TL;DR
This research aims to improve the digital representation of low-resource languages in knowledge graphs by analyzing how languages are distributed across them and by exploring cross-lingual transfer and analogical reasoning techniques.
Contribution
It provides a detailed analysis of language coverage in major Linked Open Data (LOD) knowledge graphs and proposes new strategies for multilingual KG completion that draw on linguistic proximity.
Findings
Analyzed language distribution in DBpedia, BabelNet, and Wikidata.
Identified potential benefits of cross-lingual transfer for KG completion.
Proposed leveraging analogical reasoning to improve language coverage.
Abstract
Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from participating in the global digital transformation. In this PhD proposal, we aim to address this gap, focusing on the language coverage of Linked Open Data knowledge graphs (LOD KGs). First, we identify key variables that characterize language distribution in LOD, including the number of Wikipedia articles per language edition and the number of language-tagged entities in LOD KGs. These variables are analyzed across three major multilingual LOD KGs, DBpedia, BabelNet, and Wikidata, providing insights into the representation and distribution of languages within LOD. Building on this analysis, we intend to study the impact of cross-lingual transfer candidate selection on the task of multilingual KG completion. In particular,…
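One of the variables named in the abstract, the number of language-tagged entities per language, can be illustrated with a minimal sketch. This is not the author's method or pipeline: the rows below are a hypothetical toy sample, the `ex:` identifiers are made up, and the helper names `entities_per_language` and `coverage_ratio` are assumptions introduced only for illustration. A real analysis would read labels from DBpedia, BabelNet, or Wikidata dumps or endpoints.

```python
from collections import defaultdict

# Hypothetical toy sample of (entity, label, language tag) rows.
# These are illustrative only and are not taken from any real KG.
SAMPLE_LABELS = [
    ("ex:Q1", "Dakar", "en"),
    ("ex:Q1", "Dakar", "fr"),
    ("ex:Q1", "Ndakaaru", "wo"),
    ("ex:Q2", "Senegal", "en"),
    ("ex:Q2", "Sénégal", "fr"),
    ("ex:Q3", "Baobab", "en"),
    ("ex:Q3", "baobab", "en"),  # second English label for the same entity
]


def entities_per_language(rows):
    """Count distinct entities that have at least one label in each language."""
    seen = defaultdict(set)
    for entity, _label, lang in rows:
        # Normalize tag case so "EN" and "en" are counted together.
        seen[lang.lower()].add(entity)
    return {lang: len(entities) for lang, entities in seen.items()}


def coverage_ratio(counts, total_entities):
    """Share of all entities that are labeled in each language."""
    return {lang: n / total_entities for lang, n in counts.items()}


counts = entities_per_language(SAMPLE_LABELS)
total = len({entity for entity, _label, _lang in SAMPLE_LABELS})
ratios = coverage_ratio(counts, total)

for lang in sorted(counts, key=counts.get, reverse=True):
    print(f"{lang}: {counts[lang]} entities, coverage {ratios[lang]:.2f}")
```

Counting distinct entities rather than raw labels keeps a language with many alternative labels for a few entities from looking better covered than it is. Whether the proposal measures coverage this way is not stated in the abstract.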
