In Data or Invisible: Toward a Better Digital Representation of Low-Resource Languages with Knowledge Graphs

Ndeye-Emilie Mbengue (WIMMICS)

arXiv:2605.05931 · cs.AI · May 8, 2026

TL;DR

This PhD proposal aims to improve the digital representation of low-resource languages in knowledge graphs by analyzing language distribution and exploring cross-lingual transfer and analogical reasoning techniques.

Contribution

The proposal provides a detailed analysis of language coverage in major LOD knowledge graphs and proposes new strategies for multilingual KG completion that take linguistic proximity into account.

Findings

01. Analyzed language distribution in DBpedia, BabelNet, and Wikidata.

02. Identified potential benefits of cross-lingual transfer for KG completion.

03. Proposed leveraging analogical reasoning to improve language coverage.

Abstract

Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from participating in the global digital transformation. In this PhD proposal, we aim to address this gap, focusing on the language coverage of Linked Open Data knowledge graphs (LOD KGs). First, we identify key variables that characterize language distribution in LOD, including the number of Wikipedia articles per language edition and the number of language-tagged entities in LOD KGs. These variables are analyzed across three major multilingual LOD KGs, DBpedia, BabelNet, and Wikidata, providing insights into the representation and distribution of languages within LOD. Building on this analysis, we intend to study the impact of cross-lingual transfer candidate selection on the task of multilingual KG completion. In particular,…
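
One of the variables named in the abstract, the number of language-tagged entities per language, can be illustrated with a minimal sketch. This is not the author's pipeline: the toy triples, the `ex:` identifiers, and the helper names `entities_per_language` and `coverage_ratio` are all assumptions made for illustration, and a real analysis would query DBpedia, BabelNet, or Wikidata rather than an in-memory list.

```python
from collections import defaultdict

# Toy data (hypothetical): (entity, predicate, literal, language tag).
# A real study would extract these from an LOD knowledge graph.
TRIPLES = [
    ("ex:Dakar", "rdfs:label", "Dakar", "en"),
    ("ex:Dakar", "rdfs:label", "Dakar", "fr"),
    ("ex:Dakar", "rdfs:label", "Ndakaaru", "wo"),
    ("ex:Senegal", "rdfs:label", "Senegal", "en"),
    ("ex:Senegal", "rdfs:label", "Sénégal", "fr"),
    ("ex:Baobab", "rdfs:label", "Baobab", "en"),
]


def entities_per_language(triples):
    """Count distinct entities that carry at least one literal in each language."""
    seen = defaultdict(set)
    for entity, _predicate, _literal, lang in triples:
        seen[lang].add(entity)
    return {lang: len(entities) for lang, entities in seen.items()}


def coverage_ratio(counts, total_entities):
    """Share of all entities that have a literal in each language."""
    return {lang: n / total_entities for lang, n in counts.items()}


counts = entities_per_language(TRIPLES)
total = len({entity for entity, *_ in TRIPLES})
ratios = coverage_ratio(counts, total)

# Print languages from best to worst covered.
for lang in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{lang}: {counts[lang]} of {total} entities ({ratios[lang]:.0%})")
```

In this toy sample, every entity has an English label while only one has a label tagged `wo`, which is the kind of imbalance the proposed coverage analysis sets out to measure at scale.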
