Which Are the Low-Resource Languages of the Semantic Web?
Ndeye-Emilie Mbengue (WIMMICS), Pierre Monnin (WIMMICS), Miguel Couceiro (INESC-ID), Fabien Gandon (WIMMICS)

TL;DR
This poster proposes a methodology and a categorization to identify low-resource languages within Linked Open Data Knowledge Graphs (LOD KGs), with the aim of narrowing the digital divide between high- and low-resource languages.
Contribution
It introduces a formal definition of low-resource languages in LOD KGs based on a new categorization scheme using DBpedia, BabelNet, and Wikidata.
Findings
Proposes a preliminary multi-level categorization of languages in LOD KGs.
Provides a formal definition of low-, medium-, and high-resource languages.
The definition could later be used to select cross-lingual transfer candidates.
Abstract
Emerging digital technologies are exacerbating the existing divide in Open Access Data (OAD) between high- and low-resource languages, excluding many communities from the global digital transformation. Multilingual Linked Open Data Knowledge Graphs (LOD KGs) could contribute to mitigating this divide through cross-lingual transfer; however, no clear quantitative definition of low-resource languages has yet been established in the context of LOD KGs. In this poster, we present a methodology to analyze the distribution of languages across LOD KGs and propose a preliminary multi-level categorization based on DBpedia, BabelNet, and Wikidata. This categorization is used to derive a formal definition of low-, medium-, and high-resource languages that could later be leveraged to select cross-lingual transfer candidates.
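
To make the idea of a count-based, multi-level categorization concrete, here is a minimal sketch in Python. It is not the authors' method: the thresholds `LOW_MAX` and `HIGH_MIN`, the rule for combining per-KG levels, the function names, and the toy counts are all assumptions chosen for illustration. It only shows how per-language label counts from several KGs might be bucketed into low-, medium-, and high-resource categories.

```python
from collections import Counter

# Hypothetical thresholds on the number of language-tagged labels.
# These values are illustrative assumptions, not taken from the paper.
LOW_MAX = 1_000
HIGH_MIN = 100_000


def count_by_language(tagged_labels):
    """Count labels per language tag from (text, language_tag) pairs."""
    return Counter(tag.lower() for _text, tag in tagged_labels if tag)


def level(count):
    """Map a label count to a per-KG level using the assumed thresholds."""
    if count < LOW_MAX:
        return "low"
    if count >= HIGH_MIN:
        return "high"
    return "medium"


def categorize(counts_per_kg):
    """Combine per-KG levels into one category per language.

    Assumed rule: "low" if low in every KG, "high" if high in every KG,
    otherwise "medium". A language absent from a KG counts as 0 labels.
    """
    languages = set().union(*(c.keys() for c in counts_per_kg.values()))
    result = {}
    for lang in sorted(languages):
        levels = {level(c.get(lang, 0)) for c in counts_per_kg.values()}
        # A single shared level is kept; any disagreement yields "medium".
        result[lang] = levels.pop() if len(levels) == 1 else "medium"
    return result


# Toy counts (invented for illustration, not measured from the real KGs).
toy_counts = {
    "DBpedia": Counter({"en": 500_000, "fr": 200_000, "wo": 120}),
    "BabelNet": Counter({"en": 900_000, "fr": 50_000, "wo": 300}),
    "Wikidata": Counter({"en": 2_000_000, "fr": 800_000, "wo": 900}),
}
categories = categorize(toy_counts)
```

With these toy counts, a language that falls in the same bucket in all three KGs keeps that bucket, while one whose levels differ across KGs lands in "medium". In practice the counts would come from language-tagged literals extracted from each KG, and the thresholds would follow the paper's formal definition rather than the placeholder values used here.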
