Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs
Simone Conia, Min Li, Daniel Lee, Umar Farooq Minhas, Ihab Ilyas, Yunyao Li

TL;DR
This paper introduces the task of enhancing multilingual knowledge graphs by increasing the coverage and quality of non-English textual data, proposing a new unsupervised method and a benchmark to evaluate progress.
Contribution
It presents M-NTA, a novel unsupervised approach that combines Machine Translation (MT), Web Search (WS), and Large Language Models (LLMs), and introduces WikiKGE-10, a benchmark for evaluating multilingual knowledge graph enhancement.
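One simple way to combine several sources is to rank candidates by how many sources agree on them. The Python sketch below shows this idea. It assumes candidate names for one entity have already been collected from each source, for example MT, WS, and an LLM. It is a hypothetical illustration of agreement-based aggregation, not the published M-NTA algorithm. The function names, the normalization step, and the source keys are all assumptions.

```python
from collections import Counter


def normalize(name: str) -> str:
    # Hypothetical normalization (an assumption): case-fold and collapse whitespace
    return " ".join(name.casefold().split())


def aggregate_candidates(candidates_by_source: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank normalized candidate names by the number of distinct sources proposing them."""
    votes: Counter[str] = Counter()
    for candidates in candidates_by_source.values():
        # Each source votes at most once for a given normalized candidate
        for name in {normalize(c) for c in candidates}:
            votes[name] += 1
    # Highest agreement first; ties broken alphabetically for deterministic output
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical Italian-name candidates for the entity "New York City"
    candidates = {
        "mt": ["New York", "Città di New York"],
        "web_search": ["New York"],
        "llm": ["New York City", "new york"],
    }
    print(aggregate_candidates(candidates))
    # → [('new york', 3), ('città di new york', 1), ('new york city', 1)]
```

Counting each candidate at most once per source keeps a single noisy source from dominating the ranking by repeating itself. A real system would likely also need a confidence threshold and a way to keep the original surface form, and the sketch leaves out both.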
Findings
State-of-the-art methods struggle with multilingual coverage.
M-NTA improves the quality of non-English textual information.
Enhanced multilingual data benefits entity linking, KG completion, and QA.
Abstract
Recent work in Natural Language Processing and Computer Vision has been using textual information -- e.g., entity names and descriptions -- available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. More specifically, we: i) bring to light the problem of increasing multilingual coverage and precision of entity names and descriptions in Wikidata; ii) demonstrate that state-of-the-art methods, namely, Machine Translation (MT), Web Search (WS), and Large Language Models (LLMs), struggle with…
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
