Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia
Tomás Feith, Akhil Arora, Martin Gerlach, Debjit Paul, Robert West

TL;DR
This paper introduces the task of entity insertion in multilingual Wikipedia, presents a benchmark dataset, and develops a model that outperforms baselines, aiding editors in adding links across many languages.
Contribution
It defines and operationalizes entity insertion in multilingual networks, creates a benchmark dataset in 105 languages, and proposes LocEI and XLocEI models that outperform existing methods.
Findings
XLocEI outperforms baseline models including GPT-4.
The models can be applied in zero-shot settings with minimal performance loss.
The approach supports cross-language entity insertion in Wikipedia.
Abstract
Links are a fundamental part of information networks, turning isolated pieces of knowledge into a network of information that is much richer than the sum of its parts. However, adding a new link to the network is not trivial: it requires not only the identification of a suitable pair of source and target entities but also an understanding of the content of the source to locate a suitable position for the link in the text. The latter problem has not been addressed effectively, particularly in the absence of text spans in the source that could serve as anchors to insert a link to the target entity. To bridge this gap, we introduce and operationalize the task of entity insertion in information networks. Focusing on the case of Wikipedia, we empirically show that this problem is both relevant and challenging for editors. We compile a benchmark dataset in 105 languages and develop a…
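To make the task concrete, the following is a minimal sketch of entity insertion framed as ranking candidate text spans in the source article by how well they relate to the target entity. It is a toy lexical-overlap scorer and not the paper's LocEI or XLocEI model; the function names `tokenize` and `rank_spans`, the Jaccard scoring, and the example texts are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank_spans(spans, target_description):
    """Return span indices ordered from best to worst insertion candidate.

    Each span is scored by the Jaccard overlap between its tokens and the
    tokens of the target entity's description (a hypothetical stand-in for
    a learned relevance model).
    """
    target_tokens = set(tokenize(target_description))
    scored = []
    for index, span in enumerate(spans):
        span_tokens = set(tokenize(span))
        union = span_tokens | target_tokens
        score = len(span_tokens & target_tokens) / len(union) if union else 0.0
        scored.append((score, index))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]


# Hypothetical source article, split into candidate spans.
spans = [
    "The city lies on a river.",
    "Its economy relies on wine production and tourism.",
    "The mayor was elected in 2020.",
]
# Hypothetical description of the target entity to be linked.
target = "Wine is an alcoholic drink made from fermented grapes."

ranking = rank_spans(spans, target)
print(ranking[0])  # index of the span judged most suitable for the link
```

Note that no span here needs to contain the target's name as an anchor: the ranking only identifies where a mention could be added, which is the setting the abstract highlights as unaddressed.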
Taxonomy
Topics: Natural Language Processing Techniques · Wikis in Education and Collaboration · Topic Modeling
