Multilingual Entity Linking Using Dense Retrieval

Dominik Farhan

arXiv:2406.16892·cs.CL·June 26, 2024

Open Access

TL;DR

This thesis presents multilingual entity linking systems based on dense retrieval that are fast to train, require no large GPU cluster, and are trained on a publicly available dataset for reproducibility. The models are evaluated on nine languages.

Contribution

The author develops resource-efficient multilingual entity linking models and analyzes bi-encoder training hyperparameters in detail, showing that competitive performance is achievable without a large GPU cluster.

Findings

01 Achieved effective multilingual EL with limited resources.

02 Provided hyperparameter insights for bi-encoder training.

03 Evaluated models across 9 languages for comprehensive analysis.

Abstract

Entity linking (EL) is the computational process of connecting textual mentions to corresponding entities. Like many areas of natural language processing, the EL field has greatly benefited from deep learning, leading to significant performance improvements. However, present-day approaches are expensive to train and rely on diverse data sources, complicating their reproducibility. In this thesis, we develop multiple systems that are fast to train, demonstrating that competitive entity linking can be achieved without a large GPU cluster. Moreover, we train on a publicly available dataset, ensuring reproducibility and accessibility. Our models are evaluated for 9 languages, giving an accurate overview of their strengths. Furthermore, we offer a detailed analysis of the hyperparameters for training bi-encoders, a popular approach in EL, to guide their informed selection. Overall, our work shows that…

Taxonomy

Topics: Web Data Mining and Analysis · Text and Document Classification Technologies · Topic Modeling