Efficient Entity Candidate Generation for Low-Resource Languages
Alberto García-Durán, Akhil Arora, Robert West

TL;DR
This paper presents a simple, efficient candidate generation method for cross-lingual entity linking in low-resource languages, which outperforms more complex neural approaches in both accuracy and speed on multiple datasets.
Contribution
It introduces a lightweight indexing approach tailored for low-resource languages and provides an in-depth analysis of query difficulty and evaluation limitations.
Findings
The method outperforms state-of-the-art approaches on most datasets.
It improves both the quality and the efficiency of candidate generation.
The approach is effective across diverse low-resource language datasets.
Abstract
Candidate generation is a crucial module in entity linking. It also plays a key role in multiple NLP tasks that have been proven to beneficially leverage knowledge bases. Nevertheless, it has often been overlooked in the monolingual English entity linking literature, as naive approaches obtain very good performance. Unfortunately, the existing approaches for English cannot be successfully transferred to poorly resourced languages. This paper constitutes an in-depth analysis of the candidate generation problem in the context of cross-lingual entity linking with a focus on low-resource languages. Among other contributions, we point out limitations in the evaluation conducted in previous works. We introduce a characterization of queries into types based on their difficulty, which improves the interpretability of the performance of different methods. We also propose a light-weight and…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
