Strong Heuristics for Named Entity Linking
Marko Čuljak, Andreas Spitz, Robert West, Akhil Arora

TL;DR
This paper introduces simple, scalable heuristics for named entity linking in large news corpora. The best heuristic disambiguates 94% of mentions on Quotebank and compares favourably to state-of-the-art unsupervised and zero-shot methods.
Contribution
It proposes lightweight heuristics for entity disambiguation that are effective, scalable, and competitive with advanced unsupervised and zero-shot approaches.
Findings
Best-performing heuristic disambiguates 94% of mentions on Quotebank
Same heuristic disambiguates 63% of mentions on the AIDA-CoNLL benchmark
Compares favourably to Eigenthemes (unsupervised) and mGENRE (zero-shot)
Abstract
Named entity linking (NEL) in news is a challenging endeavour due to the frequency of unseen and emerging entities, which necessitates the use of unsupervised or zero-shot methods. However, such methods tend to come with caveats, such as no integration of suitable knowledge bases (like Wikidata) for emerging entities, a lack of scalability, and poor interpretability. Here, we consider person disambiguation in Quotebank, a massive corpus of speaker-attributed quotations from the news, and investigate the suitability of intuitive, lightweight, and scalable heuristics for NEL in web-scale corpora. Our best-performing heuristic disambiguates 94% and 63% of the mentions on Quotebank and the AIDA-CoNLL benchmark, respectively. Additionally, the proposed heuristics compare favourably to the state-of-the-art unsupervised and zero-shot methods, Eigenthemes and mGENRE, respectively, thereby…
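The abstract does not spell out the heuristics themselves, so the following is only an illustrative sketch of what a lightweight, interpretable linking rule and its evaluation could look like. It assumes a popularity-based rule (pick the candidate with the highest popularity score) and a hypothetical in-memory candidate table. The entity IDs, scores, and function names are placeholders invented for the example, not the authors' method or real Wikidata data.

```python
from typing import Dict, List, Optional

# Hypothetical candidate table: mention string -> candidate entities.
# IDs and popularity scores are made-up placeholders (assumption).
CANDIDATES: Dict[str, List[dict]] = {
    "John Smith": [
        {"id": "E1", "popularity": 120},
        {"id": "E2", "popularity": 15},
    ],
    "Jane Doe": [
        {"id": "E3", "popularity": 7},
    ],
}


def link_by_popularity(mention: str) -> Optional[str]:
    """Return the ID of the most popular candidate, or None if there is none."""
    options = CANDIDATES.get(mention, [])
    if not options:
        return None
    return max(options, key=lambda c: c["popularity"])["id"]


def accuracy(gold: Dict[str, str]) -> float:
    """Fraction of gold mentions whose predicted entity matches the gold entity."""
    if not gold:
        return 0.0
    correct = sum(link_by_popularity(m) == e for m, e in gold.items())
    return correct / len(gold)
```

A rule of this kind needs no training data and its decisions are easy to inspect, which is the sort of scalability and interpretability the abstract contrasts with heavier unsupervised and zero-shot models.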
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
