Robust Candidate Generation for Entity Linking on Short Social Media Texts
Liam Hebert, Raheleh Makki, Shubhanshu Mishra, Hamidreza Saghir, Anusha Kamath, Yuval Merhav

TL;DR
This paper examines why entity linking is hard on Tweets, evaluates lookup and dense retrieval methods for candidate generation, and proposes a hybrid approach using Wikipedia context that reaches 0.93 recall.
Contribution
It introduces a hybrid candidate generation method combining dense retrieval and Wikipedia context to enhance entity linking in social media texts.
Findings
Dense retrieval alone suffers on Tweets because of informal spelling, limited context, and lack of specificity.
A hybrid approach with Wikipedia context achieves 0.93 recall.
The study provides empirical evaluation on a large Tweets benchmark.
Abstract
Entity Linking (EL) is the gateway into Knowledge Bases. Recent advances in EL utilize dense retrieval approaches for Candidate Generation, which address some of the shortcomings of the Lookup-based approach of matching NER mentions against pre-computed dictionaries. In this work, we show that in the domain of Tweets, such methods suffer as users often include informal spelling, limited context, and lack of specificity, among other issues. We investigate these challenges on a large and recent Tweets benchmark for EL, empirically evaluate lookup and dense retrieval approaches, and demonstrate that a hybrid solution using long contextual representation from Wikipedia is necessary to achieve considerable gains over previous work, achieving 0.93 recall.
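To make the terms in the abstract concrete, the following is a minimal sketch of lookup, dense, and hybrid candidate generation, plus the recall metric used to compare them. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy knowledge base `KB`, the alias dictionary `ALIASES`, the bag-of-words `embed` function (a stand-in for a trained dense encoder), and the choice to form the hybrid as a simple union of both candidate lists, which the abstract does not spell out.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: entity id -> long description
# (a stand-in for the Wikipedia context mentioned in the abstract).
KB = {
    "Q1": "Apple Inc. is an American technology company that makes the iPhone",
    "Q2": "Apple is a sweet edible fruit produced by an apple tree",
    "Q3": "Paris is the capital city of France",
}

# Hypothetical pre-computed alias dictionary used by the lookup path.
ALIASES = {"apple": ["Q1", "Q2"], "paris": ["Q3"]}


def embed(text):
    """Toy bag-of-words vector; a real system would use a trained encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup_candidates(mention):
    """Exact (case-insensitive) match of the mention against the dictionary."""
    return ALIASES.get(mention.lower(), [])


def dense_candidates(context, k=2):
    """Top-k entities whose description is most similar to the tweet text."""
    query = embed(context)
    ranked = sorted(KB, key=lambda e: cosine(query, embed(KB[e])), reverse=True)
    return ranked[:k]


def hybrid_candidates(mention, context, k=2):
    """Assumed hybrid: order-preserving union of lookup and dense candidates."""
    merged = []
    for entity in lookup_candidates(mention) + dense_candidates(context, k):
        if entity not in merged:
            merged.append(entity)
    return merged


def recall(examples, generate):
    """Fraction of examples whose gold entity appears among the candidates."""
    hits = sum(1 for m, c, gold in examples if gold in generate(m, c))
    return hits / len(examples)


# Invented examples: (mention, tweet text, gold entity id).
EXAMPLES = [
    ("aple", "new aple iphone is out", "Q1"),  # misspelling breaks lookup
    ("Paris", "luv it here", "Q3"),            # little context hurts dense
]

if __name__ == "__main__":
    print("lookup:", recall(EXAMPLES, lambda m, c: lookup_candidates(m)))
    print("dense: ", recall(EXAMPLES, lambda m, c: dense_candidates(c)))
    print("hybrid:", recall(EXAMPLES, hybrid_candidates))
```

On these two invented examples, each single method recovers only one gold entity while the union recovers both. This illustrates the failure modes the abstract names (informal spelling, limited context) and says nothing about the paper's reported numbers.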
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
