High-Throughput and Language-Agnostic Entity Disambiguation and Linking on User Generated Data
Preeti Bhargava, Nemanja Spasojevic, Guoning Hu

TL;DR
This paper introduces Lithium, a high-throughput, lightweight, language-agnostic entity disambiguation and linking system that outperforms existing methods in accuracy and speed on user-generated data.
Contribution
The paper presents Lithium, a novel EDL system that is scalable, language-agnostic, and more accurate and faster than current state-of-the-art systems.
Findings
Disambiguates 75% more entities than existing systems.
Processes text significantly faster than existing systems.
Effective on multi-lingual user-generated data.
Abstract
The Entity Disambiguation and Linking (EDL) task matches entity mentions in text to a unique Knowledge Base (KB) identifier such as a Wikipedia or Freebase id. It plays a critical role in the construction of a high quality information network, and can be further leveraged for a variety of information retrieval and NLP tasks such as text categorization and document tagging. EDL is a complex and challenging problem due to ambiguity of the mentions and real world text being multi-lingual. Moreover, EDL systems need to have high throughput and should be lightweight in order to scale to large datasets and run on off-the-shelf machines. More importantly, these systems need to be able to extract and disambiguate dense annotations from the data in order to enable an Information Retrieval or Extraction task running on the data to be more efficient and accurate. In order to address all these…
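To make the task definition concrete, here is a minimal sketch of mention-to-KB linking. It is not the Lithium algorithm: the alias table, the KB ids (written as Wikipedia-style titles), the prior values, the context words, and the scoring rule (prior plus context-word overlap) are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical alias table: surface form -> candidate entities.
# Each candidate is (KB id, prior probability, typical context words).
# All values are invented for illustration.
ALIASES: Dict[str, List[Tuple[str, float, Set[str]]]] = {
    "apple": [
        ("Apple_Inc.", 0.7, {"iphone", "mac", "company"}),
        ("Apple", 0.3, {"fruit", "tree", "pie"}),
    ],
}


def link(text: str) -> List[Tuple[str, str]]:
    """Return (mention, KB id) pairs for every known mention in text."""
    tokens = re.findall(r"\w+", text.lower())
    context = set(tokens)
    links = []
    for token in tokens:
        candidates = ALIASES.get(token)
        if not candidates:
            continue  # token is not a known entity mention
        # Score = prior + number of context words shared with the text.
        best = max(candidates, key=lambda c: c[1] + len(c[2] & context))
        links.append((token, best[0]))
    return links


if __name__ == "__main__":
    print(link("Apple released a new iPhone"))
    print(link("I ate an apple pie"))
```

The same surface form resolves to different identifiers depending on the surrounding words, which is the ambiguity the abstract refers to.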
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Semantic Web and Ontologies
