Distributed Entity Disambiguation with Per-Mention Learning
Tiep Mai, Bichen Shi, Patrick K. Nicholson, Deepak Ajwani, Alessandra Sala

TL;DR
This paper introduces an entity disambiguation system that learns a specialized model for each ambiguous phrase. Trained on large-scale Wikipedia hyperlink data, it achieves high accuracy and supports efficient updates suitable for real-time applications.
Contribution
The paper presents a per-mention learning approach for entity disambiguation, enabling fast updates and high accuracy compared to existing global ranking models.
Findings
Achieves accuracy competitive with state-of-the-art systems.
Supports distributed training over large datasets.
Allows quick updates for new or specialized entities.
Abstract
Entity disambiguation, or mapping a phrase to its canonical representation in a knowledge base, is a fundamental step in many natural language processing applications. Existing techniques based on global ranking models fail to capture the individual peculiarities of the words and hence, either struggle to meet the accuracy requirements of many real-world applications or they are too complex to satisfy real-time constraints of applications. In this paper, we propose a new disambiguation system that learns specialized features and models for disambiguating each ambiguous phrase in the English language. To train and validate the hundreds of thousands of learning models for this purpose, we use a Wikipedia hyperlink dataset with more than 170 million labelled annotations. We provide an extensive experimental evaluation to show that the accuracy of our approach compares favourably with…
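To make the per-mention idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's method: the class name `PerMentionDisambiguator`, the bag-of-words features, the naive Bayes scoring, and the toy training sentences are all hypothetical stand-ins. What it aims to show is only the structure the abstract describes: each ambiguous phrase gets its own small model, so adding or retraining one phrase leaves every other model untouched.

```python
import math
from collections import Counter, defaultdict


class PerMentionDisambiguator:
    """Hypothetical sketch: one small naive Bayes model per mention string."""

    def __init__(self):
        # mention -> entity -> counts of context words seen with that entity
        self.word_counts = defaultdict(lambda: defaultdict(Counter))
        # mention -> counts of entity labels seen for that mention
        self.entity_counts = defaultdict(Counter)

    def train(self, mention, context, entity):
        """Update only the model that belongs to this mention."""
        self.word_counts[mention][entity].update(context.lower().split())
        self.entity_counts[mention][entity] += 1

    def predict(self, mention, context):
        """Return the best-scoring entity, or None for an unseen mention."""
        if mention not in self.entity_counts:
            return None
        words = context.lower().split()
        entities = self.entity_counts[mention]
        total = sum(entities.values())
        vocab = {w for c in self.word_counts[mention].values() for w in c}
        best_entity, best_score = None, -math.inf
        for entity, n in entities.items():
            counts = self.word_counts[mention][entity]
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denom = sum(counts.values()) + len(vocab) + 1
            score = math.log(n / total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_entity, best_score = entity, score
        return best_entity


# Toy data (invented for illustration, not from the Wikipedia dataset).
model = PerMentionDisambiguator()
model.train("jaguar", "the jaguar is a big cat in the rainforest", "Jaguar_(animal)")
model.train("jaguar", "jaguar unveiled a new luxury car engine", "Jaguar_Cars")

# A later update for a different mention touches only that mention's model.
model.train("python", "python is a programming language", "Python_(language)")

car_guess = model.predict("jaguar", "the new car has a powerful engine")
cat_guess = model.predict("jaguar", "a big cat in the rainforest")
```

Because the models are independent, training could in principle be partitioned by mention across workers, which is one plausible reading of the distributed-training finding above; the sketch does not implement any distribution.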
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
