EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery and Indexing
Nora Kassner, Fabio Petroni, Mikhail Plekhanov, Sebastian Riedel, Nicola Cancedda

TL;DR
This paper introduces EDIN, a comprehensive benchmark and pipeline for discovering and indexing unknown entities in entity linking systems, addressing the challenge of incomplete knowledge bases and novel concepts.
Contribution
It presents the EDIN benchmark and an end-to-end pipeline for detecting, clustering, and indexing unknown entities, advancing beyond zero-shot linking methods.
Findings
Indexing a single embedding per entity, which unifies the information of multiple mentions, improves performance.
The EDIN pipeline effectively detects and clusters unknown entities.
Experiments highlight the challenges and solutions for unknown entity integration.
Abstract
Existing work on Entity Linking mostly assumes that the reference knowledge base is complete, and therefore that all mentions can be linked. In practice this is hardly ever the case, because knowledge bases are incomplete and novel concepts arise constantly. We introduce the Unknown Entity Discovery and Indexing (EDIN) benchmark, where unknown entities, that is, entities with neither a description in the knowledge base nor labeled mentions, have to be integrated into an existing entity linking system. By contrasting EDIN with zero-shot entity linking, we provide insight into the additional challenges it poses. Building on dense-retrieval-based entity linking, we introduce the end-to-end EDIN pipeline that detects, clusters, and indexes mentions of unknown entities in context. Experiments show that indexing a single embedding per entity unifying the information of multiple mentions works…
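The detect, cluster, and index steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-similarity thresholds, the greedy clustering rule, mean pooling of mention embeddings, and all names (`discover_and_index`, `known_threshold`, `cluster_threshold`, the `unknown_<i>` identifiers) are assumptions chosen for illustration, and the toy vectors stand in for embeddings that a dense retriever would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def discover_and_index(index, mentions, known_threshold=0.8, cluster_threshold=0.8):
    """Hypothetical detect -> cluster -> index loop.

    index:    dict mapping entity id -> embedding of a known entity
    mentions: list of mention embeddings
    Returns the dict of newly added entries; `index` is updated in place.
    """
    # 1. Detect: a mention counts as unknown if no known entity is similar enough.
    #    (Assumed rule; a real system may use a learned classifier instead.)
    unknown = [
        m for m in mentions
        if all(cosine(m, e) < known_threshold for e in index.values())
    ]

    # 2. Cluster: greedily attach each unknown mention to the first cluster
    #    whose centroid is similar enough, otherwise start a new cluster.
    clusters = []
    for m in unknown:
        for cluster in clusters:
            if cosine(m, mean_vector(cluster)) >= cluster_threshold:
                cluster.append(m)
                break
        else:
            clusters.append([m])

    # 3. Index: store ONE embedding per discovered entity by pooling
    #    the embeddings of all mentions in its cluster.
    new_entries = {
        f"unknown_{i}": mean_vector(cluster) for i, cluster in enumerate(clusters)
    }
    index.update(new_entries)
    return new_entries


if __name__ == "__main__":
    kb = {"Q1": [1.0, 0.0, 0.0]}
    mention_embeddings = [
        [0.9, 0.1, 0.0],  # close to Q1, so treated as known
        [0.0, 1.0, 0.1],  # unknown
        [0.0, 0.9, 0.2],  # unknown, close to the previous mention
        [0.0, 0.0, 1.0],  # unknown, unrelated to the others
    ]
    added = discover_and_index(kb, mention_embeddings)
    print(sorted(added))
```

Pooling each cluster into one vector mirrors the "single embedding per entity" idea in the abstract; the alternative would be to index every mention embedding separately.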
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Semantic Web and Ontologies
Methods: Balanced Selection
