Entity Linking and Discovery via Arborescence-based Supervised Clustering
Dhruv Agarwal, Rico Angell, Nicholas Monath, Andrew McCallum

TL;DR
This paper introduces an arborescence-based supervised clustering method for entity linking and discovery. It uses mention-to-mention affinities, in addition to mention-to-entity affinities, to improve accuracy and efficiency on the Zero-Shot Entity Linking and MedMentions datasets.
Contribution
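The paper presents new training and inference procedures that exploit mention-to-mention affinities by building minimum arborescences (directed spanning trees) over mentions and entities across documents. The same approach extends to entity discovery, where mentions with no associated entity in the knowledge base are clustered together.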
Findings
Significant improvements in entity linking and discovery on the Zero-Shot Entity Linking and MedMentions datasets, compared to identically parameterized models.
Enhanced efficiency with minimal accuracy loss compared to previous models.
Effective clustering of mentions that have no associated entity in the knowledge base.
Abstract
Previous work has shown promising results in performing entity linking by measuring not only the affinities between mentions and entities but also those amongst mentions. In this paper, we present novel training and inference procedures that fully utilize mention-to-mention affinities by building minimum arborescences (i.e., directed spanning trees) over mentions and entities across documents in order to make linking decisions. We also show that this method gracefully extends to entity discovery, enabling the clustering of mentions that do not have an associated entity in the knowledge base. We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset, and show significant improvements in performance for both entity linking and discovery compared to identically parameterized models. We further show significant…
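To make the idea of linking through a minimum arborescence concrete, here is a minimal sketch, not the authors' implementation. It assumes hypothetical dissimilarity costs (lower means higher affinity) on entity-to-mention and mention-to-mention edges, treats entities as the only possible roots, and assumes each mention is linked to the entity at the root of its path. It finds the optimum by exhaustive search, which is only feasible for tiny graphs; an efficient solver such as the Chu-Liu/Edmonds algorithm would be needed at scale. The names `min_arborescence_links`, `E1`, `m1`, and so on are illustrative only.

```python
from itertools import product

def min_arborescence_links(entities, mentions, cost):
    """Exhaustively search for the cheapest parent assignment in which
    every mention reaches an entity by following parent pointers.

    cost maps (parent, child) -> dissimilarity; children are always mentions.
    Returns (total_cost, parent, link), or None if no valid assignment exists.
    """
    # Candidate parents of each mention: every node with an edge into it.
    candidates = {m: [u for (u, v) in cost if v == m] for m in mentions}
    best = None
    for choice in product(*(candidates[m] for m in mentions)):
        parent = dict(zip(mentions, choice))
        link = {}
        for m in mentions:
            node, seen = m, set()
            # Walk up until we leave the mentions (reach an entity) or loop.
            while node in parent and node not in seen:
                seen.add(node)
                node = parent[node]
            if node not in entities:
                break  # cycle among mentions: not an arborescence
            link[m] = node
        else:
            total = sum(cost[(parent[m], m)] for m in mentions)
            if best is None or total < best[0]:
                best = (total, parent, link)
    return best

# Hypothetical toy instance: two entities, three mentions.
entities = ["E1", "E2"]
mentions = ["m1", "m2", "m3"]
cost = {
    ("E1", "m1"): 0.2, ("E2", "m1"): 0.9,
    ("E1", "m2"): 0.8, ("E2", "m2"): 0.7,
    ("m1", "m2"): 0.1, ("m2", "m1"): 0.1,
    ("E1", "m3"): 0.9, ("E2", "m3"): 0.3,
    ("m2", "m3"): 0.6,
}

total, parent, link = min_arborescence_links(entities, mentions, cost)
print(link)  # {'m1': 'E1', 'm2': 'E1', 'm3': 'E2'}
```

In this toy instance, `m2` is linked to `E1` through `m1`, even though its cheapest direct entity edge points to `E2`. That is the effect of letting mention-to-mention affinities participate in the linking decision.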
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
