Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing
Vera Provatorova, Svitlana Vakulenko, Samarth Bhargav, Evangelos Kanoulas

TL;DR
This paper introduces the ShadowLink dataset to evaluate entity disambiguation systems more accurately, revealing significant biases caused by entity overshadowing and prior probability effects.
Contribution
The paper presents a new benchmark dataset, ShadowLink, to assess entity disambiguation models beyond prior bias, highlighting the impact of entity overshadowing.
Findings
Performance varies significantly between common and rare entities.
Prior probability bias inflates accuracy scores on traditional datasets.
Entity overshadowing affects disambiguation accuracy across models.
Abstract
Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was previously shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common…
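The claim that accuracy can be inflated "by merely learning the prior" can be illustrated with a toy sketch. The mention, the entity labels, and the counts below are invented for illustration and are not taken from ShadowLink or from any system evaluated in the paper. The baseline ignores context entirely and always predicts the entity seen most often for a mention.

```python
from collections import Counter, defaultdict

def fit_prior(pairs):
    """Map each mention to the entity it was most often linked to."""
    counts = defaultdict(Counter)
    for mention, entity in pairs:
        counts[mention][entity] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}

def accuracy(prior, pairs):
    """Fraction of (mention, gold entity) pairs the prior-only baseline gets right."""
    correct = sum(prior.get(mention) == entity for mention, entity in pairs)
    return correct / len(pairs)

# Hypothetical data: one ambiguous mention with a common and a rare sense.
COMMON, RARE = "Jaguar_Cars", "Jaguar_(animal)"
train = [("jaguar", COMMON)] * 9 + [("jaguar", RARE)] * 1

# Evaluation set drawn from the same skewed distribution as the training data.
skewed_eval = [("jaguar", COMMON)] * 9 + [("jaguar", RARE)] * 1
# Evaluation set where the overshadowed (rare) sense is equally represented.
balanced_eval = [("jaguar", COMMON), ("jaguar", RARE)]

prior = fit_prior(train)
print(accuracy(prior, skewed_eval))
print(accuracy(prior, balanced_eval))
```

On this toy data the context-free baseline scores 0.9 on the skewed sample and 0.5 on the balanced one, which is the kind of gap a benchmark that controls for the prior is meant to expose.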
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Semantic Web and Ontologies
