Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty
Renato Stoffalette João, Pavlos Fafalios, Stefan Dietze

TL;DR
This paper introduces a method to predict the difficulty of entity linking in texts, helping improve semi-automated systems by identifying challenging mentions and understanding factors influencing linking performance.
Contribution
It proposes a consensus-based approach to label mention difficulty and trains a classifier to predict difficulty, revealing latent features affecting entity linking accuracy.
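As a rough illustration of what consensus-based labeling could look like, the sketch below assigns a difficulty label to a mention from the level of agreement among several EL systems. The function name `consensus_label`, the three labels, and the agreement thresholds are assumptions made for this example; the summary above does not state the paper's actual labeling rule.

```python
from collections import Counter
from typing import List, Optional


def consensus_label(predictions: List[Optional[str]]) -> str:
    """Label one mention by how strongly the EL systems agree.

    `predictions` holds one entity ID per system, or None when a system
    produced no link. Labels and thresholds are hypothetical.
    """
    linked = [p for p in predictions if p is not None]
    if not linked:
        # No system linked the mention at all: treat as hard (assumption).
        return "hard"

    # Share of all systems that voted for the most common entity.
    top_votes = Counter(linked).most_common(1)[0][1]
    agreement = top_votes / len(predictions)

    if agreement == 1.0:
        return "easy"      # every system returned the same entity
    if agreement > 0.5:
        return "medium"    # a strict majority agrees
    return "hard"          # no majority


# Hypothetical outputs of three EL systems for three mentions.
mentions = {
    "Paris": ["Q90", "Q90", "Q90"],
    "Jordan": ["Q810", "Q810", "Q41421"],
    "Apple": ["Q312", "Q89", None],
}
labels = {m: consensus_label(p) for m, p in mentions.items()}
```

Labels produced this way could then serve as distantly supervised training targets for a difficulty classifier over mention-level features, which is the second step the contribution describes.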
Findings
The trained classifier predicts EL difficulty with high accuracy.
Latent, corpus-specific features influence EL performance.
Predicted difficulty can be used to flag critical mentions in semi-automated EL pipelines.
Abstract
Entity Linking (EL) is the task of automatically identifying entity mentions in a piece of text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. There is a large number of EL tools available for different types of documents and domains, yet EL remains a challenging task where the lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real applications. A priori approximations of the difficulty to link a particular entity mention can facilitate flagging of critical cases as part of semi-automated EL systems, while detecting latent factors that affect the EL performance, like corpus-specific features, can provide insights on how to improve a system based on the special characteristics of the underlying corpus. In this paper, we first introduce a consensus-based method to generate…
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Text Readability and Simplification
