WikiGUM: Exhaustive Entity Linking for Wikification in 12 Genres
Jessica Lin, Amir Zeldes

TL;DR
WikiGUM is a comprehensive dataset for entity linking across 12 diverse genres, including nested and pronominal mentions, enabling improved research and evaluation in Wikification tasks.
Contribution
The paper introduces WikiGUM, a fully annotated, genre-diverse dataset covering all mention types, addressing gaps in existing resources for entity linking research.
Findings
A pretrained SOTA system performs poorly on the new dataset, since most of its 12 genres have not been covered by previous Entity Linking efforts.
The dataset includes nested, non-named, and pronominal mentions.
Other annotations available for the same data provide a resource for further research on entities in context.
Abstract
Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e. Wikification. In this paper, we present and evaluate WikiGUM, a fully wikified dataset, covering all mentions of named entities, including their non-named and pronominal mentions, as well as mentions nested within other mentions. The dataset covers a broad range of 12 written and spoken genres, most of which have not been included in Entity Linking efforts to date, leading to poor performance by a pretrained SOTA system in our evaluation. The availability of a variety of other annotations for the same data also enables further research on entities in context.
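To make "all mentions, including non-named, pronominal, and nested ones" concrete, here is a minimal sketch of how such exhaustively linked mentions could be represented and queried. This is a hypothetical illustration, not WikiGUM's actual release format: the `Mention` class, the `kind` labels, the helpers `make`, `nested_pairs` and `entity_chains`, and the example sentence are all assumptions made for this sketch. Each mention is a character span linked to a Wikipedia title. "the capital of France" is a non-named mention of Paris with the named mention "France" nested inside it, and "It" is a pronominal mention of Paris.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    kind: str   # hypothetical labels: "named", "non-named", "pronominal"
    wiki: str   # Wikipedia article title the mention is linked to


def make(text: str, surface: str, kind: str, wiki: str, after: int = 0) -> Mention:
    """Hypothetical helper: find `surface` in `text` (from `after`) and build a Mention."""
    start = text.index(surface, after)
    return Mention(start, start + len(surface), kind, wiki)


def nested_pairs(mentions):
    """Return (outer, inner) pairs where inner lies inside outer with a different span."""
    return [
        (outer, inner)
        for outer in mentions
        for inner in mentions
        if outer.start <= inner.start
        and inner.end <= outer.end
        and (outer.start, outer.end) != (inner.start, inner.end)
    ]


def entity_chains(mentions):
    """Group mentions by linked Wikipedia title, in document order."""
    chains = {}
    for m in sorted(mentions, key=lambda m: (m.start, -m.end)):
        chains.setdefault(m.wiki, []).append(m)
    return chains


# Invented example sentence; every mention is linked, not only proper names.
text = "Paris is the capital of France. It is home to the Louvre."
mentions = [
    make(text, "Paris", "named", "Paris"),
    make(text, "the capital of France", "non-named", "Paris"),
    make(text, "France", "named", "France"),  # nested inside the previous mention
    make(text, "It", "pronominal", "Paris"),
    make(text, "the Louvre", "named", "Louvre"),
]

for outer, inner in nested_pairs(mentions):
    print(text[outer.start:outer.end], "contains", text[inner.start:inner.end])
# prints: the capital of France contains France

for wiki, chain in entity_chains(mentions).items():
    print(wiki, [text[m.start:m.end] for m in chain])
```

Grouping by linked title shows why exhaustive annotation matters for evaluation: a system that links only the proper name "Paris" would miss two of the three mentions of that entity, and a system that ignores nesting would never link "France" here.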
