Namesakes: Ambiguously Named Entities from Wikipedia and News
Oleg Vasilyev, Aysu Altun, Nidhi Vyas, Vedant Dharnidharka, Erika Lam, John Bohannon

TL;DR
Namesakes introduces a large dataset of ambiguously named entities from Wikipedia and news, designed to improve the evaluation of named entity linking systems by providing challenging benchmarks.
Contribution
The paper presents a new dataset of ambiguously named entities from Wikipedia and news, facilitating better evaluation and development of named entity linking methods.
Findings
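The dataset contains 58,862 mentions of 4,148 unique entities and their namesakes.
Mentions come from three sources: 1,000 from news, 28,843 from Wikipedia articles about the entity, and 29,019 from Wikipedia backlinks.
It is intended to help establish challenging benchmarks for named entity linking (NEL).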
Abstract
We present Namesakes, a dataset of ambiguously named entities obtained from English-language Wikipedia and news articles. It consists of 58862 mentions of 4148 unique entities and their namesakes: 1000 mentions from news, 28843 from Wikipedia articles about the entity, and 29019 Wikipedia backlink mentions. Namesakes should be helpful in establishing challenging benchmarks for the task of named entity linking (NEL).
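As a quick consistency check on the figures in the abstract, the sketch below sums the three mention sources and computes each source's share of the total. The dictionary keys are hypothetical labels chosen for this illustration, not field names from the dataset.

```python
# Mention counts by source, as stated in the abstract.
# The keys are illustrative labels, not identifiers from the dataset itself.
mention_counts = {
    "news": 1000,
    "wikipedia_entity_articles": 28843,
    "wikipedia_backlinks": 29019,
}

# The three sources should add up to the reported total of 58862 mentions.
total_mentions = sum(mention_counts.values())

# Fraction of all mentions contributed by each source.
shares = {source: count / total_mentions for source, count in mention_counts.items()}

for source, share in shares.items():
    print(f"{source}: {mention_counts[source]} mentions ({share:.1%})")
print(f"total: {total_mentions}")
```

The breakdown shows that news mentions are a small fraction of the dataset, with the two Wikipedia sources contributing nearly equal shares.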
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Wikis in Education and Collaboration
