Harnessing Historical Corrections to build Test Collections for Named Entity Disambiguation

Florian Reitz

arXiv:1808.08999·cs.DL·August 29, 2018

TL;DR

This paper introduces a method to generate large, cost-effective test collections for name disambiguation by leveraging historical metadata, demonstrated on the DBLP dataset.

Contribution

The paper presents an approach for creating large test collections from historical metadata, addressing the problem that existing test collections are often small and specific to a single collection.

Findings

01. Two large test collections were created from DBLP metadata.
02. One collection focuses on the properties of defects; the other on the evaluation of disambiguation algorithms.
03. The test collections are freely available for research use.

Abstract

Matching mentions of persons to the actual persons (the name disambiguation problem) is central for several digital library applications. Scientists have been working on algorithms to create this matching for decades without finding a universal solution. One problem is that test collections for this problem are often small and specific to a certain collection. In this work, we present an approach that can create large test collections from historical metadata with minimal extra cost. We apply this approach to the DBLP collection to generate two freely available test collections. One collection focuses on the properties of defects and one on the evaluation of disambiguation algorithms.
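To make the idea of "historical corrections" concrete, here is a minimal sketch of how two snapshots of bibliographic metadata could be compared to find author mentions that were reassigned to a different person. This is an illustration under stated assumptions, not the pipeline used in the paper: the snapshot format (publication key mapped to a list of author identifiers), the function name `find_corrections`, and the sample records are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format (an assumption for this sketch):
# publication key -> ordered list of author identifiers.
Snapshot = Dict[str, List[str]]


def find_corrections(old: Snapshot, new: Snapshot) -> List[Tuple[str, str, str]]:
    """Return (publication, old_id, new_id) for author slots whose identifier changed."""
    corrections = []
    for pub in sorted(old):
        if pub not in new:
            continue  # publication not present in the later snapshot
        old_authors, new_authors = old[pub], new[pub]
        if len(old_authors) != len(new_authors):
            continue  # author list itself was edited; ignored in this sketch
        for before, after in zip(old_authors, new_authors):
            if before != after:
                corrections.append((pub, before, after))
    return corrections


# Made-up example: one ambiguous profile is later split into two persons.
old_snapshot = {
    "conf/x/1": ["Wei Wang", "A. Smith"],
    "journals/y/2": ["Wei Wang"],
}
new_snapshot = {
    "conf/x/1": ["Wei Wang 0001", "A. Smith"],
    "journals/y/2": ["Wei Wang 0002"],
}

corrections = find_corrections(old_snapshot, new_snapshot)
for pub, before, after in corrections:
    print(f"{pub}: {before} -> {after}")
```

Each reported triple is a candidate test case: the earlier snapshot holds the defective assignment and the later one holds the corrected assignment. A real collection would also need to handle added or removed authors and pure spelling changes, which this sketch skips.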
