Harnessing Historical Corrections to Build Test Collections for Named Entity Disambiguation
Florian Reitz

TL;DR
This paper introduces a method to generate large, cost-effective test collections for name disambiguation by leveraging historical metadata, demonstrated on the DBLP dataset.
Contribution
It presents an approach for creating large test collections from historical metadata, addressing the problem that existing test collections are often small and specific to a single collection.
Findings
Created two large test collections from DBLP metadata
One collection supports the analysis of defect properties, the other the evaluation of disambiguation algorithms
Test collections are freely available for research use
Abstract
Matching mentions of persons to the actual persons (the name disambiguation problem) is central to several digital library applications. Scientists have been working on algorithms to create this matching for decades without finding a universal solution. One problem is that test collections for this task are often small and specific to a certain collection. In this work, we present an approach that can create large test collections from historical metadata with minimal extra cost. We apply this approach to the DBLP collection to generate two freely available test collections. One collection focuses on the properties of defects and one on the evaluation of disambiguation algorithms.
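The abstract does not spell out how corrections are extracted from historical metadata, so the following is only a minimal sketch of one plausible reading: compare two snapshots of the same collection and report author mentions whose assigned person changed. The snapshot layout (a mapping from a `(publication_id, author_position)` signature to a person identifier), the function names `find_corrections` and `group_by_old_person`, and the toy data are all assumptions made for illustration, not the paper's actual pipeline or the DBLP data format.

```python
from collections import defaultdict


def find_corrections(old_snapshot, new_snapshot):
    """Return signatures whose assigned person changed between two snapshots.

    Each snapshot maps a signature (publication_id, author_position) to a
    person identifier. This layout is hypothetical, chosen for illustration.
    """
    corrections = []
    for signature, old_person in old_snapshot.items():
        new_person = new_snapshot.get(signature)
        # Skip signatures that vanished or kept the same assignment.
        if new_person is not None and new_person != old_person:
            corrections.append((signature, old_person, new_person))
    return sorted(corrections)


def group_by_old_person(corrections):
    """Group reassigned signatures by the person they were originally attached to."""
    grouped = defaultdict(list)
    for signature, old_person, new_person in corrections:
        grouped[old_person].append((signature, new_person))
    return dict(grouped)


# Toy snapshots (invented data): one profile is later split into two persons.
old = {
    ("pub1", 0): "Wei Wang",
    ("pub2", 0): "Wei Wang",
    ("pub3", 1): "Wei Wang",
}
new = {
    ("pub1", 0): "Wei Wang 0001",
    ("pub2", 0): "Wei Wang 0002",
    ("pub3", 1): "Wei Wang",
}

corrections = find_corrections(old, new)
for signature, old_person, new_person in corrections:
    print(signature, old_person, "->", new_person)
```

Under these assumptions, each reported change is a candidate test case: the old snapshot holds the defective assignment and the new snapshot holds the corrected one, which is what would make such a collection inexpensive to build from data that already exists.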
