Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

Olivier Binette; Sokhna A York; Emma Hickerson; Youngsoo Baek; Sarvo Madhavan; Christina Jones

arXiv:2210.01230 · cs.DL · April 19, 2023 · 1 citation

TL;DR

This paper presents a new evaluation method for entity resolution algorithms, motivated by and applied to PatentsView.org. It offers a practical way to assess and compare inventor disambiguation performance while accounting for sampling biases.

Contribution

It introduces a data collection methodology and tailored performance estimators that account for sampling biases, enabling a representative assessment of PatentsView's inventor disambiguation.

Findings

01. First representative performance evaluation of PatentsView disambiguation
02. Method accounts for sampling biases in patent data
03. Facilitates comparison of different entity resolution algorithms

Abstract

This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled, key characteristics that allow us to paint the first representative picture of PatentsView's disambiguation performance. The approach is used to inform PatentsView's users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.
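
To make the phrase "account for sampling biases" concrete, here is a minimal sketch of one generic correction (inverse-probability weighting). It is not the paper's actual estimators, which the abstract does not spell out. The sketch assumes, hypothetically, that ground-truth clusters are collected by drawing records uniformly at random, so a cluster of size n is selected with probability proportional to n, and that each sampled cluster carries a per-cluster score (for example, 1 if the algorithm resolved it exactly, 0 otherwise). The function names and the toy data are illustrative assumptions.

```python
from typing import List, Tuple

# Each sample entry is (true_cluster_size, per_cluster_score).
Sample = List[Tuple[int, float]]


def naive_estimate(sample: Sample) -> float:
    """Unweighted mean of the scores; biased toward large clusters
    when clusters are sampled proportionally to their size."""
    return sum(score for _, score in sample) / len(sample)


def weighted_estimate(sample: Sample) -> float:
    """Ratio estimator with inverse-size weights (assumed sampling
    probability proportional to cluster size)."""
    numerator = sum(score / size for size, score in sample)
    denominator = sum(1.0 / size for size, _ in sample)
    return numerator / denominator


# Toy population (hypothetical): three clusters with sizes 1, 1, 4
# and scores 1, 1, 0. The cluster-level average score is 2/3.
# Enumerating every record once mimics size-proportional sampling:
# the size-4 cluster appears four times.
record_level_sample: Sample = [(1, 1.0), (1, 1.0)] + [(4, 0.0)] * 4

naive = naive_estimate(record_level_sample)
weighted = weighted_estimate(record_level_sample)
```

In this toy example the unweighted mean (1/3) over-represents the large cluster, while the inverse-size weighting recovers the cluster-level average of 2/3.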

Taxonomy

Topics: Data Quality and Management · Data Mining Algorithms and Applications · Privacy-Preserving Technologies in Data