Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org
Olivier Binette, Sokhna A York, Emma Hickerson, Youngsoo Baek, Sarvo Madhavan, Christina Jones

TL;DR
This paper presents a new evaluation method for entity resolution algorithms, applied to PatentsView.org, that gives a practical way to assess and compare disambiguation performance while accounting for sampling biases.
Contribution
It introduces a data collection and performance estimation approach tailored for patent data, enabling reliable assessment of disambiguation algorithms.
Findings
First representative performance evaluation of PatentsView disambiguation
Method accounts for sampling biases in patent data
Facilitates comparison of different entity resolution algorithms
Abstract
This paper introduces a novel evaluation methodology for entity resolution algorithms. It is motivated by PatentsView.org, a U.S. Patent and Trademark Office patent data exploration tool that disambiguates patent inventors using an entity resolution algorithm. We provide a data collection methodology and tailored performance estimators that account for sampling biases. Our approach is simple, practical, and principled, and these characteristics allow us to paint the first representative picture of PatentsView's disambiguation performance. This approach is used to inform PatentsView's users of the reliability of the data and to allow the comparison of competing disambiguation algorithms.
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Privacy-Preserving Technologies in Data
