Bayesian Estimation of Bipartite Matchings for Record Linkage

Mauricio Sadinle

arXiv:1601.06630 · stat.ME · January 26, 2016

TL;DR

This paper introduces a Bayesian approach to bipartite record linkage that treats the entire matching between the two datafiles as a parameter. This makes it possible to quantify uncertainty in the matching decisions, and it improves performance over traditional methods.

Contribution

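It proposes a Bayesian framework for bipartite record linkage that quantifies uncertainty in the matching decisions and introduces partial Bayes estimates, which may leave some matching decisions unresolved. In the reported comparisons, the approach outperforms classical methods.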
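
A minimal sketch of what treating the matching as a parameter means, assuming an illustrative encoding that is not necessarily the paper's notation. The matching is stored as a labeling with one entry per record of the second datafile. Each entry holds the index of that record's match in the first datafile, or None if the record has no match. Such a labeling is a valid bipartite matching only if no record of the first datafile is claimed twice. The function names `is_bipartite_matching` and `links` are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple

def is_bipartite_matching(z: Sequence[Optional[int]], n1: int) -> bool:
    """Check that labeling z encodes a one-to-one matching.

    z[j] = i links record j of file 2 to record i of file 1 (0 <= i < n1);
    z[j] = None means record j of file 2 has no match in file 1.
    """
    used = set()
    for i in z:
        if i is None:
            continue  # unmatched record in file 2
        if not 0 <= i < n1:
            return False  # label points outside file 1
        if i in used:
            return False  # record i of file 1 already claimed: not one-to-one
        used.add(i)
    return True

def links(z: Sequence[Optional[int]]) -> Set[Tuple[int, int]]:
    """Return the matched pairs (file-1 index, file-2 index) encoded by z."""
    return {(i, j) for j, i in enumerate(z) if i is not None}
```

For example, `is_bipartite_matching([2, None, 0], n1=3)` returns True, while `[1, 1, None]` is rejected because record 1 of the first file would be linked twice. A Bayesian treatment of this parameter places a prior and posterior over labelings that satisfy the one-to-one constraint, rather than over each record pair separately.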

Findings

01. The Bayesian method outperforms traditional techniques in challenging scenarios.
02. Quantifying uncertainty improves decision-making in record linkage.
03. Partial Bayes estimates can leave some matches unresolved, which adds flexibility.
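
The uncertainty quantification and partial estimates listed above can be illustrated from posterior draws of the matching. The sketch assumes a sampler (not shown) has produced draws of the labeling, each a list with one entry per record of the second datafile holding a first-file index or None. It estimates per-record posterior match probabilities by Monte Carlo, then keeps the most probable label only if its probability exceeds a threshold and otherwise leaves the decision unresolved. Bayes estimates in general minimize posterior expected loss; this thresholding rule is a simplified stand-in chosen for illustration, not the loss function used in the paper. The names `match_probabilities`, `partial_estimate`, and `UNRESOLVED` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Union

Label = Optional[int]  # first-file index, or None for "no match"
UNRESOLVED = "?"       # hypothetical marker for a decision left open

def match_probabilities(samples: Sequence[Sequence[Label]], j: int) -> Dict[Label, float]:
    """Monte Carlo estimate of P(z[j] = label | data) from posterior draws."""
    counts = Counter(draw[j] for draw in samples)
    return {label: c / len(samples) for label, c in counts.items()}

def partial_estimate(samples: Sequence[Sequence[Label]],
                     threshold: float = 0.5) -> List[Union[Label, str]]:
    """Keep each record's most probable label only if it is probable enough."""
    n2 = len(samples[0])  # number of records in the second datafile
    estimate: List[Union[Label, str]] = []
    for j in range(n2):
        probs = match_probabilities(samples, j)
        label, p = max(probs.items(), key=lambda kv: kv[1])
        estimate.append(label if p > threshold else UNRESOLVED)
    return estimate

# Four hypothetical posterior draws for a second file with two records.
draws = [[0, None], [0, 1], [0, None], [1, 0]]
print(partial_estimate(draws))  # → [0, '?']
```

With a threshold of at least 0.5, the resolved links can never conflict. Every draw is one-to-one, so the estimated probabilities that different second-file records link to the same first-file record sum to at most 1, and at most one of them can exceed 0.5. Lowering the threshold below 0.5 gives up that guarantee.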

Abstract

The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. This is non-trivial in the absence of unique identifiers and it is important for a wide variety of applications given that it needs to be solved whenever we have to combine information from different sources. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators…
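
The independence assumption criticized in the abstract can be illustrated with a small, hypothetical example. In the Fellegi and Sunter (1969) framework, each record pair gets a weight built from the ratio of agreement probabilities among matches (m) and non-matches (u). The sketch uses a simplified single-threshold rule: a pair is linked when its weight exceeds the threshold. Fellegi and Sunter's rule also allows a clerical-review region between two thresholds. Because each pair is decided on its own, nothing stops one record from being linked to several records in the other file. The field names, the m and u values, the threshold, and the helper functions are illustrative assumptions. The per-field weights also assume that fields agree independently given match status. This is not the paper's model.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical per-field probabilities (illustrative values, not from the paper):
# m = P(field agrees | pair is a match), u = P(field agrees | pair is a non-match).
M = {"name": 0.95, "birth_year": 0.90, "zip": 0.85}
U = {"name": 0.05, "birth_year": 0.10, "zip": 0.20}

Pair = Tuple[int, int]  # (index in file 1, index in file 2)

def fs_weight(agreement: Dict[str, bool]) -> float:
    """Fellegi-Sunter log-likelihood-ratio weight for one record pair,
    assuming fields agree independently given the pair's match status."""
    w = 0.0
    for field, agrees in agreement.items():
        m, u = M[field], U[field]
        w += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return w

def independent_links(pairs: Dict[Pair, Dict[str, bool]], threshold: float) -> List[Pair]:
    """Link each pair on its own weight alone, ignoring every other pair."""
    return [pair for pair, agreement in pairs.items() if fs_weight(agreement) > threshold]

def one_to_one_violations(links: List[Pair]) -> Tuple[List[int], List[int]]:
    """Records in file 1 (resp. file 2) that end up linked more than once."""
    c1 = Counter(i for i, _ in links)
    c2 = Counter(j for _, j in links)
    return (sorted(i for i, n in c1.items() if n > 1),
            sorted(j for j, n in c2.items() if n > 1))

# Two file-2 records both look like file-1 record 0.
pairs = {
    (0, 0): {"name": True, "birth_year": True, "zip": True},
    (0, 1): {"name": True, "birth_year": True, "zip": False},
    (1, 1): {"name": False, "birth_year": True, "zip": False},
}
links = independent_links(pairs, threshold=2.0)
print(links)                         # → [(0, 0), (0, 1)]
print(one_to_one_violations(links))  # → ([0], [])
```

Here file-2 records 0 and 1 are both linked to file-1 record 0. That outcome is impossible if each entity appears at most once per file. Targeting a bipartite matching as the parameter rules out such outputs by construction.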

Code & Models

Repositories

msadinle/BRL

Taxonomy

Topics: Data Quality and Management · Advanced Database Systems and Queries · Privacy-Preserving Technologies in Data