Regression Analysis After Bipartite Bayesian Record Linkage
Xueyan Hu, Jerome P. Reiter

TL;DR
This paper introduces a Bayesian multiple imputation framework for record linkage that accounts for linkage uncertainty and leverages study variables to improve regression analysis accuracy.
Contribution
It proposes a novel integrated approach combining bipartite Bayesian record linkage with regression modeling, addressing limitations of traditional two-stage methods.
Findings
More accurate regression parameter estimates than two-stage approaches
Effective use of study variables to distinguish true and false links
Demonstrated improvements through simulation studies and real data application
Abstract
In many settings, a data curator links records from two files to produce datasets that are shared with secondary analysts. Analysts use the linked files to estimate models of interest, such as regressions. Such two-stage approaches do not necessarily account for uncertainty in model parameters that results from uncertainty in the linkages. Further, they do not leverage the relationships among the study variables in the two files to help determine the linkages. We propose a multiple imputation framework to address these shortcomings. First, we use a bipartite Bayesian record linkage model to generate multiple plausible linked datasets, disregarding the information in the study variables. Second, we presume each linked file has a mixture of true links and false links. We estimate the mixture model using information from the study variables. Through simulation studies under a regression…
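The second step of the framework, treating a linked file as a mixture of true and false links and using the study variables to tell them apart, can be illustrated with a small sketch. This is not the authors' model: it assumes a Gaussian linear regression for true links, assumes that a falsely linked response follows the marginal distribution of the response (independent of the predictor), and swaps in a plain EM algorithm for the Bayesian estimation the abstract describes. The names `fit_link_mixture` and `simulate_linked_file`, and all simulation settings, are hypothetical.

```python
import numpy as np


def normal_pdf(v, mean, sd):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_link_mixture(x, y, n_iter=100):
    """EM for a two-component mixture over linked pairs (x_i, y_i).

    Assumed model (illustrative only):
      true link : y_i ~ Normal(b0 + b1 * x_i, sigma^2)
      false link: y_i ~ Normal(mean(y), var(y)), independent of x_i
    Returns (beta, sigma, p_true, w), where w[i] is the estimated
    probability that pair i is a true link.
    """
    X = np.column_stack([np.ones_like(x), x])
    # Start from the naive fit that treats every link as correct.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma = np.std(y - X @ beta)
    # The false-link component is held fixed at the marginal of y.
    mu0, sd0 = np.mean(y), np.std(y)
    p_true = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true link.
        f_true = p_true * normal_pdf(y, X @ beta, sigma)
        f_false = (1.0 - p_true) * normal_pdf(y, mu0, sd0)
        w = f_true / (f_true + f_false + 1e-300)
        # M-step: weighted least squares, weighted residual scale, mixing weight.
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        sigma = np.sqrt(np.sum(w * (y - X @ beta) ** 2) / np.sum(w))
        p_true = np.mean(w)
    return beta, sigma, p_true, w


def simulate_linked_file(n=500, false_rate=0.3, seed=0):
    """Simulate one linked file in which a share of the links are wrong."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
    is_false = rng.random(n) < false_rate
    # A false link pairs x_i with the response of an unrelated record.
    x_other = rng.normal(size=n)
    y_other = 1.0 + 2.0 * x_other + rng.normal(scale=0.5, size=n)
    y = np.where(is_false, y_other, y)
    return x, y, is_false


if __name__ == "__main__":
    x, y, is_false = simulate_linked_file()
    X = np.column_stack([np.ones_like(x), x])
    naive = np.linalg.lstsq(X, y, rcond=None)[0]
    beta, sigma, p_true, w = fit_link_mixture(x, y)
    print("naive OLS coefficients:   ", naive)
    print("mixture-model coefficients:", beta)
    print("estimated share of true links:", p_true)
```

In this simulated setting, false links pull the naive slope toward zero, while the mixture fit downweights pairs whose response is poorly explained by the regression. In a multiple imputation workflow, a fit like this would be repeated on each plausible linked dataset and the results combined across datasets.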
Taxonomy
Topics: Data Quality and Management · Data Analysis and Archiving · Census and Population Estimation
