Bayesian Record Linkage with Variables in One File
Gauri Kamat, Mingyang Shan, and Roee Gutman

TL;DR
This paper extends a Bayesian record linkage method to use associations between variables that appear in only one of the files being linked. The extension can improve linkage accuracy and downstream inference in healthcare and social science applications where information about units is spread across multiple files.
Contribution
The paper extends existing Bayesian record linkage techniques to include associations between variables exclusive to each file, enhancing linkage accuracy and inference.
Findings
The method can improve the linking process, as shown analytically and in simulations.
Incorporating associations between file-exclusive variables can yield accurate inferences.
An application linking Meals on Wheels recipients demonstrates practical utility.
Abstract
In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms only rely on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linking methods allow for natural propagation of linkage error, by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically, and using simulations, that this method can improve the linking process, and can yield accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare…
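The idea of adding file-exclusive variables to the linkage evidence can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a Fellegi-Sunter style agreement model for the linking variables, a normal linear regression of a variable Y (found only in file B) on a variable X (found only in file A) for true links, and a marginal normal distribution for Y among non-links. All function names, parameter names (`m`, `u`, `beta0`, `beta1`, `sigma`, `mu_y`, `sd_y`), and numeric values are hypothetical. In a full Bayesian treatment these parameters would be sampled jointly with the linkage structure rather than held fixed as they are here.

```python
import math


def normal_logpdf(value, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((value - mean) / sd) ** 2


def comparison_log_weight(agree, m, u):
    """Log likelihood ratio (link vs. non-link) from the linking variables.

    agree[k] is True when the pair agrees on linking field k.
    m[k] is the assumed agreement probability among true links,
    u[k] the assumed agreement probability among non-links.
    """
    total = 0.0
    for a, m_k, u_k in zip(agree, m, u):
        if a:
            total += math.log(m_k / u_k)
        else:
            total += math.log((1.0 - m_k) / (1.0 - u_k))
    return total


def exclusive_log_weight(x_i, y_j, beta0, beta1, sigma, mu_y, sd_y):
    """Log likelihood ratio contributed by the file-exclusive variables.

    Assumption: for a true link, y_j ~ Normal(beta0 + beta1 * x_i, sigma);
    for a non-link, y_j ~ Normal(mu_y, sd_y) regardless of x_i.
    """
    linked = normal_logpdf(y_j, beta0 + beta1 * x_i, sigma)
    unlinked = normal_logpdf(y_j, mu_y, sd_y)
    return linked - unlinked


def pair_log_weight(agree, m, u, x_i, y_j, beta0, beta1, sigma, mu_y, sd_y):
    """Total evidence for a candidate pair: linking fields plus exclusive variables."""
    return comparison_log_weight(agree, m, u) + exclusive_log_weight(
        x_i, y_j, beta0, beta1, sigma, mu_y, sd_y
    )


# Hypothetical parameter values, chosen only for illustration.
params = dict(beta0=0.0, beta1=1.5, sigma=0.5, mu_y=0.0, sd_y=3.0)
m_probs = [0.95, 0.90]
u_probs = [0.10, 0.20]

# Two candidate pairs with identical agreement on the linking fields.
# Only the first has a y value close to what the regression predicts from x.
consistent = pair_log_weight([True, True], m_probs, u_probs, 2.0, 3.0, **params)
inconsistent = pair_log_weight([True, True], m_probs, u_probs, 2.0, -3.0, **params)
print(consistent > inconsistent)
```

Under these assumptions, two candidate pairs that look identical on the linking variables are separated by how well the file-B value agrees with the regression prediction from the file-A value. This is the kind of extra information the abstract describes as improving the linking process.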
Taxonomy
Topics: Data Quality and Management · Census and Population Estimation
