On linkage bias-correction for estimators using iterated bootstraps
Siu-Ming Tam, Min Wang, Alicia Rambaldi, Dehua Tao

TL;DR
This paper introduces a bootstrap-based method for correcting linkage bias in estimators computed from linked datasets, together with a test of whether additional bootstrap iterations are beneficial; both are demonstrated on simulated and real data.
Contribution
It proposes a novel bootstrap-based approach to linkage bias correction and a test of whether further bootstrap iterations still improve accuracy, which guides the choice of how many iterations to run.
Findings
The bootstrap bias-corrected estimators effectively reduce linkage bias.
The test helps identify when additional bootstrap iterations no longer improve accuracy.
Applications to simulated and real data illustrate the methodology.
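The following toy calculation is not taken from the paper. It shows why additional iterations can eventually stop paying off. Assume the naive estimator is attenuated multiplicatively, θ̂ = (1 − q)β + ε with Var(ε) = σ², and assume that each bootstrap level reproduces the same attenuation exactly, so that m_j = (1 − q)^j θ̂, where m_0 = θ̂ and m_j is the level-j bootstrap expectation. Under these assumptions, the standard level-k iterated bootstrap bias-corrected estimator t_k satisfies:

```latex
t_k = \sum_{j=0}^{k} (-1)^j \binom{k+1}{j+1} m_j
    = \hat\theta \, \frac{1 - q^{k+1}}{1 - q},
\qquad
\mathbb{E}[t_k] = \bigl(1 - q^{k+1}\bigr)\beta,
\qquad
\operatorname{Var}(t_k) = \sigma^2 \left(\frac{1 - q^{k+1}}{1 - q}\right)^{2}.
```

The bias, −q^{k+1}β, therefore shrinks geometrically with k, while the variance grows toward σ²/(1 − q)². Once the remaining squared bias is small relative to σ², further levels add variance without a meaningful gain in accuracy. The paper's own test addresses this question for its estimators; this sketch only illustrates why such a test is useful.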
Abstract
Integrating data from disparate sources yields a dataset that is a valuable resource for statistical analysis. In probabilistic record linkage, the effectiveness of such integration depends on the availability of error-free linkage variables. When such variables are unavailable, the linked dataset contains linkage errors, and analyses based on it suffer from linkage bias. This paper proposes a bootstrap-based methodology for constructing linkage bias-corrected estimators. Additionally, it introduces a test to assess whether increasing the number of bootstrap iterations meaningfully reduces linkage bias or merely inflates variance without further improving accuracy. The methodology is demonstrated through the analysis of a simulated dataset containing hormone information and of a dataset obtained by linking two datasets from the…
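The abstract does not spell out the estimators, so what follows is only a minimal sketch of the general idea under illustrative assumptions. It is not the authors' procedure. The example makes four assumptions. First, the target is a linear regression slope estimated from linked data. Second, it uses a hypothetical linkage-error model in which a known fraction q of records have their outcome values shuffled among themselves, as a crude stand-in for mislinks. Third, it applies the standard iterated bootstrap bias-correction formula t_k = Σ_{j=0}^{k} (−1)^j C(k+1, j+1) m_j, where m_0 is the naive estimate and m_j is the average estimate after applying the error model j more times. Fourth, for simplicity, each level is approximated by independent repeated draws rather than a fully nested resampling scheme. The helper names ols_slope, mislink and iterated_bias_corrected are hypothetical.

```python
import random
from math import comb


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def mislink(y, q, rng):
    """Hypothetical linkage-error model: a random fraction q of records
    have their y values shuffled among themselves, mimicking records
    linked to the wrong partner. Returns a new list."""
    y = list(y)
    idx = rng.sample(range(len(y)), round(q * len(y)))
    vals = [y[i] for i in idx]
    rng.shuffle(vals)
    for i, v in zip(idx, vals):
        y[i] = v
    return y


def iterated_bias_corrected(x, y_linked, q, levels, B, rng):
    """Return [t_0, ..., t_levels]: t_0 is the naive slope and t_k is the
    level-k bootstrap bias-corrected slope.

    m[j] approximates E[slope after j extra applications of mislink],
    averaged over B independent draws (a simplification of a fully
    nested iterated bootstrap). The level-k estimator is
        t_k = sum_{j=0}^{k} (-1)^j * C(k+1, j+1) * m[j],
    i.e. 2*m0 - m1 for k = 1 and 3*m0 - 3*m1 + m2 for k = 2.
    """
    m = [ols_slope(x, y_linked)]
    for j in range(1, levels + 1):
        total = 0.0
        for _ in range(B):
            y_star = y_linked
            for _ in range(j):  # apply the assumed error model j times
                y_star = mislink(y_star, q, rng)
            total += ols_slope(x, y_star)
        m.append(total / B)
    return [
        sum((-1) ** j * comb(k + 1, j + 1) * m[j] for j in range(k + 1))
        for k in range(levels + 1)
    ]


# Usage on simulated data (all values are hypothetical).
rng = random.Random(2024)
n, beta, q = 5000, 2.0, 0.2
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_true = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]
y_linked = mislink(y_true, q, rng)  # the "observed" linked outcomes
estimates = iterated_bias_corrected(x, y_linked, q, levels=2, B=40, rng=rng)
for k, t in enumerate(estimates):
    print(f"level {k}: slope = {t:.3f}")
```

Under this toy model, each application of mislink attenuates the slope by roughly a factor of 1 − q. The naive estimate therefore sits near (1 − q)β, while levels 1 and 2 should, in expectation, land near (1 − q²)β and (1 − q³)β. Each t_k is approximately a growing multiple of the noisy naive estimate, so its variance also grows with k. This is the kind of trade-off that the proposed test is meant to assess.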
Taxonomy
Topics: Data Quality and Management · Census and Population Estimation · Data Analysis and Archiving
