Relaxing the Assumption of Strongly Non-Informative Linkage Error in Secondary Regression Analysis of Linked Files
Priyanjali Bukke, Martin Slawski

TL;DR
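This paper extends a regression framework for linked data by relaxing the assumption that linkage errors are strongly non-informative (that is, that linkage does not depend on the covariates used in the analysis), improving validity in practical settings where little information about the linkage process is available.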
Contribution
It extends an existing two-component mixture model for regression with linked files so that linkage errors may be informative, addressing a key limitation in secondary analysis, where only the linked data are available.
Findings
Simulation results demonstrate improved inference accuracy.
Case study confirms practical applicability.
Extension effectively handles informative linkage errors.
Abstract
Data analysis of files that result from linking records from multiple sources is often affected by linkage errors: records may be linked incorrectly, or true links may be missed. Consequently, it is essential that such errors be taken into account to ensure valid post-linkage inference. Here, we propose an extension to a general framework, developed in prior work, for regression with linked covariates and responses based on a two-component mixture model. This framework addresses the challenging case of secondary analysis, in which only the linked data are available and information about the record linkage process is limited. The extension considered herein relaxes the framework's assumption of strongly non-informative linkage, according to which linkage does not depend on the covariates used in the analysis; this assumption may be limiting in practice. The effectiveness of the…
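The abstract does not give the model equations, so the following is only an illustrative sketch of the general idea, not the authors' estimator. It assumes one common form of a two-component mixture for mismatched data: a correctly linked record follows a Gaussian linear model, a mismatched record's response comes from the marginal distribution of y (approximated here by a single normal fit), and, to mimic informative linkage, the mismatch probability depends on the covariates through a logistic model a(x) = sigmoid(x'gamma). The function names (fit_mixture_em, normal_pdf) and the simulated data are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def normal_pdf(y, mean, sd):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)


def fit_mixture_em(X, y, n_iter=200, newton_steps=5):
    """Hypothetical EM fit of the ASSUMED model
        y_i ~ (1 - a(x_i)) * N(x_i' beta, sigma^2) + a(x_i) * f0(y_i),
    where a(x) = sigmoid(x' gamma) is a covariate-dependent mismatch
    probability and f0 is a normal density fitted once to the marginal of y."""
    p = X.shape[1]
    f0 = normal_pdf(y, y.mean(), y.std())        # mismatch density, held fixed
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # naive OLS as starting value
    sigma = np.std(y - X @ beta)
    gamma = np.zeros(p)                          # start with a(x) = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that record i is a mismatch
        a = sigmoid(X @ gamma)
        f1 = normal_pdf(y, X @ beta, sigma)
        w = a * f0 / (a * f0 + (1.0 - a) * f1)
        # M-step for the regression part: weighted least squares, weights 1 - w
        r = 1.0 - w
        Xr = X * r[:, None]
        beta = np.linalg.solve(X.T @ Xr, Xr.T @ y)
        sigma = np.sqrt(np.sum(r * (y - X @ beta) ** 2) / np.sum(r))
        # M-step for the mismatch model: Newton steps for a logistic
        # regression of the fractional responses w on X
        for _ in range(newton_steps):
            a = sigmoid(X @ gamma)
            grad = X.T @ (w - a)
            hess = X.T @ (X * (a * (1.0 - a))[:, None])
            gamma = gamma + np.linalg.solve(hess, grad)
    return beta, sigma, gamma


# Simulated linked file with informative linkage: the chance that a record's
# response comes from a wrong match increases with the covariate x.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_true = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
mismatch = rng.random(n) < sigmoid(-1.5 + 1.0 * x)
y = y_true.copy()
# a mismatched record receives the response of a randomly chosen record
y[mismatch] = y_true[rng.integers(0, n, size=mismatch.sum())]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat, sigma_hat, gamma_hat = fit_mixture_em(X, y)
print("naive OLS:", beta_ols.round(2))
print("mixture EM beta:", beta_hat.round(2), "gamma:", gamma_hat.round(2))
```

In this sketch, restricting gamma to an intercept only (a constant mismatch rate) plays the role of the strongly non-informative case, while letting gamma load on x is the illustrative analogue of relaxing it. Holding f0 fixed at a normal fit is a simplification chosen for brevity.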
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Authorship Attribution and Profiling
