Relaxing the Assumption of Strongly Non-Informative Linkage Error in Secondary Regression Analysis of Linked Files

Priyanjali Bukke; Martin Slawski

arXiv:2510.17553·stat.ME·October 21, 2025

Open Access

TL;DR

This paper extends a regression framework for linked data by relaxing the assumption that linkage errors are non-informative, improving validity in practical scenarios with limited linkage information.

Contribution

It introduces an extension to an existing model that accounts for informative linkage errors, addressing a key limitation in secondary analysis of linked files.

Findings

1. Simulation results demonstrate improved inference accuracy.

2. Case study confirms practical applicability.

3. Extension effectively handles informative linkage errors.

Abstract

Analyses of files created by linking records from multiple sources are often affected by linkage errors: records may be linked incorrectly, or their links may be missed. Consequently, such errors must be taken into account to ensure valid post-linkage inference. Here, we propose an extension to a general framework, developed in prior work, for regression with linked covariates and responses based on a two-component mixture model. This framework addresses the challenging case of secondary analysis, in which only the linked data are available and information about the record linkage process is limited. The extension relaxes the framework's assumption of strongly non-informative linkage, according to which linkage does not depend on the covariates used in the analysis and which may be limiting in practice. The effectiveness of the…
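To make the idea of informative linkage error concrete, here is a minimal sketch in Python. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the simulated data, the logistic form of the mismatch probability, the Gaussian stand-in for the distribution of mismatched responses, and the helper names `simulate` and `fit_mixture`. It simulates a linked file in which the chance of a mismatch grows with the covariate, then compares naive least squares with an EM fit of a two-component mixture whose mixing weight is allowed to depend on that covariate.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def normal_pdf(y, mean, sd):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def simulate(n=2000, seed=0):
    """Hypothetical linked file: true model y = 1 + 2x + noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)
    # Informative linkage error (assumed form): mismatch probability rises with x.
    mismatch = rng.random(n) < sigmoid(-2.0 + 1.5 * x)
    idx = np.flatnonzero(mismatch)
    y_linked = y.copy()
    # Mismatched records receive the response of another mismatched record.
    y_linked[idx] = y[rng.permutation(idx)]
    return x, y_linked

def fit_mixture(x, y, n_iter=200):
    """EM for: correct link -> regression; mismatch -> Gaussian (assumed);
    P(mismatch | x) = sigmoid(gamma0 + gamma1 * x) (assumed)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the naive fit
    sigma = np.std(y - X @ beta)
    mu0, s0 = np.mean(y), np.std(y)
    gamma = np.array([-1.5, 0.0])
    for _ in range(n_iter):
        # E-step: posterior probability that each record is mismatched.
        p = sigmoid(X @ gamma)
        a = p * normal_pdf(y, mu0, s0)
        b = (1.0 - p) * normal_pdf(y, X @ beta, sigma)
        w = a / (a + b)
        c = 1.0 - w
        # M-step: weighted least squares for the regression component.
        beta = np.linalg.solve(X.T @ (c[:, None] * X), X.T @ (c * y))
        sigma = np.sqrt(np.sum(c * (y - X @ beta) ** 2) / np.sum(c))
        # M-step: weighted moments for the mismatch component.
        mu0 = np.sum(w * y) / np.sum(w)
        s0 = np.sqrt(np.sum(w * (y - mu0) ** 2) / np.sum(w))
        # M-step (one Newton step): logistic model for the mismatch indicator.
        v = p * (1.0 - p)
        H = X.T @ (v[:, None] * X) + 1e-8 * np.eye(2)
        gamma = gamma + np.linalg.solve(H, X.T @ (w - p))
    return beta, gamma

x, y_linked = simulate()
X = np.column_stack([np.ones_like(x), x])
naive = np.linalg.lstsq(X, y_linked, rcond=None)[0]
beta_hat, gamma_hat = fit_mixture(x, y_linked)
print("naive slope:  ", naive[1])
print("mixture slope:", beta_hat[1])
print("gamma (mismatch model):", gamma_hat)
```

Fixing `gamma[1]` at zero in `fit_mixture` would correspond to assuming that the mismatch probability does not depend on the covariate, which is the kind of restriction the abstract describes relaxing.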

Taxonomy

Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Authorship Attribution and Profiling