Estimation in exponential family Regression based on linked data contaminated by mismatch error
Zhenbang Wang, Emanuel Ben-David, Martin Slawski

TL;DR
This paper develops a novel estimation method for exponential family regression models using linked data with mismatches, employing offsets and penalization to improve accuracy despite linkage errors.
Contribution
It introduces a new approach with offsets and $\ell_1$-penalization.
Findings
The method significantly outperforms existing methods on both synthetic and real data.
The proposed method effectively accounts for linkage mismatches.
Theoretical conditions for correct data matching are established.
Abstract
Identification of matching records in multiple files can be a challenging and error-prone task. Linkage error can considerably affect subsequent statistical analysis based on the resulting linked file. Several recent papers have studied post-linkage linear regression analysis with the response variable in one file and the covariates in a second file from the perspective of the "Broken Sample Problem" and "Permuted Data". In this paper, we present an extension of this line of research to exponential family response given the assumption of a small to moderate number of mismatches. A method based on observation-specific offsets to account for potential mismatches and $\ell_1$-penalization is proposed, and its statistical properties are discussed. We also present sufficient conditions for the recovery of the correct correspondence between covariates and responses if the regression parameter…
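The idea of observation-specific offsets with a sparsity penalty can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Bernoulli (logistic) response as one member of the exponential family, assumes the penalty on the offsets is an $\ell_1$ norm, and uses a plain proximal gradient loop as the solver. The names `fit_offset_logistic`, `soft_threshold`, `objective`, and the tuning value `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, beta, gamma, lam):
    # Bernoulli negative log-likelihood with linear predictor
    # eta_i = x_i' beta + gamma_i, plus an l1 penalty on the offsets gamma.
    eta = X @ beta + gamma
    return np.sum(np.logaddexp(0.0, eta) - y * eta) + lam * np.sum(np.abs(gamma))

def fit_offset_logistic(X, y, lam, n_iter=500):
    # Assumed solver: joint proximal gradient steps on (beta, gamma).
    n, d = X.shape
    beta = np.zeros(d)
    gamma = np.zeros(n)
    # The smooth part has a Lipschitz gradient with constant at most
    # 0.25 * (||X||_2^2 + 1), so 1/L is a safe fixed step size.
    step = 4.0 / (np.linalg.norm(X, 2) ** 2 + 1.0)
    for _ in range(n_iter):
        eta = X @ beta + gamma
        mu = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        resid = mu - y                          # gradient w.r.t. eta
        beta = beta - step * (X.T @ resid)
        gamma = soft_threshold(gamma - step * resid, step * lam)
    return beta, gamma

if __name__ == "__main__":
    # Synthetic illustration: permute a few responses to mimic mismatches.
    rng = np.random.default_rng(0)
    n, d = 200, 3
    X = rng.normal(size=(n, d))
    beta_true = np.array([2.0, -1.5, 1.0])
    y = (rng.random(n) < 0.5 * (1.0 + np.tanh(0.5 * (X @ beta_true)))).astype(float)
    mismatched = rng.choice(n, size=20, replace=False)
    y[mismatched] = y[rng.permutation(mismatched)]
    beta_hat, gamma_hat = fit_offset_logistic(X, y, lam=0.7)
    print("estimated beta:", beta_hat)
    print("nonzero offsets:", int(np.count_nonzero(gamma_hat)))
```

In this sketch a nonzero entry of `gamma_hat` flags an observation whose response is poorly explained by its paired covariates, which is how the offsets are meant to absorb potential mismatches. For the Bernoulli case the per-observation gradient lies in $[-1, 1]$, so any `lam` of at least 1 forces all offsets to zero and reduces the fit to ordinary logistic regression.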
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Bayesian Modeling and Causal Inference
