Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data
Ellen Graham (1), Marco Carone (1), Andrea Rotnitzky (1) ((1) University of Washington)

TL;DR
This paper extends a unifying theory for semiparametric data fusion to include complex scenarios like two-sample IV analysis and epidemiological data integration, enabling more flexible and efficient inference from diverse data sources.
Contribution
It introduces a generalized framework for data fusion that accommodates non-factorized conditional distributions, broadening applicability to real-world complex data integration problems.
Findings
Provides universal influence function characterizations for complex data fusion scenarios.
Enables machine-learning debiased, semiparametric efficient estimation.
Extends existing theories to more general data alignment conditions.
Abstract
We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution. While this theory proves effective in many significant contexts, it falls short in certain common data fusion problems, such as two-sample instrumental variable analysis, settings that integrate data from epidemiological studies with diverse designs (e.g., prospective cohorts and retrospective case-control studies), and studies with variables prone to measurement error that are supplemented by validation studies. In this paper, we extend the aforementioned comprehensive theory…
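To make the two-sample instrumental variable setting mentioned in the abstract concrete, here is a minimal sketch of the classical two-sample Wald ratio. It is not the estimator developed in the paper. It only illustrates the fusion structure, in which one source observes the instrument and outcome (Z, Y) and a separate, independent source observes the instrument and exposure (Z, X). The linear data-generating model, the coefficient values, and the function names `simulate`, `cov`, and `two_sample_wald` are all assumptions made for illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n, beta, rng):
    """Draw (Z, X, Y) from a hypothetical linear model with an unmeasured confounder U."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)                  # instrument
        u = rng.gauss(0, 1)                  # unmeasured confounder
        x = 0.8 * z + u + rng.gauss(0, 1)    # exposure (assumed first stage)
        y = beta * x + u + rng.gauss(0, 1)   # outcome
        rows.append((z, x, y))
    return rows


def two_sample_wald(sample_zy, sample_zx):
    """Ratio of the reduced-form slope (source A) to the first-stage slope (source B)."""
    z_a, y_a = zip(*sample_zy)
    z_b, x_b = zip(*sample_zx)
    reduced_form = cov(z_a, y_a) / cov(z_a, z_a)
    first_stage = cov(z_b, x_b) / cov(z_b, z_b)
    return reduced_form / first_stage


rng = random.Random(0)
beta_true = 2.0
# Source A records only (Z, Y); source B, drawn independently, records only (Z, X).
source_a = [(z, y) for z, _, y in simulate(20000, beta_true, rng)]
source_b = [(z, x) for z, x, _ in simulate(20000, beta_true, rng)]
beta_hat = two_sample_wald(source_a, source_b)
print(round(beta_hat, 2))
```

Neither source alone identifies the effect of X on Y, since no individual has both X and Y recorded. The ratio is recoverable only because both sources are assumed to share the same distribution of the instrument and the same first-stage relationship.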
Taxonomy
Topics: Statistical Methods and Inference
Methods: ALIGN
