Data fusion using weakly aligned sources
Sijia Li, Peter B. Gilbert, Rui Duan, and Alex Luedtke

TL;DR
This paper presents a data fusion method that leverages weakly aligned data sources, whose misalignment is known up to finite-dimensional parameters, to improve estimation efficiency in multi-source data integration tasks.
Contribution
It introduces a new approach to incorporate weakly aligned data sources into data fusion, extending beyond fully aligned sources and quantifying efficiency gains.
Findings
Quantifies efficiency improvements from weakly aligned sources.
Provides a semiparametric efficiency bound for the proposed method.
Demonstrates application in HIV vaccine trial data fusion.
Abstract
We introduce a new data fusion method that utilizes multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of one or more variables of interest. However, in many settings, the scarcity of fully aligned sources can make existing methods require unduly large sample sizes to be useful. Our approach enables the incorporation of weakly aligned data sources that are not perfectly aligned, provided their degree of misalignment is known up to finite-dimensional parameters. We quantify the additional efficiency gains achieved through the integration of these weakly aligned sources. We characterize the semiparametric efficiency bound and provide a general means to construct estimators achieving these efficiency gains. We illustrate our results by fusing data from two…
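To make the idea of weak alignment concrete, here is a minimal toy sketch. It is not the paper's semiparametric estimator. It assumes a simple linear model in which a weakly aligned source shares the slope of the aligned source but has an intercept shifted by an unknown scalar, so the misalignment is known up to a one-dimensional parameter. All function names, sample sizes, and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def centered_sums(x, y):
    """Return (Sxy, Sxx) with x and y centered at their own sample means."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy, sxx


def slope_aligned_only(x_a, y_a):
    """OLS slope using the fully aligned source alone."""
    sxy, sxx = centered_sums(x_a, y_a)
    return sxy / sxx


def slope_fused(x_a, y_a, x_w, y_w):
    """OLS slope with one intercept per source and a shared slope.

    Centering each source at its own means absorbs the unknown
    intercept shift, so the shift never has to be supplied.
    """
    sxy_a, sxx_a = centered_sums(x_a, y_a)
    sxy_w, sxx_w = centered_sums(x_w, y_w)
    return (sxy_a + sxy_w) / (sxx_a + sxx_w)


def draw_source(rng, n, intercept, slope):
    """Simulate n draws from y = intercept + slope * x + standard normal noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


def simulate(n_aligned=50, n_weak=500, reps=300, slope=2.0, shift=1.5, seed=0):
    """Monte Carlo comparison of the two slope estimators (toy assumption)."""
    rng = random.Random(seed)
    only, fused = [], []
    for _ in range(reps):
        x_a, y_a = draw_source(rng, n_aligned, 1.0, slope)
        # Weakly aligned source: same slope, intercept shifted by `shift`.
        x_w, y_w = draw_source(rng, n_weak, 1.0 + shift, slope)
        only.append(slope_aligned_only(x_a, y_a))
        fused.append(slope_fused(x_a, y_a, x_w, y_w))
    return only, fused


only, fused = simulate()
print("variance, aligned source only:", statistics.variance(only))
print("variance, fused with weak source:", statistics.variance(fused))
```

In this toy setting the fused estimator never uses the value of the shift, yet its Monte Carlo variance should be noticeably smaller than that of the estimator based on the aligned source alone, because the weakly aligned source still carries information about the shared slope.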
Taxonomy
Topics: Blood groups and transfusion · Molecular Biology Techniques and Applications · Machine Learning and Algorithms
