Data fusion using weakly aligned sources

Sijia Li, Peter B. Gilbert, Rui Duan, and Alex Luedtke

arXiv:2308.14836 · stat.ME · April 30, 2025 · 1 citation

Open Access

TL;DR

This paper presents a data fusion method for estimating a smooth, finite-dimensional parameter from multiple data sources. It incorporates weakly aligned sources, whose misalignment is known up to finite-dimensional parameters, to improve estimation efficiency.

Contribution

The paper extends data fusion beyond fully aligned sources, which share common conditional distributions of the variables of interest, to weakly aligned sources, and it quantifies the additional efficiency gained by including them.

Findings

01

Quantifies efficiency improvements from weakly aligned sources.

02

Characterizes the semiparametric efficiency bound and gives a general way to construct estimators that achieve it.

03

Illustrates the method by fusing HIV prevention trial data.

Abstract

We introduce a new data fusion method that utilizes multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of one or more variables of interest. However, in many settings, the scarcity of fully aligned sources can make existing methods require unduly large sample sizes to be useful. Our approach enables the incorporation of weakly aligned data sources that are not perfectly aligned, provided their degree of misalignment is known up to finite-dimensional parameters. We quantify the additional efficiency gains achieved through the integration of these weakly aligned sources. We characterize the semiparametric efficiency bound and provide a general means to construct estimators achieving these efficiency gains. We illustrate our results by fusing data from two…

Taxonomy

Topics: Blood groups and transfusion · Molecular Biology Techniques and Applications · Machine Learning and Algorithms