Efficient Estimation Under Data Fusion
Sijia Li, Alex Luedtke

TL;DR
This paper develops methods for efficiently estimating parameters by fusing multiple data sources, and demonstrates the resulting efficiency gains both through theoretical bounds and in applied examples, notably vaccine immunogenicity studies.
Contribution
It introduces a general framework for data fusion, characterizes how fusing data sources reduces the semiparametric efficiency bound, and constructs estimators that attain these bounds.
Findings
Marked efficiency improvements in numerical experiments.
Significant efficiency gains demonstrated in vaccine immunogenicity studies.
Theoretical characterization of efficiency bounds in data fusion contexts.
Abstract
We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including the average treatment effect and the average reward under a policy; most of these works merge one historical data source containing covariates, actions, and rewards with one data source containing the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources together in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a…
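To make the idea of an efficiency gain from fusion concrete, here is a minimal sketch of the simplest setting the abstract mentions, under loudly stated assumptions: actions are dropped, source 1 contains covariates X and rewards Y, source 2 contains only covariates drawn from the same distribution, and the target is the mean reward E[Y]. The linear outcome model, the simulation settings, and the function names (`fit_linear`, `fused_estimate`, `simulate`) are hypothetical and chosen for illustration; this is not the authors' estimator. The fused estimator averages the fitted regression over the pooled covariates and adds a residual correction from source 1. Its variance is roughly Var(E[Y|X])/(n1 + n2) + E[Var(Y|X)]/n1, compared with Var(Y)/n1 for the sample mean of source 1 alone.

```python
import numpy as np


def fit_linear(x, y):
    """Hypothetical outcome model: OLS fit of y on (1, x); returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


def single_source_estimate(y1):
    """Uses only the source with rewards: the sample mean of Y."""
    return float(np.mean(y1))


def fused_estimate(x1, y1, x2):
    """Average the fitted regression over the pooled covariates from both sources,
    then add the mean residual from source 1. The residual term is zero up to
    rounding for OLS with an intercept, but matters for general regression fits."""
    a, b = fit_linear(x1, y1)
    x_all = np.concatenate([x1, x2])
    plug_in = np.mean(a + b * x_all)
    correction = np.mean(y1 - (a + b * x1))
    return float(plug_in + correction)


def simulate(n1=200, n2=2000, reps=500, seed=0):
    """Hypothetical data-generating process with true E[Y] = 2."""
    rng = np.random.default_rng(seed)
    single, fused = [], []
    for _ in range(reps):
        x1 = rng.normal(size=n1)
        y1 = 2.0 + 3.0 * x1 + rng.normal(size=n1)
        x2 = rng.normal(size=n2)  # same covariate distribution, no rewards observed
        single.append(single_source_estimate(y1))
        fused.append(fused_estimate(x1, y1, x2))
    return np.array(single), np.array(fused)


single, fused = simulate()
print("single-source variance:", single.var())
print("fused variance:        ", fused.var())
```

In this toy setup Var(E[Y|X]) = 9 and E[Var(Y|X)] = 1, so the approximate variances are about 10/200 = 0.05 for the single-source mean and about 9/2200 + 1/200 ≈ 0.009 for the fused estimator; the simulated variances should land near these values.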
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods in Clinical Trials
