Semiparametric Efficient Fusion of Individual Data and Summary Statistics
Wenjie Hu, Ruoyu Wang, Wei Li, Wang Miao

TL;DR
This paper develops a semiparametric framework for optimally combining individual data with external summary statistics, achieving efficiency gains while addressing potential bias through adaptive methods.
Contribution
It introduces a semiparametric efficiency bound and proposes a data-fused estimator with adaptive bias correction for integrating diverse data sources.
Findings
The proposed data-fused estimator attains the semiparametric efficiency bound under a weak transportability assumption.
Simulation studies show improved estimation accuracy.
Application to infection data demonstrates practical utility.
Abstract
Suppose we have individual data from an internal study and various summary statistics from relevant external studies. External summary statistics have the potential to improve statistical inference for the internal population; however, they may lead to efficiency loss or bias if not used properly. We study the fusion of individual data and summary statistics in a semiparametric framework to investigate the efficient use of external summary statistics. Under a weak transportability assumption, we establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution, which is no larger than that using only internal data and underpins the potential efficiency gain of integrating individual data and summary statistics. We propose a data-fused efficient estimator that achieves this efficiency bound. In addition, an adaptive fusion estimator is…
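To make the idea of an efficiency gain from summary statistics concrete, here is a minimal toy sketch. It is not the authors' estimator. It assumes the simplest possible setting: the internal study observes pairs (X, Y), the target is the mean of Y, and an external study reports only the mean of X, treated as known exactly and as transportable to the internal population. The function names `fused_mean` and `simulate`, and all simulation parameters, are hypothetical choices made for illustration. The correction used is the classical control-variate (regression) adjustment.

```python
import numpy as np


def fused_mean(y, x, x_mean_external):
    """Toy control-variate estimate of E[Y] using an external mean of X.

    Assumption (illustrative only): x_mean_external equals the internal
    population mean of X, i.e. the summary statistic is transportable.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Least-squares slope of Y on X, estimated from the internal data.
    beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    # Shift the internal mean of Y by the gap between the internal
    # sample mean of X and the externally reported mean of X.
    return y.mean() - beta * (x.mean() - x_mean_external)


def simulate(n_reps=2000, n=200, seed=0):
    """Compare Monte Carlo variances of the plain and fused estimators."""
    rng = np.random.default_rng(seed)
    plain, fused = [], []
    for _ in range(n_reps):
        x = rng.normal(loc=1.0, scale=1.0, size=n)
        y = 2.0 * x + rng.normal(scale=1.0, size=n)
        plain.append(y.mean())
        fused.append(fused_mean(y, x, x_mean_external=1.0))
    return np.var(plain), np.var(fused)


if __name__ == "__main__":
    var_plain, var_fused = simulate()
    print(f"variance, internal data only: {var_plain:.5f}")
    print(f"variance, with external mean: {var_fused:.5f}")
```

In this toy setup the fused estimate should vary less across replications than the internal sample mean, because the part of Y explained by X is pinned down by the external mean. If the supplied external mean differs from the internal population mean of X, the same correction introduces bias, which is the risk the abstract describes when summary statistics are not used properly.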
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models
