Data integration using covariate summaries from external sources
Facheng Yu, Zhen Qi, Yuqian Zhang

TL;DR
This paper introduces new data integration methods that use only external summary statistics to improve analysis robustness and extend causal inference capabilities across heterogeneous datasets.
Contribution
The authors develop novel techniques for data integration and causal inference that rely solely on external covariate summaries, avoiding the need for individual-level data.
Findings
Effective estimators constructed from summary statistics
Applicable to both homogeneous and heterogeneous data
Extended framework for causal inference and treatment effect estimation
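To make the idea of an estimator built from external summaries concrete, here is a minimal sketch. It is not the authors' estimator, which this summary does not specify. It uses a classical regression (control-variate) adjustment and assumes the simplest homogeneous case, where the external source shares the primary source's covariate distribution and reports only a covariate mean. The function name `adjusted_mean` and the simulated data are hypothetical.

```python
import numpy as np

def adjusted_mean(y, x, external_mean):
    """Regression-adjusted estimate of E[Y] from primary data (y, x)
    plus an external covariate mean (a summary statistic only).

    Illustrative assumption: a classical control-variate adjustment,
    not the method proposed in the paper.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    x_bar = x.mean(axis=0)
    # Least-squares slope of centered y on centered x.
    beta, *_ = np.linalg.lstsq(x - x_bar, y - y.mean(), rcond=None)
    # Shift the primary-sample mean of y by the gap between the
    # external and primary covariate means.
    shift = np.asarray(external_mean, dtype=float) - x_bar
    return float(y.mean() + shift @ beta)

# Simulated primary sample (hypothetical): true E[Y] = 1 when E[X] = 0.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)

# The external source contributes only its covariate mean.
est = adjusted_mean(y, x, external_mean=[0.0, 0.0])
print(est)
```

No individual-level external records are needed: only `external_mean` crosses the data boundary. When the external sample is large, its mean pins down E[X] more precisely than the primary sample can, and the adjustment passes that precision on to the estimate of E[Y].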
Abstract
In modern data analysis, information is frequently collected from multiple sources, often leading to challenges such as data heterogeneity and imbalanced sample sizes across datasets. Robust and efficient data integration methods are crucial for improving the generalization and transportability of statistical findings. In this work, we address scenarios where, in addition to having full access to individualized data from a primary source, supplementary covariate information from external sources is also available. While traditional data integration methods typically require individualized covariates from external sources, such requirements can be impractical due to limitations related to accessibility, privacy, storage, and cost. Instead, we propose novel data integration techniques that rely solely on external summary statistics, such as sample means and covariances, to construct…
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries
