Integration of Individual Participant and Aggregate Data Under Dataset Shift: Summary Statistic Comparison and Scalable Computation
Ming-Yueh Huang, Jing Qin, Chiung-Yu Huang

TL;DR
This paper explores how different summary statistics from aggregate data affect the efficiency of integrating individual and aggregate data, especially under dataset shift, proposing methods for improved estimation and practical recommendations.
Contribution
It introduces a framework comparing summary statistics for data integration, highlights the benefits of outcome-stratified summaries, and develops scalable estimation methods under dataset shift.
Findings
Outcome-stratified summaries improve efficiency more than covariate-stratified ones.
Including outcome-stratified summaries for continuous outcomes enhances evidence synthesis.
A fast, non-iterative estimation procedure improves scalability and stability.
Abstract
Integrated IPD-AD analysis, which combines individual participant data (IPD) with aggregate data (AD), is increasingly recognized as an effective strategy for generating more reliable and generalizable inferences from heterogeneous studies. While most existing work has focused on algorithmic approaches, this paper investigates a complementary yet underexplored question: how different forms of AD influence the efficiency of data integration. Working within a constrained maximum likelihood estimation framework, we compare commonly reported summary statistics and show that subgroup-specific summaries can substantially improve estimation efficiency. In particular, we find that AD derived from outcome-stratified subgroups (e.g., cases and controls) consistently yield greater efficiency gains than those based on covariate-stratified subgroups (e.g., age or exposure categories), especially…
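To make the distinction between the two kinds of subgroup summaries concrete, the following minimal sketch computes both from a toy dataset. The records, the variable names, and the two helper functions are hypothetical and chosen only for illustration; this is not the paper's estimation procedure, only an example of what outcome-stratified and covariate-stratified aggregate data might look like when reported.

```python
from statistics import mean

# Hypothetical toy IPD: (age_group, exposure, outcome), outcome 1 = case, 0 = control.
records = [
    ("young", 0.2, 0),
    ("young", 0.4, 0),
    ("young", 0.9, 1),
    ("old", 0.5, 0),
    ("old", 1.1, 1),
    ("old", 1.3, 1),
]

def outcome_stratified(rows):
    """Mean exposure within each outcome stratum (cases vs. controls)."""
    groups = {}
    for _, exposure, outcome in rows:
        groups.setdefault(outcome, []).append(exposure)
    return {y: mean(xs) for y, xs in groups.items()}

def covariate_stratified(rows):
    """Proportion of cases within each covariate stratum (age group)."""
    groups = {}
    for age, _, outcome in rows:
        groups.setdefault(age, []).append(outcome)
    return {a: mean(ys) for a, ys in groups.items()}

by_outcome = outcome_stratified(records)      # keys: 0 (controls), 1 (cases)
by_covariate = covariate_stratified(records)  # keys: "young", "old"
print(by_outcome)
print(by_covariate)
```

An outcome-stratified summary describes the covariates separately for cases and controls, whereas a covariate-stratified summary describes the outcome separately for each covariate category.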
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
