Data-Adaptive Integration With Summary Data
Kosuke Morikawa, Sho Komukai, Satoshi Hattori

TL;DR
This paper introduces a flexible, doubly robust method for integrating internal individual-level data with external summary statistics, improving estimation accuracy while controlling bias under heterogeneity.
Contribution
It develops a generalized entropy-balancing approach that calibrates external data to the internal distribution, with data-adaptive selection and diagnostics for reliable integration.
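Entropy balancing in its classic (non-generalized) form picks weights w_i proportional to exp(lambda' x_i) on units with individual-level covariates. The weighted covariate means then exactly match a given vector of moments, while the weights stay as close as possible to uniform in Kullback-Leibler divergence. The Python sketch below shows that basic calibration step. It assumes NumPy and solves the convex dual with a damped Newton method. It is not the authors' generalized procedure or their R package, and the names `entropy_balance` and `target` are hypothetical. This summary does not say which sample is reweighted toward which moments in the paper's setup, so the roles here are generic.

```python
import numpy as np


def _log_sum_exp(s):
    """Numerically stable log(sum(exp(s)))."""
    m = s.max()
    return m + np.log(np.exp(s - m).sum())


def _tilted_weights(Z, lam):
    """Normalised exponential-tilting weights, w_i proportional to exp(lam @ z_i)."""
    s = Z @ lam
    w = np.exp(s - s.max())
    return w / w.sum()


def entropy_balance(X, target, max_iter=100, tol=1e-10):
    """Hypothetical helper: return (w, lam) with sum(w) == 1 and w @ X == target.

    Minimises the dual objective log sum_i exp(lam @ (x_i - target)).
    Assumes `target` lies strictly inside the convex hull of the rows of X;
    otherwise no such weights exist.
    """
    Z = np.asarray(X, dtype=float) - np.asarray(target, dtype=float)
    lam = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        w = _tilted_weights(Z, lam)
        grad = w @ Z  # weighted mean of x_i - target; zero at the solution
        if np.max(np.abs(grad)) < tol:
            break
        # Hessian of the dual objective = weighted covariance of the covariates.
        H = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        step = np.linalg.solve(H, grad)
        # Backtracking line search (Armijo condition) keeps Newton steps stable.
        f0, t = _log_sum_exp(Z @ lam), 1.0
        while t > 1e-12 and _log_sum_exp(Z @ (lam - t * step)) > f0 - 1e-4 * t * (grad @ step):
            t *= 0.5
        lam = lam - t * step
    return _tilted_weights(Z, lam), lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))                    # hypothetical individual-level covariates
    w, lam = entropy_balance(X, target=[0.3, -0.2])  # hypothetical summary moments
    print(w @ X)  # approximately [0.3, -0.2]
```

The dual has one unknown per balanced moment, whatever the sample size. If the target moments fall outside the convex hull of the covariates, no exact solution exists and the iterations will not converge.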
Findings
Achieves substantial efficiency gains in simulation studies and real-data analyses.
Maintains bias control under distributional heterogeneity.
Provides an R package for implementation.
Abstract
Combining an internal individual-level study with readily available external summary statistics promises major efficiency gains at minimal additional cost, yet heterogeneity between sources can bias estimates for the internal target population. We develop a generalized entropy-balancing integration strategy that calibrates external moments to the internal covariate distribution, explicitly permitting a biased external sample. Our estimator of the internal-population mean is doubly robust: it remains consistent when either the outcome-regression model or the entropy-balancing model is correctly specified. When multiple balancing specifications are plausible, we introduce a data-adaptive rule for selecting among them. We also provide easy-to-compute, fully estimable diagnostics (based on the Mahalanobis distance and the Pearson chi-square divergence) that pinpoint when integration is guaranteed to strictly…
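The abstract names the two ingredients of its diagnostics but, in this truncated summary, not their exact form. As a hedged illustration, the sketch below computes the textbook versions of both. The first is the Mahalanobis distance between two mean vectors under a supplied covariance matrix. The second is the Pearson chi-square divergence of normalized weights from uniform weights. It assumes NumPy. The function names are hypothetical. The sketch does not reproduce how the authors estimate these quantities or which thresholds guarantee that integration helps.

```python
import numpy as np


def mahalanobis(mean_a, mean_b, cov):
    """Mahalanobis distance between two mean vectors under covariance `cov`."""
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    # Solving the linear system avoids forming the explicit inverse of cov.
    return float(np.sqrt(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff)))


def pearson_chi2_from_uniform(w):
    """Pearson chi-square divergence of normalised weights from uniform weights 1/n.

    chi2 = sum_i (w_i - 1/n)^2 / (1/n) = n * sum_i w_i^2 - 1, which is 0 for
    uniform weights and grows as the weights concentrate on fewer units.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return float(w.size * np.sum(w ** 2) - 1.0)


if __name__ == "__main__":
    # Hypothetical internal vs. external covariate means with identity covariance.
    print(mahalanobis([0.3, -0.2], [0.0, 0.0], np.eye(2)))
    print(pearson_chi2_from_uniform([1, 1, 1, 1]))  # 0.0
    print(pearson_chi2_from_uniform([1, 0, 0, 0]))  # 3.0
```

For normalized weights, n / (1 + chi2) equals Kish's effective sample size. A large chi-square divergence therefore signals weights that rely on only a few units.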
