A Novel Approach for Data Integration with Multiple Heterogeneous Data Sources
Farimah Shamsi, Andriy Derkach

TL;DR
This paper introduces a new statistical framework for integrating summary-level data with heterogeneous data sources using auxiliary information, improving data analysis in diverse sampling scenarios.
Contribution
The paper develops a novel method that uses auxiliary information to integrate heterogeneous data sources, relaxing the assumption that all sources are random samples from the same underlying population.
Findings
The method performs well in simulations under various sampling designs.
An application to cancer registry data demonstrates its practical utility.
The method achieves unbiased parameter estimates with heterogeneous data sources.
Abstract
The integration of data from multiple sources is increasingly used to achieve larger sample sizes and enhance population diversity. Our previous work established that, under random sampling from the same underlying population, integrating large incomplete datasets with summary-level data produces unbiased parameter estimates. In this study, we develop a novel statistical framework that enables the integration of summary-level data with information from heterogeneous data sources by leveraging auxiliary information. The proposed approach estimates study-specific sampling weights using this auxiliary information and calibrates the estimating equations to obtain the full set of model parameters. We evaluate the performance of the proposed method through simulation studies under various sampling designs and illustrate its application by reanalyzing U.S. cancer registry data combined with…
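To make the two steps named in the abstract concrete (estimate sampling weights from auxiliary information, then solve weighted estimating equations), here is a minimal sketch of a generic calibration approach. It is not the authors' estimator: the exponential-tilting weight form, the function names `calibrate_weights` and `weighted_least_squares`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def calibrate_weights(aux, target_mean, n_iter=50, tol=1e-10):
    """Hypothetical helper: exponential-tilting weights, proportional to
    exp(z_i' lam), chosen so the weighted mean of `aux` equals `target_mean`.
    Uses plain Newton steps (no line search), so it assumes a mild shift."""
    z = aux - target_mean                  # constraint becomes sum(w * z) = 0
    lam = np.zeros(z.shape[1])

    def weights(lam):
        eta = z @ lam
        w = np.exp(eta - eta.max())        # subtract max for numerical stability
        return w / w.sum()

    for _ in range(n_iter):
        w = weights(lam)
        grad = w @ z                       # weighted mean of centered auxiliaries
        if np.max(np.abs(grad)) < tol:
            break
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)  # weighted covariance
        lam = lam - np.linalg.solve(hess, grad)
    return weights(lam)

def weighted_least_squares(X, y, w):
    """Solve the weighted estimating equation sum_i w_i x_i (y_i - x_i' beta) = 0."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Simulated example (assumed setup): the study over-samples large values of the
# auxiliary variable, whose population mean (0.0) is known from summary data.
rng = np.random.default_rng(0)
n = 5000
aux = rng.normal(0.5, 1.0, size=(n, 1))          # biased sample, mean near 0.5
y = 1.0 + 2.0 * aux[:, 0] + rng.normal(size=n)   # population mean of y is 1.0

w = calibrate_weights(aux, target_mean=np.array([0.0]))
ones = np.ones((n, 1))
naive_mean = y.mean()                                  # ignores the sampling bias
weighted_mean = weighted_least_squares(ones, y, w)[0]  # intercept-only model
print(naive_mean, weighted_mean)
```

In this toy setting the unweighted mean of `y` inherits the sampling bias, while the calibrated weights pull the estimate back toward the population value. The sketch assumes the auxiliary variable fully explains the selection mechanism.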
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Global Cancer Incidence and Screening · Colorectal Cancer Screening and Detection
