Data Integration in Causal Inference
Xu Shi, Ziyang Pan, and Wang Miao

TL;DR
This paper reviews recent methods for integrating multiple heterogeneous datasets to improve causal inference, covering techniques like combining randomized trials with observational data, privacy-preserving distributed data analysis, and Bayesian approaches.
Contribution
It provides a comprehensive overview of recent advances in causal inference methods for data integration across diverse sources and study designs.
Findings
Summarizes methods for combining randomized and observational data.
Discusses privacy-preserving distributed data analysis techniques.
Highlights advances in Bayesian causal inference and causal discovery.
Abstract
Integrating data from multiple heterogeneous sources has become increasingly popular as a way to achieve a large sample size and a diverse study population. This paper reviews developments in causal inference methods that combine multiple datasets collected under potentially different designs from potentially heterogeneous populations. We summarize recent advances in: combining randomized clinical trials with external information from observational studies or historical controls; combining samples when no single sample has all relevant variables, with application to two-sample Mendelian randomization; distributed data settings under privacy concerns for comparative effectiveness and safety research using real-world data; Bayesian causal inference; and causal discovery methods.
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods in Clinical Trials · Genetic Associations and Epidemiology
