Collaborative causal inference with a distributed data-sharing management
Mengtong Hu, Xu Shi, and Peter X.-K. Song

TL;DR
This paper introduces a privacy-preserving distributed causal inference framework for multicenter clinical trials. Sites share only summary statistics rather than subject-level raw data, which sidesteps data sharing barriers while keeping statistical power close to that of a centralized analysis.
Contribution
The paper proposes a novel distributed causal inference method that uses only summary statistics, avoiding raw data sharing and addressing data privacy and heterogeneity issues.
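To make the idea of sharing only summary statistics concrete, here is a minimal Python sketch. It is not the authors' estimator: it assumes a swapped-in, textbook approach in which each site fits its own logistic propensity model, computes a normalized inverse-probability-weighted (IPW) treatment effect with an approximate variance, and sends only those aggregates to a coordinator that pools them by inverse-variance weighting. The function names (`fit_logistic`, `site_summary`, `combine`), the simulated data, and the true effect of 2.0 are all hypothetical.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Newton-Raphson logistic regression; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

def site_summary(X, a, y):
    """Runs locally at one site and returns aggregate numbers only."""
    Xd = np.column_stack([np.ones(len(a)), X])
    ps = 1.0 / (1.0 + np.exp(-Xd @ fit_logistic(Xd, a)))
    w1, w0 = a / ps, (1 - a) / (1 - ps)
    mu1 = np.sum(w1 * y) / np.sum(w1)  # weighted mean outcome under treatment
    mu0 = np.sum(w0 * y) / np.sum(w0)  # weighted mean outcome under control
    # Assumption: approximate variance that treats the fitted propensity
    # scores as known (ignores the uncertainty from estimating them).
    infl = w1 * (y - mu1) / np.mean(w1) - w0 * (y - mu0) / np.mean(w0)
    n = len(y)
    return {"n": n, "ate": mu1 - mu0, "var": np.mean(infl ** 2) / n}

def combine(summaries):
    """Coordinator step: inverse-variance pooling of the site-level estimates."""
    weights = np.array([1.0 / s["var"] for s in summaries])
    ates = np.array([s["ate"] for s in summaries])
    est = np.sum(weights * ates) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return est, se

# Hypothetical simulated data: three sites, true average treatment effect 2.0.
rng = np.random.default_rng(0)
summaries = []
for _ in range(3):
    n = 2000
    X = rng.normal(size=(n, 2))
    p_treat = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    a = rng.binomial(1, p_treat).astype(float)
    y = 2.0 * a + X[:, 0] + X[:, 1] + rng.normal(size=n)
    summaries.append(site_summary(X, a, y))  # only this dict leaves the site

est, se = combine(summaries)
print(f"pooled ATE estimate: {est:.3f} (SE {se:.3f})")
```

In this sketch the coordinator never sees `X`, `a`, or `y`; each site transmits three numbers. Inverse-variance pooling is the simplest choice and assumes a common treatment effect across sites, so it does not address the between-site heterogeneity that the paper's method is designed to handle.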
Findings
The method achieves statistical power close to that of a centralized analysis of the pooled data.
It offers strong data privacy protections.
It performs well in simulations and real clinical trial data.
Abstract
Data sharing barriers are paramount challenges arising from multicenter clinical trials where multiple data sources are stored in a distributed fashion at different local study sites. Merging such data sources into a common data storage for a centralized statistical analysis requires a data use agreement, which is often time-consuming. Data merging may become more burdensome when causal inference is of primary interest because propensity score modeling involves combining many confounding variables, and systematic incorporation of this additional modeling in meta-analysis has not been thoroughly investigated in the literature. We propose a new causal inference framework that avoids the merging of subject-level raw data from multiple sites but needs only the sharing of summary statistics. The proposed collaborative inference enjoys maximal protection of data privacy and minimal…
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Privacy-Preserving Technologies in Data
