Distributed Design for Causal Inferences on Big Observational Data
Yumin Zhang, Arman Sabbaghi

TL;DR
This paper introduces a distributed framework for designing causal inference studies on large, complex observational datasets, enabling better covariate balance through parallel collaboration among multiple designers.
Contribution
It proposes a novel distributed design framework that handles high-dimensional, heterogeneous Big Data by dividing covariates among multiple designers for improved study design.
Findings
The framework improves covariate balance in Big Data settings.
Simulation studies validate the framework's flexibility and power.
Application to real datasets demonstrates practical utility.
Abstract
A fundamental issue in causal inference for Big Observational Data is confounding due to covariate imbalances between treatment groups. This can be addressed by designing the data prior to analysis. Existing design methods, developed for traditional observational studies with single designers, can yield unsatisfactory designs with suboptimum covariate balance for Big Observational Data due to their inability to accommodate the massive dimensionality, heterogeneity, and volume of the Big Data. We propose a new framework for the distributed design of Big Observational Data amongst collaborative designers. Our framework first assigns subsets of the high-dimensional and heterogeneous covariates to multiple designers. The designers then summarize their covariates into lower-dimensional quantities, share their summaries with the others, and design the study in parallel based on their assigned…
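The workflow described in the abstract (assign covariate subsets to designers, summarize them into lower-dimensional quantities, share the summaries) can be illustrated with a small sketch. This is not the authors' method. The round-robin assignment rule, the per-unit mean used as a summary, the simulated data, and all function names are assumptions made purely for illustration, and the design step itself (for example, matching) is omitted. The standardized mean difference is used here as one common diagnostic for covariate balance between treatment groups.

```python
import random
import statistics


def standardized_mean_difference(values, treated):
    """Absolute standardized mean difference of one covariate between groups."""
    t = [v for v, z in zip(values, treated) if z == 1]
    c = [v for v, z in zip(values, treated) if z == 0]
    pooled_sd = ((statistics.variance(t) + statistics.variance(c)) / 2) ** 0.5
    return abs(statistics.mean(t) - statistics.mean(c)) / pooled_sd


def split_covariates(n_covariates, n_designers):
    """Assign covariate column indices to designers round-robin (hypothetical rule)."""
    return [list(range(d, n_covariates, n_designers)) for d in range(n_designers)]


def summarize(rows, columns):
    """Hypothetical summary: per-unit mean of the designer's assigned covariates."""
    return [sum(row[j] for j in columns) / len(columns) for row in rows]


random.seed(0)
n_units, n_covariates, n_designers = 200, 6, 3

# Alternate treatment labels; treated units are shifted by 0.5 in every
# covariate so that the simulated groups are imbalanced (assumed data).
treated = [i % 2 for i in range(n_units)]
rows = [[random.gauss(0.5 * z, 1.0) for _ in range(n_covariates)] for z in treated]

# Step 1: divide the covariates among the designers.
assignment = split_covariates(n_covariates, n_designers)

# Step 2: each designer reduces its covariates to one summary per unit.
summaries = [summarize(rows, cols) for cols in assignment]

# Step 3: the shared summaries can be checked for imbalance before designing.
balance = [standardized_mean_difference(s, treated) for s in summaries]
for designer, smd in enumerate(balance):
    print(f"designer {designer}: covariates {assignment[designer]}, SMD of summary = {smd:.3f}")
```

In a full design, each designer would combine its own covariates with the summaries received from the others and then construct the design in parallel. A smaller standardized mean difference after that step would indicate better balance.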
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
