Clustering and Pruning in Causal Data Fusion
Otto Tabell, Santtu Tikka, Juha Karvanen

TL;DR
This paper introduces pruning and clustering techniques as preprocessing steps to simplify causal data fusion models, making do-calculus-based identification more computationally feasible in complex scenarios.
Contribution
It generalizes earlier single-source results to multiple data sources and provides conditions under which a model can be reduced while preserving causal effect identifiability.
Findings
Pruning and clustering can reduce model complexity while preserving the identifiability of the causal effect of interest.
Conditions are derived for applying these techniques in multi-source causal graphs.
Examples demonstrate practical utility in epidemiology and social science.
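To make the pruning idea above concrete, here is a minimal Python sketch. It does not implement the paper's multi-source conditions. It only illustrates the familiar single-source step of discarding variables that are not ancestors of the outcome before attempting identification. The graph encoding (a dict mapping each node to its set of parents), the function names, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def ancestors(parents, targets):
    """Return the targets together with all of their ancestors.

    `parents` maps each node to the set of its parents in a DAG.
    """
    seen = set(targets)
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def prune_to_ancestors(parents, outcome):
    """Keep only the outcome and its ancestors (illustrative pruning step)."""
    keep = ancestors(parents, {outcome})
    # Every parent of a kept node is itself an ancestor, so parent sets
    # are copied unchanged.
    return {v: set(parents.get(v, ())) for v in keep}


# Hypothetical toy graph: Z -> X -> Y, X -> W, Y -> D
toy = {"Z": set(), "X": {"Z"}, "Y": {"X"}, "W": {"X"}, "D": {"Y"}}
pruned = prune_to_ancestors(toy, "Y")
print(sorted(pruned))  # W and D are not ancestors of Y and are dropped
```

In a data fusion setting, whether a variable can be removed also depends on which data sources contain it, so this ancestor check should be read as a starting intuition rather than as a sufficient criterion.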
Abstract
Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive…
Taxonomy
Topics: Data Quality and Management · Time Series Analysis and Forecasting · Bayesian Modeling and Causal Inference
Methods: Pruning
