Integrating overlapping datasets using bivariate causal discovery
Anish Dhir, Ciarán M. Lee

TL;DR
This paper introduces an algorithm that integrates overlapping datasets using causal discovery methods that go beyond conditional independence testing. It outperforms previous methods on synthetic data and is effective on real data.
Contribution
The paper adapts and extends bivariate causal discovery algorithms to the setting of multiple overlapping datasets, and the resulting algorithm is sound and complete.
Findings
Outperforms previous approaches on synthetic data
Effective on real-world datasets
Handles overlapping variables in multiple datasets
Abstract
Causal knowledge is vital for effective reasoning in science, as causal relations, unlike correlations, allow one to reason about the outcomes of interventions. Algorithms that can discover causal relations from observational data are based on the assumption that all variables have been jointly measured in a single dataset. In many cases this assumption fails. Previous approaches to overcoming this shortcoming devised algorithms that returned all joint causal structures consistent with the conditional independence information contained in each individual dataset. But, as conditional independence tests only determine causal structure up to Markov equivalence, the number of consistent joint structures returned by these approaches can be quite large. The last decade has seen the development of elegant algorithms for discovering causal relations beyond conditional independence, which can…
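To make the phrase "beyond conditional independence" concrete, the following is a minimal sketch of one well-known bivariate causal discovery approach, the additive noise model. The abstract does not say which bivariate method the paper uses, so the choice of method, the polynomial regression, the HSIC dependence score, and all function names here are illustrative assumptions rather than the authors' implementation. With only two dependent variables, X -> Y and Y -> X are Markov equivalent, so conditional independence tests cannot separate them; the sketch instead regresses each variable on the other and prefers the direction whose residuals look more independent of the input.

```python
import numpy as np


def rbf_gram(v, sigma=1.0):
    """Gaussian-kernel Gram matrix for a 1-D sample."""
    d = v[:, None] - v[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))


def hsic(a, b):
    """Biased HSIC estimate between two 1-D samples (larger = more dependent)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / n**2


def residual_dependence(cause, effect, degree=5):
    """Fit effect = f(cause) + noise with a polynomial f (an assumption),
    then score how strongly the residuals depend on the putative cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def infer_direction(x, y):
    """Prefer the direction whose residuals are less dependent on the input."""
    score_xy = residual_dependence(x, y)
    score_yx = residual_dependence(y, x)
    direction = "x -> y" if score_xy < score_yx else "y -> x"
    return direction, score_xy, score_yx


# Synthetic data generated as x -> y with additive, non-Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=300)
y = x**3 + rng.uniform(-0.3, 0.3, size=300)

direction, score_xy, score_yx = infer_direction(x, y)
print(direction, score_xy, score_yx)
```

The sketch handles a single pair of jointly measured variables. Combining such pairwise decisions across datasets that share only some variables is the problem the paper addresses, and it is not attempted here.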
