Integrating overlapping datasets using bivariate causal discovery

Anish Dhir, Ciarán M. Lee

arXiv:1910.11356·stat.ML·November 12, 2019

TL;DR

This paper introduces a new algorithm for integrating overlapping datasets. It relies on causal discovery methods that go beyond conditional independence, and it outperforms previous methods on synthetic and real data.

Contribution

The paper adapts and extends bivariate causal discovery algorithms to handle multiple overlapping datasets, yielding a sound and complete algorithm.

Findings

1. Outperforms previous approaches on synthetic data
2. Effective on real-world datasets
3. Handles overlapping variables in multiple datasets

Abstract

Causal knowledge is vital for effective reasoning in science, as causal relations, unlike correlations, allow one to reason about the outcomes of interventions. Algorithms that can discover causal relations from observational data are based on the assumption that all variables have been jointly measured in a single dataset. In many cases this assumption fails. Previous approaches to overcoming this shortcoming devised algorithms that returned all joint causal structures consistent with the conditional independence information contained in each individual dataset. But, as conditional independence tests only determine causal structure up to Markov equivalence, the number of consistent joint structures returned by these approaches can be quite large. The last decade has seen the development of elegant algorithms for discovering causal relations beyond conditional independence, which can…
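To make the Markov-equivalence point in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It enumerates every directed acyclic graph over three variables and groups them by skeleton and v-structures, the standard characterization of Markov equivalence. The variable names and all helper functions (`all_dags`, `signature`, `equivalence_classes`) are hypothetical and chosen only for this example.

```python
from itertools import combinations, product

NODES = ("X", "Y", "Z")
PAIRS = list(combinations(NODES, 2))


def is_acyclic(edges):
    """Return True if the directed edge set has no cycle (Kahn-style peeling)."""
    remaining = set(NODES)
    while remaining:
        # Nodes with no incoming edge from another remaining node.
        sources = [
            n for n in remaining
            if not any(b == n and a in remaining for a, b in edges)
        ]
        if not sources:
            return False
        remaining -= set(sources)
    return True


def all_dags():
    """Yield every DAG over NODES as a frozenset of (parent, child) edges."""
    for choice in product((None, "fwd", "bwd"), repeat=len(PAIRS)):
        edges = set()
        for (a, b), c in zip(PAIRS, choice):
            if c == "fwd":
                edges.add((a, b))
            elif c == "bwd":
                edges.add((b, a))
        if is_acyclic(edges):
            yield frozenset(edges)


def signature(edges):
    """Skeleton plus v-structures; equal signatures mean Markov equivalent."""
    skeleton = frozenset(frozenset(e) for e in edges)
    v_structures = frozenset(
        (frozenset((a, b)), c)
        for c in NODES
        for a, b in combinations(NODES, 2)
        if c not in (a, b)
        and (a, c) in edges
        and (b, c) in edges
        and frozenset((a, b)) not in skeleton
    )
    return skeleton, v_structures


def equivalence_classes():
    """Group all DAGs over NODES into Markov equivalence classes."""
    classes = {}
    for dag in all_dags():
        classes.setdefault(signature(dag), []).append(dag)
    return list(classes.values())


if __name__ == "__main__":
    classes = equivalence_classes()
    chain = frozenset({("X", "Y"), ("Y", "Z")})
    members = next(c for c in classes if chain in c)
    print(len(classes), "equivalence classes")
    for dag in members:
        print(sorted(dag))
```

In this sketch the chain X → Y → Z shares its class with the reversed chain and the fork X ← Y → Z, since all three imply only that X and Z are independent given Y. Conditional independence tests alone cannot tell these structures apart, which is why methods that go beyond conditional independence can narrow the set of consistent joint structures.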
