Causal Discovery in the Presence of Missing Data
Ruibo Tu, Kun Zhang, Paul Ackermann, Bo Christer Bertilson, Clark Glymour, Hedvig Kjellström, Cheng Zhang

TL;DR
This paper introduces the MVPC algorithm, a causal discovery method that accurately infers causal structures from data with various missingness mechanisms, including MNAR, by extending the PC algorithm with correction techniques.
Contribution
We develop the MVPC algorithm, which corrects for the effects of the missingness mechanism and enables accurate causal discovery under MAR and MNAR conditions, a setting that existing methods did not address.
Findings
MVPC achieves asymptotically correct causal inference on MAR and MNAR data.
Experimental results show MVPC outperforms benchmark methods on synthetic and real data.
MVPC effectively discovers causal relations in complex missing data scenarios.
Abstract
Missing data are ubiquitous in many domains including healthcare. When these data entries are not missing completely at random, the (conditional) independence relations in the observed data may be different from those in the complete data generated by the underlying causal process. Consequently, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. In this paper, we aim at developing a causal discovery method to recover the underlying causal structure from observed data that follow different missingness mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). With missingness mechanisms represented by missingness graphs, we analyse conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Based on our…
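The abstract's claim that independence relations in the observed data can differ from those in the complete data can be illustrated with a small simulation. This is a minimal sketch under assumptions of our own, not the paper's experimental setup: the variables X, Y, Z, the linear-Gaussian model, and the thresholding rule for missingness are all hypothetical choices made for illustration. X and Y are independent causes of a fully observed Z, and X is recorded only when Z is positive (a MAR mechanism). Deleting the rows where X is missing then conditions on a function of the common effect Z, which induces a spurious dependence between X and Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical complete data: X and Y are independent causes of Z.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + 0.5 * rng.normal(size=n)

# Assumed missingness mechanism (MAR): X is recorded only when the
# fully observed variable Z is positive.
x_observed = z > 0

# Correlation in the complete data (X and Y are independent by construction).
corr_complete = np.corrcoef(x, y)[0, 1]

# Correlation after deleting rows where X is missing. Keeping only rows
# with Z > 0 selects on a common effect of X and Y.
corr_deleted = np.corrcoef(x[x_observed], y[x_observed])[0, 1]

print(f"complete data:  corr(X, Y) = {corr_complete:.3f}")
print(f"after deletion: corr(X, Y) = {corr_deleted:.3f}")
```

In this sketch the complete-data correlation is close to zero, while the correlation after deletion is clearly negative. An independence test run on the deleted data would therefore tend to keep an X–Y edge that the complete data do not support, which is the kind of error the abstract says a correction is needed for.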
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management
Methods: PC
