# Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion

**Authors:** Eric V. Strobl, Shyam Visweswaran, Peter L. Spirtes

arXiv: 1705.09031 · 2017-05-26

## TL;DR

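This paper introduces test-wise deletion for causal discovery on datasets with values missing not at random (MNAR). Rather than discarding every sample that has any missing value, as list-wise deletion does, it discards for each conditional independence (CI) test only the samples that are missing a value among the variables that test uses. This retains more samples per test and, on average, gives more accurate results than list-wise deletion.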

## Contribution

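The paper proposes test-wise deletion, shows that it is sound under a stated assumption on the missingness mechanisms, and validates it empirically. The approach improves constraint-based causal discovery on MNAR datasets by retaining more samples for each conditional independence (CI) test.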

## Key findings

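- On synthetic MNAR data, FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average.
- The same average advantage over list-wise deletion and imputation holds on real data.
- Test-wise deletion is sound under the assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph.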

## Abstract

Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain only a few missing values. In this report, we show that we can more efficiently utilize the observed values with test-wise deletion while still maintaining algorithmic soundness. Here, test-wise deletion refers to the process of list-wise deleting samples only among the variables required for each conditional independence (CI) test used in constraint-based searches. Test-wise deletion therefore often saves more samples than list-wise deletion for each CI test, especially when we have a sparse underlying graph. Our theoretical results show that test-wise deletion is sound under the justifiable assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph. We also find that FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average when MNAR holds in both synthetic and real data.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.09031/full.md

## Figures

34 figures with captions in the complete paper: https://tomesphere.com/paper/1705.09031/full.md

## References

29 references, with the full list in the complete paper: https://tomesphere.com/paper/1705.09031/full.md

---
Source: https://tomesphere.com/paper/1705.09031