# Identification and Estimation of Causal Effects from Dependent Data

**Authors:** Eli Sherman, Ilya Shpitser

arXiv: 1902.01443 · 2019-02-06

## TL;DR

This paper develops a general theory and a complete non-parametric identification algorithm for causal effects in dependent-data settings such as social networks, where the iid assumption does not hold, and shows how statistical inference can be performed on the identified effects.

## Contribution

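The paper introduces a general theory of when causal inference is possible under both unobserved confounding and data dependence. It represents such causal models with segregated graphs and provides a complete algorithm for non-parametric identification in them.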

## Key findings

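- The techniques are applied to a synthetic data set of users sharing fake news articles, given their social network structure, activity levels, and baseline demographic and socioeconomic covariates
- The identification algorithm is complete for causal models represented by segregated graphs, so it covers dependent observations
- Statistical inference is addressed even when full interference leaves only a single sample for parts of the model
- The framework accommodates unobserved confounding and data dependence at the same time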

## Abstract

The assumption that data samples are independent and identically distributed (iid) is standard in many areas of statistics and machine learning. Nevertheless, in some settings, such as social networks, infectious disease modeling, and reasoning with spatial and temporal data, this assumption is false. An extensive literature exists on making causal inferences under the iid assumption [18, 12, 28, 22], even when unobserved confounding bias may be present. But, as pointed out in [20], causal inference in non-iid contexts is challenging due to the presence of both unobserved confounding and data dependence. In this paper we develop a general theory describing when causal inferences are possible in such scenarios. We use segregated graphs [21], a generalization of latent projection mixed graphs [30], to represent causal models of this type and provide a complete algorithm for non-parametric identification in these models. We then demonstrate how statistical inference may be performed on causal parameters identified by this algorithm. In particular, we consider cases where only a single sample is available for parts of the model due to full interference, i.e., all units are pathwise dependent and neighbors' treatments affect each other's outcomes [26]. We apply these techniques to a synthetic data set which considers users sharing fake news articles given the structure of their social network, user activity levels, and baseline demographics and socioeconomic covariates.
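
To make the notion of interference in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's model or algorithm: the four-user chain, the `outcome` function, and the `own_effect` and `spill_effect` parameters are all hypothetical and chosen only for illustration. It shows a single point, that one user's treatment changes the outcomes of that user's neighbors, which is why the units cannot be treated as iid samples.

```python
# Hypothetical toy model, NOT taken from the paper:
# four users connected in a friendship chain 0 - 1 - 2 - 3.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def outcome(unit, treatments, own_effect=1.0, spill_effect=0.5):
    """Outcome of `unit`: its own treatment effect plus a spillover term
    proportional to the fraction of its neighbors that are treated."""
    nbrs = NEIGHBORS[unit]
    exposure = sum(treatments[j] for j in nbrs) / len(nbrs)
    return own_effect * treatments[unit] + spill_effect * exposure


def outcomes(treatments):
    """Outcomes for all units under one joint treatment assignment."""
    return [outcome(i, treatments) for i in range(len(treatments))]


baseline = outcomes([0, 0, 0, 0])      # nobody treated
treat_unit_1 = outcomes([0, 1, 0, 0])  # only user 1 treated

print(baseline)
print(treat_unit_1)
```

In this sketch, treating only user 1 also shifts the outcomes of users 0 and 2, although their own treatments are unchanged. The toy model covers only direct neighbor spillover. The full interference setting described in the abstract is stronger, since there all units are pathwise dependent.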

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.01443/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.01443/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1902.01443/full.md

---
Source: https://tomesphere.com/paper/1902.01443