Directed Graphical Models and Causal Discovery for Zero-Inflated Data
Shiqing Yu, Mathias Drton, Ali Shojaie

TL;DR
This paper introduces a novel directed graphical modeling approach tailored for zero-inflated single-cell RNA sequencing data, enabling accurate inference of gene regulatory networks despite data sparsity.
Contribution
The paper develops zero-inflated directed graphical models based on Hurdle conditional distributions, shows that the exact directed acyclic graph is identifiable under a natural and weak assumption, and proposes methods for graph recovery.
Findings
Applied the model to real T helper cell data.
Validated identifiability through simulations.
Achieved accurate graph estimation in the zero-inflated setting.
Abstract
Modern RNA sequencing technologies provide gene expression measurements from single cells that promise refined insights on regulatory relationships among genes. Directed graphical models are well-suited to explore such (cause-effect) relationships. However, statistical analyses of single cell data are complicated by the fact that the data often show zero-inflated expression patterns. To address this challenge, we propose directed graphical models that are based on Hurdle conditional distributions parametrized in terms of polynomials in parent variables and their 0/1 indicators of being zero or nonzero. While directed graphs for Gaussian models are only identifiable up to an equivalence class in general, we show that, under a natural and weak assumption, the exact directed acyclic graph of our zero-inflated models can be identified. We propose methods for graph recovery, apply our model…
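The Hurdle construction described in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' implementation: it assumes a two-gene graph X1 -> X2, a logistic model for the probability that X2 is nonzero, and a Gaussian for its nonzero values, with both depending on X1 and its 0/1 indicator through first-degree polynomials. The function names and all coefficients are hypothetical, chosen only for illustration; the paper's exact parametrization may differ.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sample_hurdle(rng, logit_nonzero, mean_if_nonzero, sd=1.0):
    """Draw one hurdle variable: exactly 0 with probability
    1 - sigmoid(logit_nonzero), otherwise a Gaussian draw."""
    if rng.random() >= sigmoid(logit_nonzero):
        return 0.0
    return rng.gauss(mean_if_nonzero, sd)


def simulate_pair(n, seed=0):
    """Simulate n samples from an assumed two-node graph X1 -> X2.

    All coefficients below are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Root node: nonzero with probability 1/2, then N(1, 1).
        x1 = sample_hurdle(rng, logit_nonzero=0.0, mean_if_nonzero=1.0)
        v1 = 1.0 if x1 != 0.0 else 0.0  # 0/1 indicator of X1 being nonzero
        # Child node: both hurdle parameters are degree-1 polynomials
        # in the parent value x1 and its indicator v1.
        logit2 = -1.0 + 1.5 * v1 + 0.5 * x1
        mean2 = 0.5 - 0.3 * v1 + 0.8 * x1
        x2 = sample_hurdle(rng, logit2, mean2)
        samples.append((x1, x2))
    return samples


def zero_fraction(values):
    """Share of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


if __name__ == "__main__":
    data = simulate_pair(2000, seed=1)
    print("zero fraction of X1:", zero_fraction([a for a, _ in data]))
    print("zero fraction of X2:", zero_fraction([b for _, b in data]))
```

In this sketch the parent influences the child in two ways, through whether the child is expressed at all and through its expression level when it is nonzero. Comparing the zero fraction of X2 between samples with X1 equal to zero and samples with X1 nonzero is a quick way to see the first effect.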
Taxonomy
Topics: Gene expression and cancer classification · Single-cell and spatial transcriptomics · Gene Regulatory Network Analysis
