Structure Learning for Directed Trees with Zero-Inflated Compositional Nodes
Shuangjie Zhang, Bani K. Mallick, Yang Ni

TL;DR
This paper presents a method for learning directed tree structures among compositional data vectors. The method handles zero inflation, respects the simplex geometry, comes with a consistency proof, and is applied to microbiome and single-cell data.
Contribution
The paper introduces a framework for learning directed tree structures over compositional nodes, scored by Kullback-Leibler (KL) divergence, with identifiability and consistency guarantees.
Findings
The method accurately recovers the true structures in simulations.
Applied to microbiome and single-cell data, it reveals biologically meaningful directions.
Finite-sample guarantees relate sample size to signal strength and data dimensions.
Abstract
Compositional data, which are vectors of proportions constrained to the probability simplex, arise frequently in modern scientific applications, including microbiome relative abundances across body sites and cell-type mixture weights derived from single-cell genomics. While regression methods for compositional data are well developed, no existing graphical model framework addresses the problem of learning conditional dependence structures among multiple compositional vectors. This paper introduces a novel framework for directed tree structure learning over compositional nodes. We employ the Kullback-Leibler divergence as the scoring function and model the conditional expectation of each child composition as a mixture of a baseline composition and a parent-driven component parameterized by a column-stochastic transition matrix. This formulation respects the simplex geometry, handles…
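The mixture form of the conditional expectation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `conditional_mean` and `kl_divergence`, the scalar mixing weight `weight`, the direction of the KL divergence (observed child against predicted mean), and all numbers are assumptions made for illustration only.

```python
import numpy as np


def conditional_mean(parent, baseline, transition, weight):
    """Predicted child composition as a mixture of a baseline composition
    and a parent-driven component (transition @ parent).

    `weight` in [0, 1] is a hypothetical scalar mixing weight; how the
    paper parameterizes the mixture is not stated in the abstract.
    """
    parent = np.asarray(parent, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    transition = np.asarray(transition, dtype=float)
    # A column-stochastic matrix maps a point on the simplex to a point
    # on the simplex, so the mixture below also stays on the simplex.
    return (1.0 - weight) * baseline + weight * (transition @ parent)


def kl_divergence(p, q):
    """KL(p || q) with the convention 0 * log(0 / q) = 0.

    Zero entries in p contribute nothing, so zero-inflated observations
    need no pseudo-count. q must be positive wherever p is positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


# Illustrative values only (not from the paper).
parent = np.array([0.5, 0.5, 0.0])        # parent composition with a zero
baseline = np.array([0.2, 0.3, 0.5])      # strictly positive baseline
transition = np.array([[0.7, 0.1, 0.2],   # each column sums to 1
                       [0.2, 0.8, 0.3],
                       [0.1, 0.1, 0.5]])
child = np.array([0.6, 0.4, 0.0])         # observed child with a zero

mu = conditional_mean(parent, baseline, transition, weight=0.5)
score = kl_divergence(child, mu)
print(mu, mu.sum(), score)
```

Because the baseline in this sketch is strictly positive and the weight is below 1, the predicted mean has no zero entries, so the divergence stays finite even when the observed child composition contains exact zeros.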
