Interaction Models and Generalized Score Matching for Compositional Data
Shiqing Yu, Mathias Drton, Ali Shojaie

TL;DR
This paper introduces a new class of exponential family models for compositional data, such as microbiome data, that capture pairwise interactions and are supported on the probability simplex, with efficient estimation methods based on generalized score matching.
Contribution
It proposes a novel class of exponential family interaction models for compositional data, supported on the probability simplex, together with generalized score matching estimators that avoid computing the intractable normalizing constant.
Findings
Special cases of the model class include the Dirichlet distributions and Aitchison's additive logistic normal distributions.
Estimation methods are effective even in high-dimensional settings.
The approach handles the simplex domain as efficiently as full-dimensional domains.
Abstract
Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is considerable interest in modeling interactions among such relative proportions. To this end we propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex. Special cases include the family of Dirichlet distributions as well as Aitchison's additive logistic normal distributions. Generally, the distributions we consider have a density that features a difficult to compute normalizing constant. To circumvent this issue, we design effective estimation methods based on generalized versions of score matching. A high-dimensional analysis of our estimation…
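To make the normalizing-constant issue concrete, the following is a minimal sketch, not the authors' implementation. It evaluates an unnormalized exponential-family log-density on the simplex. The linear term in log x with eta = alpha - 1 is the standard Dirichlet kernel; the quadratic term in log x with a matrix K is an assumed, illustrative way to encode pairwise interactions and may differ from the parameterization used in the paper. The function names are hypothetical.

```python
import math
import numpy as np


def unnormalized_log_density(x, eta, K=None):
    """Log of an unnormalized density on the probability simplex.

    Illustrative form (an assumption, not taken from the paper):
        eta^T log(x) - 0.5 * log(x)^T K log(x)
    With K omitted and eta = alpha - 1 this is the Dirichlet kernel.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0) or not math.isclose(float(x.sum()), 1.0, abs_tol=1e-9):
        raise ValueError("x must lie in the interior of the probability simplex")
    log_x = np.log(x)
    value = float(np.dot(np.asarray(eta, dtype=float), log_x))
    if K is not None:
        # Pairwise interaction term; has no closed-form normalizer in general.
        value -= 0.5 * float(log_x @ np.asarray(K, dtype=float) @ log_x)
    return value


def dirichlet_log_normalizer(alpha):
    """log B(alpha) = sum_j lgamma(alpha_j) - lgamma(sum_j alpha_j)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


# Dirichlet special case: the normalizer is available in closed form.
alpha = [2.0, 3.0, 4.0]
x = [0.2, 0.3, 0.5]
eta = [a - 1.0 for a in alpha]
log_pdf = unnormalized_log_density(x, eta) - dirichlet_log_normalizer(alpha)
print(math.exp(log_pdf))  # Dirichlet(2, 3, 4) density at x, approximately 7.56
```

For the Dirichlet case the normalizer reduces to a ratio of gamma functions, so the full density is cheap to evaluate. Once a nonzero K is added, no such closed form is available in general, which is the situation that motivates estimation methods that do not require the normalizing constant.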
Taxonomy
Topics: Geochemistry and Geologic Mapping · Bayesian Methods and Mixture Models
