Direct covariance matrix estimation with compositional data
Aaron J. Molstad, Karl Oskar Ekvall, Piotr M. Suder

TL;DR
This paper introduces a direct, convex-optimization-based estimator for covariance matrices of latent log-abundances in compositional data, improving estimation accuracy, especially in high-dimensional microbiome studies.
Contribution
It proposes a novel direct covariance matrix estimator for compositional data that shares information across populations and guarantees positive definiteness.
Findings
The proposed estimator performs well in high-dimensional settings.
It outperforms existing estimators in simulations.
It supports more reliable analysis of microbiome data.
Abstract
Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject's gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest…
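To see why standard estimators break down here, the following minimal sketch simulates latent log-abundances, converts them to relative abundances, and compares the covariance of the centered log-ratio (clr) transformed compositions with the covariance of the latent log-abundances. It is not the paper's estimator. The matrix `omega`, the sample size, and all variable names are illustrative assumptions. The identity it checks is the standard one: the clr covariance equals G S G with G = I - 11ᵀ/p, so it is singular and differs from S.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 4  # illustrative sample size and number of taxa (assumed values)

# Hypothetical latent covariance of the log-abundances (not from the paper).
omega = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -0.3],
    [0.0, 0.0, -0.3, 1.0],
])

# Latent log-abundances: these are never observed in practice.
log_w = rng.multivariate_normal(np.zeros(p), omega, size=n)

# Observed data: relative abundances, so each row sums to one.
w = np.exp(log_w)
x = w / w.sum(axis=1, keepdims=True)

# Centered log-ratio transform: subtract each row's mean log-abundance.
log_x = np.log(x)
clr = log_x - log_x.mean(axis=1, keepdims=True)

# Sample covariances of the latent log-abundances and of the clr data.
s_latent = np.cov(log_w, rowvar=False)
s_clr = np.cov(clr, rowvar=False)

# Centering matrix G = I - 11^T / p; the clr covariance equals G S G.
g = np.eye(p) - np.ones((p, p)) / p

print("clr covariance equals G S G:", np.allclose(s_clr, g @ s_latent @ g))
print("row sums of clr covariance:", np.round(s_clr.sum(axis=1), 12))
print("max |s_clr - s_latent|:", np.abs(s_clr - s_latent).max())
```

Because every row of the clr covariance sums to zero, it cannot be positive definite, which is one way to see that the latent covariance needs an estimator designed for compositional data.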
Taxonomy
Topics: Geochemistry and Geologic Mapping · Oral microbiology and periodontitis research
