Prediction analysis for microbiome sequencing data
Tao Wang, Can Yang, Hongyu Zhao

TL;DR
This paper introduces PAMIR, a new statistical framework for predicting host traits from microbiome sequencing data, effectively handling challenges like zero inflation, over-dispersion, and varying library sizes.
Contribution
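The paper presents PAMIR, which combines an inverse regression model for over-dispersed microbiota counts given the trait, a prediction rule that exploits the model's dimension-reduction structure, and a Monte Carlo expectation-maximization (EM) algorithm for maximum likelihood estimation.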
Findings
In the authors' simulations, PAMIR shows advantages over existing methods.
The model is built to accommodate zero inflation, over-dispersion, and varying library sizes.
The authors also demonstrate PAMIR on a real microbiome data example.
Abstract
One primary goal of human microbiome studies is to predict host traits based on human microbiota. However, microbial community sequencing data present significant challenges to the development of statistical methods. In particular, the samples have different library sizes, the data contain many zeros and are often over-dispersed. To address these challenges, we introduce a new statistical framework, called predictive analysis in metagenomics via inverse regression (PAMIR). An inverse regression model is developed for over-dispersed microbiota counts given the trait, and then a prediction rule is constructed by taking advantage of the dimension-reduction structure in the model. An efficient Monte Carlo expectation-maximization algorithm is designed for carrying out maximum likelihood estimation. We demonstrate the advantages of PAMIR through simulations and a real data example.
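The three data challenges named in the abstract (varying library sizes, many zeros, over-dispersion) can be made concrete with a small simulation. The sketch below is illustrative only: the Dirichlet-multinomial generator, the parameter values, and the helper names `simulate_counts`, `zero_fraction`, and `overdispersion_ratio` are assumptions chosen for this example and are not taken from the paper or from PAMIR itself.

```python
import numpy as np


def simulate_counts(n_samples=200, n_taxa=50, concentration=5.0, seed=0):
    """Simulate taxon counts from a Dirichlet-multinomial model.

    Assumption: this generator is a common stand-in for over-dispersed
    microbiome counts; it is NOT the model used by PAMIR.
    """
    rng = np.random.default_rng(seed)
    # Library sizes differ from sample to sample (hypothetical range).
    library_sizes = rng.integers(500, 5000, size=n_samples)
    # Skewed mean composition: a few abundant taxa and many rare ones.
    weights = 1.0 / np.arange(1, n_taxa + 1) ** 1.5
    mean_comp = weights / weights.sum()
    counts = np.empty((n_samples, n_taxa), dtype=np.int64)
    for i, m in enumerate(library_sizes):
        # Sample-specific composition; smaller concentration = more spread.
        p_i = rng.dirichlet(concentration * mean_comp)
        counts[i] = rng.multinomial(m, p_i)
    return counts, library_sizes


def zero_fraction(counts):
    """Share of entries in the count table that are exactly zero."""
    return float(np.mean(counts == 0))


def overdispersion_ratio(counts, library_sizes, taxon=0):
    """Observed variance of one taxon's proportion relative to the
    variance a plain multinomial with a shared composition would give."""
    props = counts[:, taxon] / library_sizes
    p_hat = counts[:, taxon].sum() / library_sizes.sum()
    multinomial_var = np.mean(p_hat * (1.0 - p_hat) / library_sizes)
    return float(props.var() / multinomial_var)


if __name__ == "__main__":
    counts, library_sizes = simulate_counts()
    print("library size range:", library_sizes.min(), library_sizes.max())
    print("fraction of zeros:", round(zero_fraction(counts), 3))
    print("over-dispersion ratio (taxon 0):",
          round(overdispersion_ratio(counts, library_sizes), 1))
```

A ratio well above 1 indicates more variability than a multinomial with a shared composition would produce, which is the sense of "over-dispersed" used here. Dividing counts by library size, as `overdispersion_ratio` does, is only a simple normalization for illustration; the abstract describes PAMIR as modeling the counts themselves.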
Taxonomy
Topics: Gut microbiota and health · Gene expression and cancer classification · Metabolomics and Mass Spectrometry Studies
