Prediction analysis for microbiome sequencing data

Tao Wang; Can Yang; Hongyu Zhao

arXiv:1710.02616·stat.ME·October 10, 2017

TL;DR

This paper introduces PAMIR, a new statistical framework for predicting host traits from microbiome sequencing data, effectively handling challenges like zero inflation, over-dispersion, and varying library sizes.

Contribution

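The paper presents PAMIR, an inverse regression-based method for microbiome-based trait prediction. It models over-dispersed microbiota counts given the trait, builds a prediction rule from the dimension-reduction structure of that model, and fits the model by maximum likelihood with a Monte Carlo EM algorithm.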

Findings

01 In simulations, PAMIR shows advantages over existing methods.

02 The model is built for counts that contain many zeros and are over-dispersed.

03 The method is also demonstrated on a real microbiome data example.

Abstract

One primary goal of human microbiome studies is to predict host traits based on human microbiota. However, microbial community sequencing data present significant challenges to the development of statistical methods. In particular, the samples have different library sizes, and the data contain many zeros and are often over-dispersed. To address these challenges, we introduce a new statistical framework, called predictive analysis in metagenomics via inverse regression (PAMIR). An inverse regression model is developed for over-dispersed microbiota counts given the trait, and then a prediction rule is constructed by taking advantage of the dimension-reduction structure in the model. An efficient Monte Carlo expectation-maximization algorithm is designed for carrying out maximum likelihood estimation. We demonstrate the advantages of PAMIR through simulations and a real data example.
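The three data challenges named in the abstract (unequal library sizes, many zeros, over-dispersion) can be illustrated with a small simulation. The sketch below is not the PAMIR model: it assumes a Dirichlet-multinomial as a generic stand-in for over-dispersed counts, and every name and parameter value (`library_sizes`, `mean_comp`, `concentration`) is hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_taxa = 200, 50

# Assumption: library sizes (total reads per sample) vary widely.
library_sizes = rng.integers(1_000, 20_000, size=n_samples)

# Assumption: a skewed mean composition with a few dominant taxa.
weights = 0.9 ** np.arange(n_taxa)
mean_comp = weights / weights.sum()

# Assumption: Dirichlet concentration; smaller means more over-dispersion.
concentration = 5.0

# Dirichlet-multinomial: draw a composition per sample, then the counts.
props = rng.dirichlet(concentration * mean_comp, size=n_samples)
counts = np.vstack(
    [rng.multinomial(n, p) for n, p in zip(library_sizes, props)]
)

# Plain multinomial reference with the same library sizes and mean.
ref_counts = np.vstack(
    [rng.multinomial(n, mean_comp) for n in library_sizes]
)

# Fraction of zero entries in each count table.
zero_fraction = (counts == 0).mean()
ref_zero_fraction = (ref_counts == 0).mean()

# Variance of the relative abundance of the most abundant taxon.
rel_var = (counts[:, 0] / library_sizes).var()
ref_rel_var = (ref_counts[:, 0] / library_sizes).var()

print(f"zeros: {zero_fraction:.2f} vs multinomial {ref_zero_fraction:.2f}")
print(f"variance: {rel_var:.4f} vs multinomial {ref_rel_var:.6f}")
```

Each row of `counts` sums to its own library size, so raw counts are not comparable across samples without accounting for sequencing depth. Relative to the multinomial reference, the simulated table should show both more zeros and a larger variance in relative abundance, which are the features a count model for such data has to accommodate.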

Taxonomy

Topics: Gut microbiota and health · Gene expression and cancer classification · Metabolomics and Mass Spectrometry Studies