A Statistical Perspective on the Challenges in Molecular Microbial Biology
Pratheepa Jeganathan, Susan P. Holmes

TL;DR
This paper discusses the statistical challenges in analyzing high throughput sequencing data for microbial communities and reviews tools and methods to address issues like contamination, batch effects, and heteroscedasticity.
Contribution
It introduces statistical tools such as hierarchical mixture and topic models, and reviews nonparametric Bayesian approaches for microbial data analysis.
Findings
Standard statistical methods, such as simple hierarchical mixture and topic models, can facilitate inference on microbial communities.
Bayesian approaches help visualize and quantify uncertainty.
Addressing statistical challenges improves microbial data interpretation.
Abstract
High throughput sequencing (HTS)-based technology enables identifying and quantifying non-culturable microbial organisms in all environments. Microbial sequences have enhanced our understanding of the human microbiome, the soil and plant environment, and the marine environment. All molecular microbial data pose statistical challenges due to contamination sequences from reagents, batch effects, unequal sampling, and undetected taxa. Technical biases and heteroscedasticity have the strongest effects, but different strains across subjects and environments also make direct differential abundance testing unwieldy. We provide an introduction to a few statistical tools that can overcome some of these difficulties and demonstrate those tools on an example. We show how standard statistical methods, such as simple hierarchical mixture and topic models, can facilitate inferences on latent…
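As a rough illustration of why unequal sampling and heteroscedasticity complicate direct comparisons of counts, the sketch below simulates reads for a single taxon under a gamma-Poisson hierarchical mixture at three sequencing depths. It is not the authors' example or code: the depths, the 1% relative abundance, the gamma shape parameter, and the helper names (`sample_poisson`, `sample_gamma_poisson`, `simulate`) are all assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, variance


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for the modest rates used in this sketch; not meant for large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_gamma_poisson(rng, mu, shape):
    """Draw a count whose Poisson rate is itself gamma distributed with mean mu."""
    # gammavariate(alpha, beta) has mean alpha * beta, so beta = mu / shape.
    rate = rng.gammavariate(shape, mu / shape)
    return sample_poisson(rng, rate)


def simulate(depths, proportion=0.01, shape=2.0, n_rep=4000, seed=0):
    """Return (depth, sample mean, sample variance) of one taxon's counts.

    `proportion` is the hypothetical relative abundance of the taxon, so the
    expected count at a given depth is depth * proportion.
    """
    rng = random.Random(seed)
    summary = []
    for depth in depths:
        mu = depth * proportion
        counts = [sample_gamma_poisson(rng, mu, shape) for _ in range(n_rep)]
        summary.append((depth, mean(counts), variance(counts)))
    return summary


# Hypothetical library sizes standing in for unequal sampling across samples.
summary = simulate([500, 2000, 8000])
for depth, m, v in summary:
    print(f"depth={depth:5d}  mean={m:7.2f}  variance={v:9.2f}")
```

Under this model the variance is expected to be roughly mean + mean²/shape, so it grows faster than the mean as depth increases. That is one reason a test assuming a common variance across samples can be misleading for count data of this kind.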
