Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis
Jun Chen, Hongzhe Li

TL;DR
This paper introduces a sparse Dirichlet-multinomial regression model with variable selection for microbiome data, effectively identifying environmental covariates associated with microbiome composition while accounting for overdispersion.
Contribution
The authors develop a penalized likelihood approach with a sparse group l1 penalty and an efficient algorithm for variable selection in high-dimensional microbiome data analysis.
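As a rough illustration of what a sparse group l1 penalty computes, the sketch below combines a group (l2) norm over each covariate's coefficients with an elementwise l1 norm. The exact weighting is an assumption made for illustration: the function name `sparse_group_l1_penalty`, the tuning parameters `lam` and `c`, and the choice of one group per covariate (a row of coefficients across all taxa) are hypothetical and not taken from the summary above.

```python
import math


def sparse_group_l1_penalty(beta, lam, c):
    """Sparse group l1 penalty for a coefficient matrix.

    beta: list of rows, one row per covariate, one column per taxon
          (grouping by covariate is an assumption of this sketch).
    lam:  overall penalty strength (hypothetical name).
    c:    mixing weight in [0, 1] between the l1 and group terms
          (hypothetical name).

    Assumed form:
        lam * ((1 - c) * sum_j ||beta[j]||_2 + c * sum_jk |beta[j][k]|)
    """
    # Group term: l2 norm of each covariate's coefficient vector.
    group_term = sum(math.sqrt(sum(b * b for b in row)) for row in beta)
    # Elementwise term: plain l1 norm over all coefficients.
    l1_term = sum(abs(b) for row in beta for b in row)
    return lam * ((1.0 - c) * group_term + c * l1_term)


if __name__ == "__main__":
    # First covariate affects both taxa; second covariate is dropped entirely.
    beta = [[3.0, 4.0], [0.0, 0.0]]
    print(sparse_group_l1_penalty(beta, lam=1.0, c=0.5))
```

The group term can set a covariate's whole row to zero (covariate selection), while the l1 term can zero individual entries within a retained row (taxon-level selection).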
Findings
Sparse DM regression outperforms traditional models in identifying relevant covariates.
The method successfully detects strong associations between nutrient intake and gut microbiome.
Simulations confirm improved variable selection accuracy.
Abstract
With the development of next-generation sequencing technology, researchers can now study microbiome composition using direct sequencing, whose output is bacterial taxa counts for each microbiome sample. One goal of microbiome studies is to associate the microbiome composition with environmental covariates. We propose to model the taxa counts using a Dirichlet-multinomial (DM) regression model in order to account for overdispersion of the observed counts. The DM regression model can be used for testing the association between taxa composition and covariates using the likelihood ratio test. However, when the number of covariates is large, multiple testing can lead to loss of power. To address the high dimensionality of the problem, we develop a penalized likelihood approach to estimate the regression parameters and to select the variables by imposing a sparse group …
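To make the DM regression idea concrete, here is a minimal sketch of the Dirichlet-multinomial log-likelihood for a single sample. The abstract as shown does not state how covariates enter the model, so the log-linear link used here (each DM parameter is the exponential of an intercept plus a linear combination of covariates) is an assumption, and the function name `dm_loglik` and its arguments are hypothetical.

```python
import math


def dm_loglik(y, x, intercepts, beta):
    """Dirichlet-multinomial log-likelihood for one sample.

    y:          taxa counts, length q.
    x:          covariate values, length p.
    intercepts: per-taxon intercepts, length q.
    beta:       p x q coefficients (row = covariate, column = taxon).

    Assumed link (not stated in the abstract):
        gamma_j = exp(intercepts[j] + sum_k x[k] * beta[k][j])
    """
    q = len(y)
    # DM parameters from the assumed log-linear link.
    gamma = [
        math.exp(intercepts[j] + sum(x[k] * beta[k][j] for k in range(len(x))))
        for j in range(q)
    ]
    n = sum(y)
    g_sum = sum(gamma)
    # Multinomial coefficient: log(n! / prod_j y_j!).
    ll = math.lgamma(n + 1) - sum(math.lgamma(yj + 1) for yj in y)
    # Normalizing ratio of gamma functions over the totals.
    ll += math.lgamma(g_sum) - math.lgamma(n + g_sum)
    # Per-taxon gamma-function ratios.
    ll += sum(math.lgamma(y[j] + gamma[j]) - math.lgamma(gamma[j]) for j in range(q))
    return ll


if __name__ == "__main__":
    # Two taxa, one covariate, all coefficients zero, so gamma = (1, 1).
    print(dm_loglik([1, 2], [0.5], [0.0, 0.0], [[0.0, 0.0]]))
```

With two taxa and both parameters equal to 1, the DM distribution reduces to a uniform beta-binomial, so every split of 3 reads has probability 1/4; this gives a simple sanity check on the formula.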
