Bayesian Variable Selection for Probit Mixed Models Applied to Gene Selection
Meili Baragatti (IML)

TL;DR
This paper introduces a Bayesian variable selection method for probit mixed models that identifies relevant genes in large merged gene expression datasets with a hierarchical structure.
Contribution
It extends the Bayesian variable selection approach of Lee et al. (2003) to probit mixed models that include the dataset as a random effect, making it suitable for high-dimensional biological data such as merged gene expression datasets.
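For orientation, here is one generic way to write a probit mixed model with a dataset random effect and variable-selection indicators. The notation (the indicators \gamma_k, their prior probabilities \pi_k, the random-effect variance \sigma_u^2) is assumed for illustration. The paper's exact priors, including the prior on \beta_\gamma, may differ.

```latex
% i indexes samples within dataset j; k = 1, \dots, p indexes candidate genes
y_{ij} = \mathbb{1}\{ z_{ij} > 0 \}, \qquad
z_{ij} = x_{ij,\gamma}^{\top} \beta_{\gamma} + u_j + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, 1),
\\
u_j \sim \mathcal{N}(0, \sigma_u^2), \qquad
\gamma_k \sim \mathrm{Bernoulli}(\pi_k),
\qquad\Longrightarrow\qquad
P(y_{ij} = 1 \mid u_j) = \Phi\!\left( x_{ij,\gamma}^{\top} \beta_{\gamma} + u_j \right)
```

Here x_{ij,\gamma} and \beta_\gamma keep only the genes with \gamma_k = 1, so the posterior distribution of \gamma is what drives gene selection.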
Findings
The method efficiently selects relevant genes among tens of thousands of candidate variables in large merged datasets.
Successfully applied to breast cancer gene expression data.
Identifies genes associated with estrogen receptor status.
Abstract
In computational biology, gene expression datasets are characterized by very few individual samples compared to a large number of measurements per sample. It is therefore appealing to merge these datasets in order to increase the number of observations and diversify the data, allowing a more reliable selection of genes relevant to the biological problem. Moreover, the increased size of a merged dataset facilitates its re-splitting into training and validation sets. Merging datasets in this way requires introducing the dataset as a random effect. In this context, extending the work of Lee et al. (2003), a method is proposed to select relevant variables among tens of thousands in a probit mixed regression model, considered as part of a larger hierarchical Bayesian model. Latent variables are used to identify subsets of selected variables and the grouping (or blocking) technique of Liu (1994) is combined…
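The sketch below is not the paper's sampler. It illustrates one standard ingredient of Gibbs samplers for probit models: continuous latent variables z_ij = x'β_γ + u_j + ε_ij, with ε_ij ~ N(0, 1) and y_ij = 1 exactly when z_ij > 0. Given the selected effects β_γ and the dataset effects u_j, each z_ij is drawn from a normal distribution truncated to the side of zero that matches y_ij. All function names, the toy data, and the choice of inverse-CDF truncated sampling are assumptions made for illustration. The update of the selection indicators γ, including Liu's grouping technique, is not shown.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def linear_predictor(x, beta, gamma, u_dataset):
    """x_gamma' beta_gamma + u_j: only variables with gamma_k = 1 contribute."""
    return sum(xk * bk for xk, bk, gk in zip(x, beta, gamma) if gk) + u_dataset


def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0,
    using the inverse-CDF method."""
    p0 = STD_NORMAL.cdf(-mean)  # P(z <= 0) when z ~ N(mean, 1)
    u = rng.uniform(p0, 1.0) if positive else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return mean + STD_NORMAL.inv_cdf(u)


def update_latent(y, X, dataset, beta, gamma, u, rng):
    """One Gibbs update of every latent z_ij given y, beta_gamma, gamma
    and the dataset random effects u (hypothetical helper)."""
    return [
        sample_truncated_normal(linear_predictor(x, beta, gamma, u[d]), yi == 1, rng)
        for yi, x, d in zip(y, X, dataset)
    ]


# Hypothetical toy data: 3 merged datasets, 20 samples each, 5 candidate genes.
rng = random.Random(0)
n_datasets, n_per_dataset, p = 3, 20, 5
gamma = [1, 0, 1, 0, 0]            # genes 0 and 2 treated as "selected"
beta = [1.5, 0.0, -1.0, 0.0, 0.0]
u = [rng.gauss(0.0, 0.5) for _ in range(n_datasets)]  # dataset random effects

X, y, dataset = [], [], []
for j in range(n_datasets):
    for _ in range(n_per_dataset):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z_true = linear_predictor(x, beta, gamma, u[j]) + rng.gauss(0.0, 1.0)
        X.append(x)
        y.append(1 if z_true > 0 else 0)
        dataset.append(j)

# One data-augmentation step: every sampled z_ij agrees in sign with y_ij.
z = update_latent(y, X, dataset, beta, gamma, u, rng)
```

In a full sampler this step would alternate with updates of β_γ, the dataset effects u_j, their variance, and the indicators γ. Because the u_j enter additively, samples from the same dataset share a common shift on the latent scale.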
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Bayesian Methods and Mixture Models
