Bayesian Nonparametric Variable Selection as an Exploratory Tool for Finding Genes that Matter
Babak Shahbaba

TL;DR
This paper introduces a nonparametric Bayesian method for exploratory gene selection in large-scale genomic studies, effectively identifying relevant genes without prior hypotheses.
Contribution
It develops a novel random partition model that groups genes by relevance and assigns latent ranks, improving the exploration of important factors in high-dimensional data.
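As a rough illustration of the partition-and-rank idea, the sketch below draws a random partition of genes from a Chinese restaurant process and ranks each group by its mean absolute score. This is an assumption-laden toy, not the paper's model: the summary does not say which random partition prior is used, the function names `sample_crp_partition` and `rank_groups` are hypothetical, the scores are simulated, and the code samples from a prior rather than performing posterior inference.

```python
import random
from collections import defaultdict


def sample_crp_partition(n_items, alpha, rng):
    """Draw one partition of n_items from a Chinese restaurant process.

    Assumption: the CRP stands in for the paper's random partition prior.
    """
    labels = []
    counts = []  # counts[k] = number of items already assigned to group k
    for _ in range(n_items):
        # An existing group k has weight counts[k]; a new group has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new group
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def rank_groups(labels, scores):
    """Give every item the rank of its group; rank 1 = largest mean |score|."""
    members = defaultdict(list)
    for label, score in zip(labels, scores):
        members[label].append(abs(score))
    group_means = {g: sum(v) / len(v) for g, v in members.items()}
    ordered = sorted(group_means, key=group_means.get, reverse=True)
    group_rank = {g: r for r, g in enumerate(ordered, start=1)}
    return [group_rank[label] for label in labels]


rng = random.Random(0)
n_genes = 12
labels = sample_crp_partition(n_genes, alpha=1.0, rng=rng)
scores = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]  # hypothetical effect scores
ranks = rank_groups(labels, scores)
```

Genes that share a group share a rank, which mirrors the description of a latent rank assigned according to the importance of a gene's group. A larger `alpha` tends to produce more groups.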
Findings
Detects relevant genes effectively in simulated data
Successfully applied to HCMV transcriptome data
Identifies differentially expressed genes in leukemia studies
Abstract
High-throughput scientific studies involving no clear a priori hypothesis are common. For example, a large-scale genomic study of a disease may examine thousands of genes without hypothesizing that any specific gene is responsible for the disease. In these studies, the objective is to explore a large number of possible factors (e.g., genes) in order to identify a small number that will be considered in follow-up studies that tend to be more thorough and on smaller scales. For large-scale studies, we propose a nonparametric Bayesian approach based on random partition models. Our model divides the set of candidate factors into several subgroups according to their degrees of relevance, or potential effect, in relation to the outcome of interest. The model allows for a latent rank to be assigned to each factor according to the overall potential importance of its corresponding group. The…
Taxonomy
Topics: Gene expression and cancer classification · Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock
