Robust model-based clustering with gene ranking
Alberto Cozzini, Ajay Jasra, Giovanni Montana

TL;DR
This paper introduces a robust clustering method using penalized Student's t mixtures combined with bootstrap-based gene ranking, effectively handling noise and outliers in gene expression data to identify meaningful biological subgroups.
Contribution
It proposes a novel penalized Student's t mixture model with a bootstrap procedure for gene ranking, improving robustness and interpretability in gene expression clustering.
Findings
Performs well with outliers and heavy-tailed distributions
Accurately identifies informative genes with high sensitivity
Improves model selection accuracy, i.e. the choice of the number of clusters
Abstract
Cluster analysis of biological samples using gene expression measurements is a common task which aids the discovery of heterogeneous biological sub-populations having distinct mRNA profiles. Several model-based clustering algorithms have been proposed in which the distribution of gene expression values within each sub-group is assumed to be Gaussian. In the presence of noise and extreme observations, a mixture of Gaussian densities may over-fit and overestimate the true number of clusters. Moreover, commonly used model-based clustering algorithms do not generally provide a mechanism to quantify the relative contribution of each gene to the final partitioning of the data. We propose a penalised mixture of Student's t distributions for model-based clustering and gene ranking. Together with a bootstrap procedure, the proposed approach provides a means for ranking genes according to their…
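The robustness argument in the abstract can be illustrated with a small sketch. It is not the authors' penalised mixture procedure: it fits a single univariate Student's t component with the standard EM scale weights u_i = (nu + 1) / (nu + d_i), where d_i is the squared standardised distance of observation i from the component centre. The function names, the fixed degrees of freedom `nu = 4`, and the toy expression values are all assumptions made for illustration. The penalty term and the bootstrap ranking step are deliberately left out.

```python
import numpy as np

def t_scale_weights(x, mu, sigma2, nu):
    """EM scale weights for a univariate Student's t component.

    Observations far from mu (large d) receive weights close to 0,
    which is what limits the influence of extreme values.
    """
    x = np.asarray(x, dtype=float)
    d = (x - mu) ** 2 / sigma2          # squared standardised distance
    return (nu + 1.0) / (nu + d)

def robust_location(x, nu=4.0, n_iter=50):
    """Iteratively reweighted location/scale estimate for one t component.

    nu is held fixed here (an assumption); a full implementation
    could estimate it as well.
    """
    x = np.asarray(x, dtype=float)
    mu, sigma2 = float(np.mean(x)), float(np.var(x))
    for _ in range(n_iter):
        u = t_scale_weights(x, mu, sigma2, nu)               # E-step
        mu = float(np.sum(u * x) / np.sum(u))                # M-step: location
        sigma2 = float(np.sum(u * (x - mu) ** 2) / x.size)   # M-step: scale
    return mu, sigma2

# Hypothetical expression values for one gene: five typical samples, one outlier.
expression = np.array([-0.2, -0.1, 0.0, 0.1, 0.2, 8.0])

gaussian_mean = float(np.mean(expression))   # pulled towards the outlier
t_mean, t_scale = robust_location(expression)

print("Gaussian (sample) mean:", gaussian_mean)
print("Student's t location:  ", t_mean)
```

In this toy case the sample mean is dragged towards the extreme value, while the t-based estimate stays near the bulk of the data because the outlier's weight shrinks at each iteration. A Gaussian mixture facing the same data might instead need an extra component to absorb the outlier, which is the over-estimation of the number of clusters that the abstract describes.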
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Bayesian Methods and Mixture Models
