Robust model-based clustering with gene ranking

Alberto Cozzini; Ajay Jasra; Giovanni Montana

arXiv:1201.5687 · stat.ME · January 30, 2012 · 2 cites

TL;DR

This paper introduces a robust clustering method using penalized Student's t mixtures combined with bootstrap-based gene ranking, effectively handling noise and outliers in gene expression data to identify meaningful biological subgroups.

Contribution

It proposes a novel penalized Student's t mixture model with a bootstrap procedure for gene ranking, improving robustness and interpretability in gene expression clustering.

Findings

01 Performs well with outliers and heavy-tailed distributions

02 Accurately identifies informative genes with high sensitivity

03 Enhances model selection accuracy

Abstract

Cluster analysis of biological samples using gene expression measurements is a common task which aids the discovery of heterogeneous biological sub-populations having distinct mRNA profiles. Several model-based clustering algorithms have been proposed in which the distribution of gene expression values within each sub-group is assumed to be Gaussian. In the presence of noise and extreme observations, a mixture of Gaussian densities may over-fit and overestimate the true number of clusters. Moreover, commonly used model-based clustering algorithms do not generally provide a mechanism to quantify the relative contribution of each gene to the final partitioning of the data. We propose a penalised mixture of Student's t distributions for model-based clustering and gene ranking. Together with a bootstrap procedure, the proposed approach provides a means for ranking genes according to their…
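The abstract's point about extreme observations can be illustrated with a minimal sketch. It is not the authors' implementation: it only compares a single univariate Gaussian component with a single Student's t component, and the function names, the choice of 3 degrees of freedom, and the test values are assumptions made for illustration. The weight in `t_em_weight` is the standard latent-scale weight from the textbook EM algorithm for t distributions, which may differ from the penalised procedure in the paper.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)
    )


def t_em_weight(x, mu=0.0, sigma=1.0, nu=3.0):
    """Standard EM weight (nu + 1) / (nu + z^2) for a univariate t component.

    Assumption: textbook EM for t distributions, not the paper's penalised variant.
    Observations far from mu receive a small weight in the parameter updates.
    """
    z = (x - mu) / sigma
    return (nu + 1.0) / (nu + z * z)


if __name__ == "__main__":
    # Hypothetical standardised expression values; 10.0 plays the outlier.
    for x in (0.0, 2.0, 10.0):
        print(
            f"x={x:5.1f}  gaussian={gaussian_logpdf(x):9.3f}  "
            f"student_t={student_t_logpdf(x):9.3f}  weight={t_em_weight(x):6.3f}"
        )
```

Under the Gaussian component the log-density falls quadratically in the distance from the mean, so one extreme value can dominate the fit and encourage an extra cluster. Under the t component the decline is logarithmic and the EM weight shrinks toward zero, so the same value has much less influence.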

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Bayesian Methods and Mixture Models