Accuracy and Robustness of Clustering Algorithms for Small-Size Applications in Bioinformatics

Pamela Minicozzi; Fabio Rapallo; Enrico Scalas; Francesco Dondero

arXiv:0807.3838·stat.AP·November 13, 2009

TL;DR

This paper evaluates the accuracy and robustness of several clustering algorithms on the small datasets typical of bioinformatics, highlighting the impact of sample size and proposing a criterion for choosing between discordant algorithms.

Contribution

It provides an analysis of clustering performance on small, noisy datasets and introduces an a posteriori criterion for choosing between discordant algorithms.

Findings

01

Error rates increase when observations are fewer than variables

02

Clustering accuracy diminishes with small sample sizes in microarray data

03

A criterion for selecting between conflicting clustering results is proposed

Abstract

The performance (accuracy and robustness) of several clustering algorithms is studied for linearly dependent random variables in the presence of noise. It turns out that the error percentage quickly increases when the number of observations is less than the number of variables. This situation is common in experiments with DNA microarrays. Moreover, an a posteriori criterion to choose between two discordant clustering algorithms is presented.
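
The kind of experiment the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' setup: the two-group design, the slope range, the distance 1 - |r|, the use of average-linkage clustering, and all function names are assumptions chosen only to show how an error percentage can be measured as the number of observations varies relative to the number of variables.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_obs, n_per_group, noise_sd, rng):
    """Two groups of variables; each variable is a noisy linear
    function of its group's latent factor (an assumed toy model)."""
    variables, labels = [], []
    for group in (0, 1):
        latent = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        for _ in range(n_per_group):
            slope = rng.uniform(0.5, 2.0)
            variables.append(
                [slope * z + rng.gauss(0.0, noise_sd) for z in latent]
            )
            labels.append(group)
    return variables, labels


def average_linkage(variables, n_clusters=2):
    """Agglomerative average-linkage clustering of variables,
    using 1 - |correlation| as the dissimilarity."""
    p = len(variables)
    dist = [
        [1.0 - abs(pearson(variables[i], variables[j])) for j in range(p)]
        for i in range(p)
    ]
    clusters = [[i] for i in range(p)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    assigned = [0] * p
    for k, members in enumerate(clusters):
        for i in members:
            assigned[i] = k
    return assigned


def error_rate(assigned, labels):
    """Fraction of misclassified variables for two clusters,
    taking the better of the two possible label matchings."""
    mismatches = sum(a != t for a, t in zip(assigned, labels))
    return min(mismatches, len(labels) - mismatches) / len(labels)


def mean_error(n_obs, n_per_group=10, noise_sd=1.0, trials=20, seed=0):
    """Average error rate over repeated simulated datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        variables, labels = simulate(n_obs, n_per_group, noise_sd, rng)
        total += error_rate(average_linkage(variables), labels)
    return total / trials


if __name__ == "__main__":
    # 20 variables in total; the first two settings have fewer
    # observations than variables.
    for n_obs in (5, 10, 20, 40, 80):
        print(n_obs, round(mean_error(n_obs), 3))
```

Running the script prints the mean error rate for each number of observations. Changing `noise_sd` or `n_per_group` gives a rough feel for how noise and the ratio of observations to variables interact in this toy model.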
