Accuracy and Robustness of Clustering Algorithms for Small-Size Applications in Bioinformatics
Pamela Minicozzi, Fabio Rapallo, Enrico Scalas, Francesco Dondero

TL;DR
This paper evaluates the accuracy and robustness of various clustering algorithms on small datasets typical in bioinformatics, highlighting the impact of sample size and proposing a criterion for algorithm selection.
Contribution
It provides an analysis of clustering performance on small, noisy datasets and introduces an a posteriori criterion for choosing between discordant algorithms.
Findings
Error rates increase quickly when there are fewer observations than variables
Clustering accuracy diminishes with small sample sizes in microarray data
A criterion for selecting between conflicting clustering results is proposed
Abstract
The performance (accuracy and robustness) of several clustering algorithms is studied for linearly dependent random variables in the presence of noise. The error percentage increases quickly when the number of observations is less than the number of variables, a situation that is common in experiments with DNA microarrays. Moreover, an a posteriori criterion for choosing between two discordant clustering algorithms is presented.
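The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: the two-group design, the unit-variance latent signals, the noise level, the correlation-based distance, and the choice of average-linkage clustering are all assumptions made for illustration, and every function name is hypothetical. It builds variables that depend linearly on one of two latent signals plus Gaussian noise, clusters the variables, and measures the fraction assigned to the wrong group as the number of observations changes.

```python
import random
from statistics import mean


def simulate(n_obs, n_vars, noise, rng):
    """Return (data, labels). data[j] holds the n_obs values of variable j.

    Assumption: each variable equals one of two latent signals plus noise.
    """
    latents = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(2)]
    labels = [j % 2 for j in range(n_vars)]
    data = [[latents[g][i] + noise * rng.gauss(0, 1) for i in range(n_obs)]
            for g in labels]
    return data, labels


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(data, k=2):
    """Agglomerative clustering of variables with distance 1 - correlation."""
    n = len(data)
    dist = [[0.0 if i == j else 1 - corr(data[i], data[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the two closest clusters
        del clusters[b]
    assigned = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            assigned[i] = c
    return assigned


def error_rate(pred, truth):
    """Fraction of misassigned variables for two clusters, up to label swap."""
    wrong = sum(p != t for p, t in zip(pred, truth))
    return min(wrong, len(truth) - wrong) / len(truth)


def mean_error(n_obs, n_vars=20, noise=1.0, trials=20, seed=0):
    """Average error rate over repeated simulations (illustrative defaults)."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        data, labels = simulate(n_obs, n_vars, noise, rng)
        rates.append(error_rate(average_linkage(data), labels))
    return mean(rates)


if __name__ == "__main__":
    # Compare regimes with fewer and with more observations than variables.
    for n_obs in (3, 5, 10, 40):
        print(n_obs, round(mean_error(n_obs), 3))
```

Running the script prints the average error rate for several observation counts with 20 variables, which gives a rough way to compare the regime with fewer observations than variables against the regime with more. The numbers depend on the assumed noise level and clustering method and are not a reproduction of the paper's results.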
