Algorithms for Internal Validation Clustering Measures in the Post Genomic Era
Filippo Utro

TL;DR
This paper evaluates and compares internal validation measures for clustering microarray data, proposing a new algorithmic framework and approximation techniques to balance accuracy and computational efficiency.
Contribution
It introduces a general algorithmic paradigm for stability-based validation measures and develops fast approximation algorithms to improve their practical applicability.
Findings
Hierarchy of measures in terms of accuracy and speed
Fast approximation algorithms significantly reduce computation time
The new techniques narrow the trade-off between speed and accuracy
Abstract
Inferring cluster structure in microarray datasets is a fundamental task for the -omic sciences. A central question in Statistics, Data Analysis and Classification is the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have recently been proposed, some of them specifically for microarray data. This dissertation presents a study of internal validation measures, paying particular attention to the stability-based ones. This class of measures is particularly prominent and promising for obtaining a reliable estimate of the number of clusters in a dataset. For these measures, a new general algorithmic paradigm is proposed that highlights the richness of measures in this class and accounts for the ones already available in the literature.…
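The abstract does not spell out the paradigm itself. As a rough, hypothetical illustration of what a stability-based internal measure computes, the sketch below clusters two random subsamples for each candidate k and measures how often the pairs of points shared by both subsamples get the same verdict (together or apart). Everything here is an assumption for illustration, not the dissertation's paradigm or any specific published measure: k-means (Lloyd's algorithm with k-means++ seeding) as the base clustering algorithm, the Rand-style pair agreement, the names `stability_scores`, `kmeans` and `pair_agreement`, and the defaults (10 resamples, 80% subsamples).

```python
import random
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, rng: random.Random,
           n_init: int = 5, max_iter: int = 100) -> List[int]:
    """Hypothetical base clusterer: Lloyd's k-means with k-means++ seeding.
    Runs n_init restarts and returns the labels of the lowest-SSE run."""
    best_labels: List[int] = []
    best_sse = float("inf")
    for _ in range(n_init):
        # k-means++ seeding: each new center is drawn with probability
        # proportional to its squared distance from the nearest chosen center.
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            r, acc = rng.uniform(0, sum(d2)), 0.0
            for p, w in zip(points, d2):
                acc += w
                if acc >= r:
                    centers.append(p)
                    break
        labels: List[int] = []
        for _ in range(max_iter):
            new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                          for p in points]
            if new_labels == labels:
                break  # assignments unchanged: converged
            labels = new_labels
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an empty cluster keeps its previous center
                    centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_sse, best_labels = sse, labels
    return best_labels


def pair_agreement(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Fraction of pairs of shared points that both labelings put
    together or both keep apart (Rand-style agreement)."""
    shared = sorted(a.keys() & b.keys())
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability_scores(points: List[Point], k_values: Iterable[int],
                     n_resamples: int = 10, frac: float = 0.8,
                     seed: int = 0) -> Dict[int, float]:
    """Mean agreement between clusterings of two random subsamples, per k."""
    rng = random.Random(seed)
    n, m = len(points), int(frac * len(points))
    scores: Dict[int, float] = {}
    for k in k_values:
        total = 0.0
        for _ in range(n_resamples):
            runs = []
            for _ in range(2):
                idx = rng.sample(range(n), m)
                labels = kmeans([points[i] for i in idx], k, rng)
                runs.append(dict(zip(idx, labels)))
            total += pair_agreement(runs[0], runs[1])
        scores[k] = total / n_resamples
    return scores


# Toy data: three well-separated Gaussian blobs of 30 points each.
data_rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
data = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
        for cx, cy in blob_centers for _ in range(30)]
scores = stability_scores(data, range(2, 6))
print(scores)
```

On these three well-separated blobs the score at k = 3 should be 1.0, since every run recovers the same partition, while k = 4 forces an arbitrary split of one blob and scores lower. How to turn such raw scores into a prediction of k is deliberately left open: stability-based measures differ precisely in that step, and some values of k look stable for trivial reasons (k = 1, for instance, always gives perfect agreement).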
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics
