Algorithms for Internal Validation Clustering Measures in the Post Genomic Era

Filippo Utro

arXiv:1102.2915·cs.DS·February 16, 2011·5 cites

TL;DR

This paper evaluates and compares internal validation measures for clustering microarray data, proposing a new algorithmic framework and approximation techniques to balance accuracy and computational efficiency.

Contribution

It introduces a general algorithmic paradigm for stability-based validation measures and develops fast approximation algorithms to improve their practical applicability.

Findings

01. Hierarchy of measures in terms of accuracy and speed

02. Fast approximation algorithms significantly reduce computation time

03. Trade-off between speed and accuracy is minimized with new techniques

Abstract

Inferring cluster structure in microarray datasets is a fundamental task for the -omic sciences. A fundamental question in Statistics, Data Analysis and Classification is the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data. In this dissertation, a study of internal validation measures is given, paying particular attention to the stability-based ones. Indeed, this class of measures is particularly prominent and promising for obtaining a reliable estimate of the number of clusters in a dataset. For those measures, a new general algorithmic paradigm is proposed here that highlights the richness of measures in this class and accounts for the ones already available in the literature.…
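To make the idea of a stability-based measure concrete, here is a minimal sketch of one generic variant. It is not the paradigm proposed in the dissertation, which the truncated abstract does not spell out. The sketch assumes a subsampling scheme: draw two random subsamples, cluster each with a plain k-means, and score how often the two clusterings agree on the points they share. All names (`kmeans`, `pair_agreement`, `stability`) and parameters (`rounds`, `frac`, `seed`) are hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (an assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members; leave it alone if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def pair_agreement(labels_a, labels_b):
    """Rand index: fraction of point pairs treated the same way by both labelings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    same = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return same / len(pairs)


def stability(points, k, rounds=10, frac=0.8, seed=0):
    """Average agreement between clusterings of two random subsamples (hypothetical scheme)."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(rounds):
        idx_a = sorted(rng.sample(range(n), m))
        idx_b = sorted(rng.sample(range(n), m))
        shared = sorted(set(idx_a) & set(idx_b))
        if len(shared) < 2:
            continue  # not enough common points to compare
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        scores.append(
            pair_agreement([lab_a[i] for i in shared], [lab_b[i] for i in shared])
        )
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: two small, well-separated grids of 16 points each.
blob = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
toy_points = blob + [(x + 100.0, y + 100.0) for x, y in blob]
toy_scores = {k: stability(toy_points, k) for k in range(2, 5)}
```

In a sketch like this, one would compare `toy_scores` across candidate values of k and favor the most stable one. The raw Rand index is trivially 1 for k = 1 and is not corrected for chance, so a practical measure would need a normalization step that this example omits.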

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics