Cross-Study Replicability in Cluster Analysis
Lorenzo Masoero, Emma Thomas, Giovanni Parmigiani, Svitlana Tyekucheva, Lorenzo Trippa

TL;DR
This paper reviews methods for assessing the replicability of clustering results across multiple datasets, emphasizing their importance in cancer research for identifying consistent biological subtypes.
Contribution
It introduces a framework for evaluating cross-study clustering replicability applicable to any clustering algorithm and demonstrates its utility with experiments on gene expression data.
Findings
Replicability metrics can effectively assess consistent cluster identification across datasets.
The framework is versatile and applicable to various clustering methods.
Experiments on real and synthetic data illustrate the approach's usefulness.
Abstract
In cancer research, clustering techniques are widely used for exploratory analyses and dimensionality reduction, playing a critical role in the identification of novel cancer subtypes, often with direct implications for patient management. As data collected by multiple research groups grows, it is increasingly feasible to investigate the replicability of clustering procedures, that is, their ability to consistently recover biologically meaningful clusters across several datasets. In this paper, we review existing methods to assess replicability of clustering analyses, and discuss a framework for evaluating cross-study clustering replicability, useful when two or more studies are available. These approaches can be applied to any clustering algorithm and can employ different measures of similarity between partitions to quantify replicability, globally (i.e. for the whole sample) as well…
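The abstract refers to measures of similarity between partitions as the building block for quantifying replicability. A minimal sketch follows, assuming the adjusted Rand index as one such measure; the paper may rely on other measures, and the function name `adjusted_rand_index` and the toy labels are illustrative rather than taken from the authors. The two label vectors stand for two partitions of the same samples, for instance one obtained by clustering a study directly and one obtained by applying a clustering rule learned on a different study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples.

    Illustrative helper (not from the paper): 1.0 means identical partitions
    up to a relabeling of clusters; values near 0 mean chance-level agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two samples are required")

    # Contingency counts: samples falling in cluster i of A and cluster j of B.
    cell_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions put everything in one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: partition from clustering a study directly vs. a partition
# transferred from another study (hypothetical labels).
own_labels = [0, 0, 0, 1, 1, 1]
transferred_same = [1, 1, 1, 0, 0, 0]   # same grouping, different label names
transferred_diff = [0, 0, 1, 1, 2, 2]   # partially different grouping

ari_same = adjusted_rand_index(own_labels, transferred_same)
ari_diff = adjusted_rand_index(own_labels, transferred_diff)
```

Because the index depends only on which samples are grouped together, swapping cluster names leaves it unchanged, which is why it suits comparisons across studies where label names carry no meaning.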
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Advanced Clustering Algorithms Research
