Validation of cluster analysis results on validation data: A systematic framework
Theresa Ullmann, Christian Hennig, Anne-Laure Boulesteix

TL;DR
This paper introduces a systematic framework for validating clustering results on validation datasets, integrating existing validation methods and formally defining different types of validation.
Contribution
It provides a systematic, formalized framework that encompasses most existing validation techniques for clustering results on validation data.
Findings
Framework unifies validation approaches
Formal definitions clarify validation procedures
Examples demonstrate practical application
Abstract
Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To judge the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic structured framework for validating clustering results on validation data that includes most existing validation approaches. In particular, we review classical validation techniques such as internal and external validation, stability analysis, hypothesis…
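The following is a minimal sketch of one common way to validate a clustering on a validation dataset. It is not necessarily the exact procedure the paper formalizes. The steps are:

1. Cluster the discovery (training) data.
2. Transfer that clustering to the held-out validation data by assigning each validation point to its nearest training centroid.
3. Independently cluster the validation data.
4. Compare the two partitions with the adjusted Rand index.

The choice of k-means, the helper names (`kmeans`, `nearest`, `validate_on_holdout`, `make_blobs`), and the synthetic two-blob data are all illustrative assumptions.

```python
import random
from collections import Counter
from math import comb


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def validate_on_holdout(train, valid, k, seed=0):
    """Agreement between the clustering of `valid` and the clustering
    transferred from `train` via nearest training centroid (hypothetical helper)."""
    train_centroids, _ = kmeans(train, k, seed=seed)
    _, valid_labels = kmeans(valid, k, seed=seed)
    transferred = [nearest(p, train_centroids) for p in valid]
    return adjusted_rand_index(valid_labels, transferred)


def make_blobs(n, centers, spread, rng):
    """Hypothetical synthetic data: n Gaussian points around each center."""
    return [
        (cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
        for cx, cy in centers
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    data = make_blobs(50, [(0.0, 0.0), (10.0, 10.0)], 1.0, rng)
    rng.shuffle(data)  # random split into discovery and validation halves
    train, valid = data[:50], data[50:]
    print(round(validate_on_holdout(train, valid, k=2), 3))
```

An index close to 1 suggests that the cluster structure found on the training data replicates on the validation data. Values near 0 indicate agreement no better than chance. Swapping the clustering method, the transfer rule, or the agreement measure yields a different validation procedure. A formal framework needs to make choices of this kind explicit.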
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data-Driven Disease Surveillance · Data Mining Algorithms and Applications
