Validation of cluster analysis results on validation data: A systematic framework

Theresa Ullmann; Christian Hennig; Anne-Laure Boulesteix

arXiv:2103.01281 · stat.ME · January 11, 2022 · WIREs Data Mining Knowl. Discov. · 1 cites

Open Access

TL;DR

This paper introduces a comprehensive framework for validating clustering results on validation datasets, integrating existing validation methods and formalizing different validation types for improved assessment.

Contribution

It provides a systematic, formalized framework that encompasses most existing validation techniques for clustering results on validation data.

Findings

01. Framework unifies validation approaches

02. Formal definitions clarify validation procedures

03. Examples demonstrate practical application

Abstract

Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To judge the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic structured framework for validating clustering results on validation data that includes most existing validation approaches. In particular, we review classical validation techniques such as internal and external validation, stability analysis, hypothesis…
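To make the idea of checking a clustering against a validation dataset concrete, the following is a minimal sketch of one possible workflow; it is not the framework from the paper. It assumes a split of one synthetic dataset into a discovery half and a validation half, uses a small hand-written k-means as the clustering method, transfers the discovery clustering to the validation data by nearest-centroid assignment, and compares the result with an independent clustering of the validation data using the adjusted Rand index. The data, the helper names (`kmeans`, `nearest`, `adjusted_rand_index`), and all of these method choices are illustrative assumptions.

```python
import math
import random
from collections import Counter

def dist2(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    # Index of the centroid closest to point p.
    return min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))

def kmeans(points, k, n_iter=50):
    # Hypothetical helper: Lloyd's algorithm with deterministic
    # farthest-first initialisation (an assumption made for this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

def adjusted_rand_index(a, b):
    # Adjusted Rand index between two labelings of the same objects.
    pairs = lambda m: math.comb(m, 2)
    sum_ab = sum(pairs(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(pairs(c) for c in Counter(a).values())
    sum_b = sum(pairs(c) for c in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ab - expected) / (max_index - expected)

# Synthetic data (assumption): two well-separated Gaussian groups in 2-D.
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(100)]
rng.shuffle(data)

# Separate the validation part before any analysis takes place.
discovery, validation = data[:100], data[100:]

# 1) Cluster the discovery data.
centroids_d, _ = kmeans(discovery, k=2)
# 2) Transfer that clustering to the validation data (nearest centroid).
labels_transferred = [nearest(p, centroids_d) for p in validation]
# 3) Cluster the validation data independently with the same method.
_, labels_independent = kmeans(validation, k=2)
# 4) Compare the two partitions of the validation data.
ari = adjusted_rand_index(labels_transferred, labels_independent)
print("ARI between transferred and independent clustering:", ari)
```

A high adjusted Rand index here only indicates that the same method recovers a similar partition on held-out data. Other forms of validation on the validation data (for example, recomputing internal indices there) would need different comparisons.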

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data-Driven Disease Surveillance · Data Mining Algorithms and Applications