An empirical comparison and characterisation of nine popular clustering methods

Christian Hennig

arXiv:2102.03645 · stat.ME · February 9, 2021 · Adv. Data Anal. Classif. · 1 citation


Open Access

TL;DR

This study empirically compares nine popular clustering methods on 42 real datasets. It characterizes the resulting clusterings with a range of cluster validation indexes and, on the 30 datasets that come with a "true" clustering, assesses how well each method recovers it.
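The summary does not say how recovery of a "true" clustering is scored. A common measure for comparing two partitions of the same points is the adjusted Rand index (ARI). The sketch below assumes that measure. The function name `adjusted_rand_index` is our own and is not taken from the paper. ARI equals 1 for identical partitions (up to relabelling) and is close to 0 for chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same n points.

    Assumption: ARI is used here as an illustrative similarity measure;
    the summary above does not name the measure used in the study.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    total = comb(len(labels_true), 2)  # number of point pairs
    if total == 0:
        return 1.0  # fewer than two points: nothing to disagree on
    # Contingency counts n_ij, row sums a_i and column sums b_j
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_sums.values())
    sum_b = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_a * sum_b / total  # expected index under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions trivial
    return (sum_ij - expected) / (max_index - expected)


# Same partition with swapped labels, then a partition that splits one true cluster
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

The first call prints 1.0 because ARI ignores label names. The second gives 4/7 (about 0.571), because splitting one true cluster still agrees on most point pairs.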

Contribution

It provides a detailed characterization of the clustering methods and, via a mixed effects regression, relates observable cluster properties to a clustering's similarity with the "true" clustering, which can aid method selection.

Findings

01

Methods vary in their ability to recover true clusterings.

02

Cluster properties influence similarity to true clusterings.

03

The study offers insight into which clustering properties can be expected from each method.

Abstract

Nine popular clustering methods are applied to 42 real data sets. The aim is to give a detailed characterisation of the methods by means of several cluster validation indexes that measure various individual aspects of the resulting clusters such as small within-cluster distances, separation of clusters, closeness to a Gaussian distribution etc. as introduced in Hennig (2019). 30 of the data sets come with a "true" clustering. On these data sets the similarity of the clusterings from the nine methods to the "true" clusterings is explored. Furthermore, a mixed effects regression relates the observable individual aspects of the clusters to the similarity with the "true" clusterings, which in real clustering problems is unobservable. The study gives new insight not only into the ability of the methods to discover "true" clusterings, but also into properties of clusterings that can be…
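The abstract names aspects such as small within-cluster distances and separation of clusters but does not give their formulas. Below is a minimal sketch of two such aspects under our own simplified definitions. It is not the exact indexes of Hennig (2019), and it omits any normalisation or calibration the paper may apply. The function names `avg_within_cluster_distance` and `separation_index`, the use of Euclidean distance, and the proportion `p = 0.1` are all assumptions made for illustration.

```python
from itertools import combinations
from math import ceil, dist


def avg_within_cluster_distance(points, labels):
    """Mean Euclidean distance over all pairs of points sharing a cluster.

    Simplified, hypothetical version: smaller values mean more homogeneous clusters.
    """
    dists = [dist(points[i], points[j])
             for i, j in combinations(range(len(points)), 2)
             if labels[i] == labels[j]]
    if not dists:
        raise ValueError("no cluster contains two or more points")
    return sum(dists) / len(dists)


def separation_index(points, labels, p=0.1):
    """Average of the smallest proportion p of 'distance to nearest other-cluster point'.

    Simplified, hypothetical version: larger values mean better separated clusters.
    Focusing on the smallest distances measures separation at cluster borders.
    """
    nearest_other = []
    for x, lx in zip(points, labels):
        others = [dist(x, y) for y, ly in zip(points, labels) if ly != lx]
        if not others:
            raise ValueError("at least two clusters are required")
        nearest_other.append(min(others))
    nearest_other.sort()
    k = max(1, ceil(p * len(nearest_other)))  # use at least one point
    return sum(nearest_other[:k]) / k


# Two tight, well separated clusters
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
lab = [0, 0, 1, 1]
print(avg_within_cluster_distance(pts, lab))  # 1.0
print(separation_index(pts, lab))             # 10.0
```

Each index captures one aspect in isolation. This matches the abstract's point that different methods trade off such aspects differently, rather than any single number defining a "best" clustering.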

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Face and Expression Recognition