# An Aposteriorical Clusterability Criterion for $k$-Means++ and Simplicity of Clustering

**Authors:** Mieczysław A. Kłopotek

arXiv: 1704.07139 · 2020-04-07

## TL;DR

This paper introduces a new a posteriori criterion for assessing the clusterability of data sets in $k$-means clustering, enabling efficient validation of clustering quality after algorithm execution.

## Contribution

The paper proposes a clusterability check that runs in polynomial time and does not require identifying the optimal clustering, which earlier criteria depend on and which is NP-hard to find.

## Key findings

- The criterion can be applied after running $k$-means to verify clusterability.
- If $k$-means++ fails to find a clustering that meets the clusterability criteria, then with high probability the data is not well-clusterable.
- The check has polynomial complexity, making it practical for real-world data sets.
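The workflow behind these findings (cluster first, then test the result) can be sketched as follows. This is a minimal illustration only: the summary does not state the paper's actual criterion, so the gap rule below is a simplified stand-in, and the names `kmeans_pp`, `passes_gap_check`, and `gap_factor` are hypothetical. The sketch accepts a clustering when the empty space between the enclosing balls of every pair of clusters is at least `gap_factor` times the larger of the two cluster radii.

```python
import math
import random


def kmeans_pp(points, k, rng, iters=50):
    """Lloyd's k-means with k-means++ seeding; returns (centres, labels)."""
    # k-means++ seeding: each further centre is drawn with probability
    # proportional to its squared distance from the nearest chosen centre.
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])

    def assign(cs):
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(centres)


def passes_gap_check(points, centres, labels, gap_factor=2.0):
    """Hypothetical a posteriori test, NOT the paper's criterion.

    Costs O(n + k^2) once the clustering is known.
    """
    k = len(centres)
    radius = [0.0] * k
    for p, lab in zip(points, labels):
        radius[lab] = max(radius[lab], math.dist(p, centres[lab]))
    for i in range(k):
        for j in range(i + 1, k):
            # Space left between the two enclosing balls.
            gap = math.dist(centres[i], centres[j]) - radius[i] - radius[j]
            if gap < gap_factor * max(radius[i], radius[j]):
                return False
    return True


if __name__ == "__main__":
    offsets = (-0.1, 0.0, 0.1)
    blobs = [(cx + dx, cy + dy)
             for cx, cy in ((0.0, 0.0), (100.0, 0.0), (0.0, 100.0))
             for dx in offsets for dy in offsets]
    centres, labels = kmeans_pp(blobs, 3, random.Random(0))
    print("tight, distant blobs pass:", passes_gap_check(blobs, centres, labels))
```

The point of the design is that the check only inspects the clustering that $k$-means++ actually returned, so it never needs the optimal clustering. A failed check on a single run is not conclusive by itself; the probabilistic statement in the findings concerns the paper's own criterion, not this simplified rule.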

## Abstract

We define the notion of a well-clusterable data set, combining the point of view of the objective of the $k$-means clustering algorithm (minimising the centric spread of data elements) with common sense (clusters shall be separated by gaps). We identify conditions under which the optimum of the $k$-means objective coincides with a clustering under which the data is separated by predefined gaps. We investigate two cases: when the whole clusters are separated by some gap, and when only the cores of the clusters meet some separation condition. We overcome a major obstacle to using clusterability criteria: known approaches to clusterability checking depend on the optimal clustering, which is NP-hard to identify. Compared to other approaches to clusterability, the novelty lies in the possibility of an a posteriori check (after running $k$-means) of whether the data set is well-clusterable. Since the $k$-means algorithm applied for this purpose has polynomial complexity, so does the corresponding check. Additionally, if $k$-means++ fails to identify a clustering that meets the clusterability criteria, then with high probability the data is not well-clusterable.
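For reference, the "centric spread" that $k$-means minimises is usually written as the within-cluster sum of squared distances to the cluster means. The symbols $C_j$ (the $j$-th cluster) and $\mu_j$ (its mean) are standard notation assumed here, not taken from the paper:

$$
J(C_1, \dots, C_k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x .
$$

Finding the partition that minimises $J$ is the NP-hard step the abstract mentions, which is why a check that works on whatever clustering $k$-means++ returns is attractive.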

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.07139/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1704.07139/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1704.07139/full.md

---
Source: https://tomesphere.com/paper/1704.07139