# Simple Measures of Individual Cluster-Membership Certainty for Hard Partitional Clustering

**Authors:** Dongmeng Liu, Jinko Graham

arXiv: 1704.00352 · 2018-01-23

## TL;DR

This paper introduces two probability-like measures of individual cluster-membership certainty for hard partitions. The measures help identify individuals with ambiguous membership and perform comparably to the posterior-probability estimators from soft clustering methods.

## Contribution

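The paper presents two measures of cluster-membership certainty for a hard partition: one extends the individual silhouette widths, and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, both behave like probabilities. The paper also suggests two possible ways to evaluate the hard partition itself.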

## Key findings

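- Both measures are evaluated on individuals with ambiguous cluster membership, using simulated binary datasets partitioned by PAM and simulated continuous datasets partitioned by hierarchical and k-means clustering.
- The proposed measures perform comparably to the posterior-probability estimators from FANNY and from two model-based clustering methods.
- The measures are illustrated on Fisher's classic iris data set.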

## Abstract

We propose two probability-like measures of individual cluster-membership certainty which can be applied to a hard partition of the sample such as that obtained from the Partitioning Around Medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual's tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition. We evaluate the performance of both measures in individuals with ambiguous cluster membership, using simulated binary datasets that have been partitioned by the PAM algorithm or continuous datasets that have been partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft clustering algorithms such as fuzzy analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior-probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher's classic iris data set.
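
For context, the classic individual silhouette width that one of the proposed measures extends can be sketched as follows. This is a minimal illustration of the standard silhouette only, not of the paper's measures, whose formulas are not given in this summary. The function name `silhouette_widths` and the toy data are assumptions made for the example.

```python
import numpy as np

def silhouette_widths(D, labels):
    """Classic silhouette width s(i) = (b - a) / max(a, b) for each individual.

    D      : symmetric (n x n) dissimilarity matrix with a zero diagonal
    labels : hard cluster label of each individual (at least two clusters)
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: silhouette is conventionally 0
        # a: mean dissimilarity to the other members of i's own cluster
        # (D[i, i] is 0, so summing over the whole cluster excludes i itself)
        a = D[i, own].sum() / (n_own - 1)
        # b: smallest mean dissimilarity to any other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Toy 1-D sample (hypothetical): the point at 4.0 sits between the two groups.
x = np.array([0.0, 1.0, 4.0, 9.0, 10.0])
labels = np.array([0, 0, 0, 1, 1])
D = np.abs(x[:, None] - x[None, :])  # pairwise absolute differences

s = silhouette_widths(D, labels)
print(np.round(s, 3))
```

The in-between point receives the smallest silhouette width, which flags its membership as the least certain. Silhouette widths range over [-1, 1], though, so they cannot be read as probabilities of belonging to a cluster; that is the gap the proposed measures are meant to fill.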

---
Source: https://tomesphere.com/paper/1704.00352