Normalised clustering accuracy: An asymmetric external cluster validity measure

Marek Gagolewski

arXiv:2209.02935·cs.LG·October 16, 2025·1 cites

Open Access

TL;DR

This paper introduces normalised clustering accuracy, a normalised, asymmetric external cluster validity measure. It addresses limitations of classical partition similarity scores and is intended to make evaluations of clustering algorithms more interpretable and robust.

Contribution

The paper proposes a new external validity measure that corrects for imbalanced cluster sizes and is easier to interpret than traditional partition similarity scores.

Findings

01 The new measure is scale-invariant and monotonic.

02 It corrects for imbalanced cluster sizes.

03 It better identifies worst-case scenarios.
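
To make the imbalance correction concrete, here is a minimal sketch. This page does not state the measure's formula, so the definition used below is an assumption, not the paper's: the average per-reference-cluster recall under the best one-to-one matching of cluster labels, rescaled so that the uniform baseline 1/k maps to 0. The names `confusion_matrix` and `normalised_accuracy_sketch` are hypothetical, labels are assumed to be integers 0..k-1, every reference cluster is assumed non-empty, and the brute-force matching is only practical for small k.

```python
from itertools import permutations


def confusion_matrix(y_true, y_pred, k):
    """c[i][j] = number of points of reference cluster i put in predicted cluster j."""
    c = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        c[t][p] += 1
    return c


def normalised_accuracy_sketch(y_true, y_pred, k):
    """Assumed definition: best-matching mean per-cluster recall, rescaled to map 1/k to 0."""
    if k < 2:
        raise ValueError("k must be at least 2")
    c = confusion_matrix(y_true, y_pred, k)
    row_sums = [sum(row) for row in c]  # reference cluster sizes (must be > 0)
    # Try every one-to-one relabelling of predicted clusters (fine for small k).
    best = max(
        sum(c[i][sigma[i]] / row_sums[i] for i in range(k)) / k
        for sigma in permutations(range(k))
    )
    return (best - 1 / k) / (1 - 1 / k)


def plain_accuracy(y_true, y_pred):
    """Share of points whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced reference partition: 90 points in cluster 0, 10 in cluster 1.
y_true = [0] * 90 + [1] * 10
y_lumped = [0] * 100              # a "clustering" that puts everything together
y_swapped = [1] * 90 + [0] * 10   # perfect split, but with the labels swapped

print(plain_accuracy(y_true, y_lumped))                  # 0.9
print(normalised_accuracy_sketch(y_true, y_lumped, 2))   # 0.0
print(normalised_accuracy_sketch(y_true, y_swapped, 2))  # 1.0
```

In this sketch, lumping everything into one cluster scores 0.9 on plain accuracy but 0 once each reference cluster is weighted equally. Because every term is a proportion, duplicating all points leaves the value unchanged. Since the normalisation uses only the reference cluster sizes, swapping the two arguments generally changes the result.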

Abstract

There is no, nor will there ever be, a single best clustering algorithm. Nevertheless, we would still like to be able to distinguish between methods that work well on certain task types and those that systematically underperform. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. However, their validity is questionable because the clusterings they endorse can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to fixed ground truth groupings provided by experts. In this paper, we argue that the commonly used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, miss some desirable…

Taxonomy

Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Bayesian Methods and Mixture Models