A Pragmatic Method for Comparing Clusterings with Overlaps and Outliers

Ryan DeWolfe; Paweł Prałat; François Théberge

arXiv:2602.14855·cs.LG·March 23, 2026

TL;DR

This paper introduces a new similarity measure for comparing clustering results that can handle overlaps and outliers, addressing limitations of existing methods and reducing common biases.

Contribution

The paper proposes a pragmatic similarity measure for overlapping and outlier-inclusive clusterings, with demonstrated desirable properties and bias reduction.

Findings

01. The measure effectively compares clusterings with overlaps and outliers.

02. It exhibits several desirable mathematical properties.

03. Experimental results show reduced bias compared to existing measures.

Abstract

Clustering algorithms are an essential part of the unsupervised data science ecosystem, and extrinsic evaluation of clustering algorithms requires a method for comparing the detected clustering to a ground truth clustering. In a general setting, the detected and ground truth clusterings may have outliers (objects belonging to no cluster), overlapping clusters (objects may belong to more than one cluster), or both, but methods for comparing these clusterings are currently undeveloped. In this note, we define a pragmatic similarity measure for comparing clusterings with overlaps and outliers, show that it has several desirable properties, and experimentally confirm that it is not subject to several common biases afflicting other clustering comparison measures.
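To make the setting in the abstract concrete, the following minimal sketch shows one way to represent a clustering that allows both outliers and overlaps, and how to identify each kind of object. It does not implement the paper's similarity measure, which is not defined on this page; the function name `split_by_membership` and the toy data are hypothetical and chosen only for illustration.

```python
def split_by_membership(objects, clusters):
    """Return (outliers, overlapping) for a clustering given as a list of sets.

    Illustrative helper only, not the measure proposed in the paper.
    Outliers are objects in no cluster; overlapping objects are in more than one.
    """
    # Count how many clusters each object belongs to.
    counts = {obj: 0 for obj in objects}
    for cluster in clusters:
        for obj in cluster:
            counts[obj] += 1
    outliers = {obj for obj, c in counts.items() if c == 0}
    overlapping = {obj for obj, c in counts.items() if c > 1}
    return outliers, overlapping


# Hypothetical toy data: six objects labelled 0..5.
objects = range(6)
ground_truth = [{0, 1, 2}, {2, 3}]   # object 2 overlaps; 4 and 5 are outliers
detected = [{0, 1}, {2, 3, 4}]       # no overlaps; 5 is an outlier

gt_outliers, gt_overlapping = split_by_membership(objects, ground_truth)
det_outliers, det_overlapping = split_by_membership(objects, detected)
```

Here neither clustering is a partition of the objects, which is the general case the abstract describes: a comparison method has to account for objects that appear in zero clusters or in several.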

Taxonomy

Topics: Advanced Clustering Algorithms Research · Anomaly Detection Techniques and Applications · Bayesian Methods and Mixture Models