Generalization of Clustering Agreements and Distances for Overlapping Clusters and Network Communities
Reihaneh Rabbany, Osmar R. Zaïane

TL;DR
This paper introduces a generalized algebraic framework for clustering agreement measures that extends to overlapping clusters and network communities, supporting clustering validation on network data.
Contribution
It provides a unified algebraic formulation connecting pair-counting and information-theoretic measures, extendable to overlapping clusters and network communities.
Findings
Unified algebraic framework for clustering agreement measures
Extension to overlapping clusters and network communities
Facilitates validation in social and information networks
Abstract
A measure of distance between two clusterings has important applications, including clustering validation and ensemble clustering. Generally, such a distance measure provides navigation through the space of possible clusterings. Mostly used in cluster validation, a normalized clustering distance, a.k.a. agreement measure, compares a given clustering result against the ground-truth clustering. Clustering agreement measures are often classified into two families, pair-counting and information-theoretic measures, with the widely used representatives of Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), respectively. This paper sheds light on the relation between these two families through a generalization. It further presents an alternative algebraic formulation for these agreement measures which incorporates an intuitive clustering distance, which is defined based on the…
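For readers unfamiliar with the two families named in the abstract, the following is a minimal sketch of the classic ARI and NMI for disjoint (non-overlapping) clusterings, both computed from the same contingency table. It is not the paper's generalized formulation, which the abstract does not spell out here. The function names are illustrative, each clustering is assumed to be a list of labels over the same items (at least two items), and NMI is normalized by the geometric mean of the two entropies, which is only one of several normalizations in use.

```python
from collections import Counter
from math import comb, log, sqrt


def contingency(u, v):
    """Count how many items carry each (label in u, label in v) pair."""
    return Counter(zip(u, v))


def adjusted_rand_index(u, v):
    """Pair-counting agreement: Rand index corrected for chance (classic ARI)."""
    n = len(u)  # assumed n >= 2, otherwise there are no pairs to count
    n_ij = contingency(u, v)
    # Number of item pairs placed together in both, in u, and in v.
    together_both = sum(comb(c, 2) for c in n_ij.values())
    together_u = sum(comb(c, 2) for c in Counter(u).values())
    together_v = sum(comb(c, 2) for c in Counter(v).values())
    expected = together_u * together_v / comb(n, 2)
    maximum = (together_u + together_v) / 2
    if maximum == expected:
        # Degenerate case: both clusterings are all-in-one or all singletons.
        return 1.0
    return (together_both - expected) / (maximum - expected)


def normalized_mutual_information(u, v):
    """Information-theoretic agreement: MI / sqrt(H(u) * H(v)) (assumed variant)."""
    n = len(u)
    n_ij = contingency(u, v)
    size_u, size_v = Counter(u), Counter(v)
    mutual_info = sum(
        c / n * log(c * n / (size_u[i] * size_v[j]))
        for (i, j), c in n_ij.items()
    )
    entropy_u = -sum(c / n * log(c / n) for c in size_u.values())
    entropy_v = -sum(c / n * log(c / n) for c in size_v.values())
    if entropy_u == 0 or entropy_v == 0:
        # A single-cluster clustering carries no information.
        return 1.0 if entropy_u == entropy_v else 0.0
    return mutual_info / sqrt(entropy_u * entropy_v)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    relabeled = [1, 1, 0, 0]   # same grouping, different label names
    crossed = [0, 1, 0, 1]     # splits every ground-truth cluster in half
    print(adjusted_rand_index(truth, relabeled),
          normalized_mutual_information(truth, relabeled))
    print(adjusted_rand_index(truth, crossed),
          normalized_mutual_information(truth, crossed))
```

Both functions read only the contingency counts and the cluster sizes, so they are invariant to how labels are named. Handling overlapping clusters, where an item may carry several labels, requires a different representation than one label per item and is outside this sketch.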
