Unifying Information-Theoretic and Pair-Counting Clustering Similarity
Alexander J. Gates

TL;DR
This paper develops a unified analytical framework connecting pair-counting and information-theoretic clustering similarity measures, clarifying their differences and guiding their application.
Contribution
The paper introduces an analytical framework that unifies the two major families of clustering similarity measures, pair-counting and information-theoretic, through co-occurrence expansions and a higher-order view of agreement.
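Both families start from the same object: the contingency table of co-occurrence counts between two clusterings. The Python below is a minimal illustrative sketch, not the author's code. It computes one representative of each family from that table: the Rand index, which counts agreeing element pairs, and Shannon mutual information. The helpers `contingency`, `rand_index`, and `mutual_information` are hypothetical names introduced here.

```python
from collections import Counter
from math import comb, log


def contingency(u, v):
    """Co-occurrence counts n_ij: elements in cluster i of u and cluster j of v."""
    return Counter(zip(u, v))


def rand_index(u, v):
    """Pair-counting similarity: fraction of element pairs on which u and v agree."""
    n = len(u)
    table = contingency(u, v)
    a, b = Counter(u), Counter(v)  # row and column marginals
    together_both = sum(comb(c, 2) for c in table.values())
    together_u = sum(comb(c, 2) for c in a.values())
    together_v = sum(comb(c, 2) for c in b.values())
    total = comb(n, 2)
    # Pairs split in both clusterings, by inclusion-exclusion
    apart_both = total - together_u - together_v + together_both
    return (together_both + apart_both) / total


def mutual_information(u, v):
    """Information-theoretic similarity: Shannon mutual information in nats."""
    n = len(u)
    table = contingency(u, v)
    a, b = Counter(u), Counter(v)
    # sum_ij p_ij * log(p_ij / (p_i * q_j)), written with raw counts
    return sum(
        (c / n) * log(c * n / (a[i] * b[j]))
        for (i, j), c in table.items()
    )
```

For example, `rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])` gives 2/3, and `mutual_information` on the same labels gives (2/3) ln 2 ≈ 0.462 nats. The two numbers summarize the same table in different ways.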
Findings
Both families are expressed as weighted co-occurrence expansions.
Pair-counting measures correspond to a quadratic, low-order approximation; information-theoretic measures retain higher-order terms.
The framework clarifies the conditions under which the two families diverge and guides the choice of measure.
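The quadratic-approximation finding can be illustrated with a standard expansion, the chi-square approximation to mutual information. This is offered as a sketch under that assumption, not as the paper's exact weighting. Write each observed joint frequency as its independence expectation times a relative deviation δ_ij, then expand (1 + δ) log(1 + δ) in powers of δ:

```latex
% Joint and marginal frequencies from the contingency table n_{ij} over N elements
p_{ij} = \frac{n_{ij}}{N}, \qquad p_i = \sum_j p_{ij}, \qquad q_j = \sum_i p_{ij},
\qquad p_{ij} = p_i q_j \,(1 + \delta_{ij})

\begin{aligned}
I(U;V) &= \sum_{i,j} p_{ij} \log\frac{p_{ij}}{p_i q_j}
        = \sum_{i,j} p_i q_j \,(1+\delta_{ij}) \log(1+\delta_{ij}) \\
       &= \sum_{i,j} p_i q_j \left( \delta_{ij} + \frac{\delta_{ij}^2}{2}
          - \frac{\delta_{ij}^3}{6} + \frac{\delta_{ij}^4}{12} - \cdots \right) \\
       &= \frac{1}{2} \sum_{i,j} \frac{(p_{ij} - p_i q_j)^2}{p_i q_j} + O(\delta^3)
\end{aligned}
```

The first-order term vanishes because Σ_ij p_i q_j δ_ij = Σ_ij (p_ij − p_i q_j) = 0, so the leading term is quadratic in observed-minus-expected co-occurrence and equals χ²/(2N), where χ² is Pearson's statistic on the contingency table. The cubic and higher terms are what the full Shannon measure retains beyond this order.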
Abstract
Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic, reflecting whether they quantify agreement through element pairs or aggregate information across full cluster contingency tables. Prior work has uncovered parallels between these families and applied empirical normalization or chance-correction schemes, but their deeper analytical connection remains only partially understood. Here, we develop an analytical framework that unifies these families through two complementary perspectives. First, both families are expressed as weighted expansions of observed versus expected co-occurrences, with pair-counting arising as a quadratic, low-order…
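One concrete way to see pair-counting as the low-order side of this picture is a known identity, offered as an illustration rather than as the paper's derivation. Replace Shannon entropy H with the order-2 (Tsallis q = 2) entropy H_2(p) = 1 − Σ p² in the variation-of-information template 2H(U,V) − H(U) − H(V). The result is a pair-counting distance, and 1 − RI = N/(N − 1) · [2H_2(U,V) − H_2(U) − H_2(V)]. The Python sketch below checks this numerically against direct pair enumeration. All function names are hypothetical.

```python
from collections import Counter
from math import log


def shannon_entropy(counts, n):
    """Shannon entropy in nats of a distribution given as integer counts."""
    return -sum((c / n) * log(c / n) for c in counts if c > 0)


def quadratic_entropy(counts, n):
    """Order-2 entropy (Tsallis q=2, also called Gini-Simpson): 1 - sum p^2."""
    return 1.0 - sum((c / n) ** 2 for c in counts)


def rand_index_by_pairs(u, v):
    """Rand index by enumerating every unordered pair of elements."""
    n = len(u)
    agree = sum(
        (u[x] == u[y]) == (v[x] == v[y])
        for x in range(n)
        for y in range(x + 1, n)
    )
    return agree / (n * (n - 1) / 2)


def entropy_forms(u, v):
    """Evaluate the template 2H(U,V) - H(U) - H(V) with two different entropies."""
    n = len(u)
    cu = Counter(u).values()
    cv = Counter(v).values()
    cuv = Counter(zip(u, v)).values()
    # Shannon version: variation of information (nats)
    vi = 2 * shannon_entropy(cuv, n) - shannon_entropy(cu, n) - shannon_entropy(cv, n)
    # Quadratic version: normalized Mirkin distance
    d2 = 2 * quadratic_entropy(cuv, n) - quadratic_entropy(cu, n) - quadratic_entropy(cv, n)
    # Rescale by n / (n - 1) to move from all ordered pairs to distinct pairs
    return {
        "vi": vi,
        "quadratic_distance": d2,
        "rand_from_entropy": 1 - n / (n - 1) * d2,
    }
```

On the labels [0, 0, 0, 1, 1, 1] and [0, 0, 1, 1, 2, 2], the quadratic distance is 5/18. That gives RI = 1 − (6/5)(5/18) = 2/3, matching pair enumeration. The Shannon form gives VI = ln 3 − (ln 2)/3 ≈ 0.868 nats.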
Taxonomy
Topics: Advanced Clustering Algorithms Research · Reliability and Agreement in Measurement · Mental Health Research Topics
