Decomposing the Jaccard Distance and the Jaccard Index in ABCDE
Stephan van Staden

TL;DR
This paper decomposes the JaccardDistance and the JaccardIndex that ABCDE uses to compare clusterings, yielding Impact and Quality metrics that help in understanding and debugging clustering differences.
Contribution
It presents a novel decomposition of the Jaccard metrics, offering deeper insights and new metrics for evaluating clustering differences and their quality.
Findings
Decomposition yields Impact and Quality metrics.
Metrics are mathematically well-behaved and interrelated.
Provides new techniques for debugging and exploring clustering changes.
Abstract
ABCDE is a sophisticated technique for evaluating differences between very large clusterings. Its main metric that characterizes the magnitude of the difference between two clusterings is the JaccardDistance, which is a true distance metric in the space of all clusterings of a fixed set of (weighted) items. The JaccardIndex is the complementary metric that characterizes the similarity of two clusterings. Its relationship with the JaccardDistance is simple: JaccardDistance + JaccardIndex = 1. This paper decomposes the JaccardDistance and the JaccardIndex further. In each case, the decomposition yields Impact and Quality metrics. The Impact metrics measure aspects of the magnitude of the clustering diff, while Quality metrics use human judgements to measure how much the clustering diff improves the quality of the clustering. The decompositions of this paper offer more and deeper insight…
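The relationship JaccardDistance + JaccardIndex = 1 can be illustrated with a minimal Python sketch. The abstract does not give the formula, so this sketch assumes a per-item definition: each item's index is the weight of the intersection of its base cluster and its experiment cluster divided by the weight of their union, and the overall metric is the weight-averaged value over all items. The function name `jaccard_metrics` and the dictionary-based inputs are hypothetical, chosen for illustration only.

```python
def jaccard_metrics(base, exp, weight):
    """Return (index, distance) for two clusterings of the same items.

    base, exp: dict mapping item -> cluster id (assumed input format).
    weight:    dict mapping item -> positive weight.
    Assumes a per-item, weight-averaged definition of the metrics.
    """
    def members(assignment):
        # Invert item -> cluster id into cluster id -> set of items.
        clusters = {}
        for item, cid in assignment.items():
            clusters.setdefault(cid, set()).add(item)
        return clusters

    base_clusters = members(base)
    exp_clusters = members(exp)
    total = sum(weight.values())

    index = 0.0
    distance = 0.0
    for item, w in weight.items():
        b = base_clusters[base[item]]
        e = exp_clusters[exp[item]]
        inter = sum(weight[j] for j in b & e)
        union = sum(weight[j] for j in b | e)
        # The two metrics are accumulated independently so that their
        # sum can be checked rather than assumed.
        index += w * inter / union
        distance += w * (union - inter) / union
    return index / total, distance / total


# Four unit-weight items; the experiment moves item "c" into the cluster of "a" and "b".
weights = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
base = {"a": 0, "b": 0, "c": 1, "d": 1}
exp = {"a": 0, "b": 0, "c": 0, "d": 1}
index, distance = jaccard_metrics(base, exp, weights)
print(index, distance, index + distance)
```

Under these assumptions, comparing a clustering with itself gives an index of 1 and a distance of 0, and the two metrics sum to 1 (up to floating-point rounding) for any pair of clusterings.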
Taxonomy
Topics: Mathematics and Applications
Methods: Sparse Evolutionary Training
