Ground Truth Bias in External Cluster Validity Indices
Yang Lei, James C. Bezdek, Simone Romano, Nguyen Xuan Vinh, Jeffrey Chan, James Bailey

TL;DR
This paper investigates a newly identified bias in external cluster validity indices (CVIs) that is caused by the distribution of the ground truth partition and that affects how clustering validation results should be interpreted.
Contribution
The paper introduces the concept of ground truth (GT) bias in external CVIs and analyzes it both empirically and theoretically.
Findings
Identifies ground truth bias as a new factor influencing CVI behavior
Shows how skewing the ground truth distribution alters the direction of the bias
Provides theoretical analysis of GT bias effects on CVI interpretations
Abstract
It has been noticed that some external cluster validity indices (CVIs) exhibit a preferential bias towards a larger or smaller number of clusters which is monotonic (directly or inversely) in the number of clusters in candidate partitions. This type of bias is caused by the functional form of the CVI model. For example, the popular Rand index (RI) exhibits a monotone increasing (NCinc) bias, while the Jaccard index (JI) suffers from a monotone decreasing (NCdec) bias. This type of bias has been previously recognized in the literature. In this work, we identify a new type of bias arising from the distribution of the ground truth (reference) partition against which candidate partitions are compared. We call this new type of bias ground truth (GT) bias. This type of bias occurs if a change in the reference partition causes a change in the bias status (e.g., NCinc, NCdec) of a CVI. For example, NCinc bias in…
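The two kinds of bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental protocol. It assumes a toy setup in which candidate partitions are uniformly random labelings with at most k clusters, and it compares them against two hypothetical ground truths, one balanced and one heavily skewed. All names (pair_counts, mean_scores, the cluster sizes, the seed) are illustrative choices, not taken from the paper.

```python
import random
from itertools import combinations


def pair_counts(u, v):
    """Classify every pair of objects by whether labelings u and v co-cluster it."""
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u and same_v:
            a += 1  # together in both partitions
        elif same_u:
            b += 1  # together in u only
        elif same_v:
            c += 1  # together in v only
        else:
            d += 1  # apart in both partitions
    return a, b, c, d


def rand_index(a, b, c, d):
    return (a + d) / (a + b + c + d)


def jaccard_index(a, b, c, d):
    denom = a + b + c
    return a / denom if denom else 1.0


def labels_from_sizes(sizes):
    """Build a label vector with the given cluster sizes (assumed toy ground truth)."""
    return [label for label, size in enumerate(sizes) for _ in range(size)]


def mean_scores(gt, ks, trials=20, seed=0):
    """Average RI and JI of random candidates with at most k clusters, for each k."""
    rng = random.Random(seed)
    ri_means, ji_means = [], []
    for k in ks:
        ri_sum = ji_sum = 0.0
        for _ in range(trials):
            candidate = [rng.randrange(k) for _ in gt]
            counts = pair_counts(gt, candidate)
            ri_sum += rand_index(*counts)
            ji_sum += jaccard_index(*counts)
        ri_means.append(ri_sum / trials)
        ji_means.append(ji_sum / trials)
    return ri_means, ji_means


ks = [2, 4, 8, 16]
balanced_gt = labels_from_sizes([30, 30, 30, 30])  # assumption: 4 equal clusters
skewed_gt = labels_from_sizes([102, 6, 6, 6])      # assumption: one dominant cluster

ri_bal, ji_bal = mean_scores(balanced_gt, ks)
ri_skew, ji_skew = mean_scores(skewed_gt, ks)

for name, values in [("RI, balanced GT", ri_bal), ("JI, balanced GT", ji_bal),
                     ("RI, skewed GT", ri_skew), ("JI, skewed GT", ji_skew)]:
    print(name, [round(x, 3) for x in values])
```

In this toy setup the mean RI rises with k against the balanced ground truth (NCinc) and the mean JI falls (NCdec). Against the skewed ground truth the mean RI falls with k instead, which is the kind of change in bias status that the abstract calls GT bias. The reason is easy to see here: a random candidate co-clusters any given pair with probability 1/k, so if p is the fraction of pairs co-clustered in the ground truth, the expected RI is p/k + (1 - p)(1 - 1/k), which increases in k when p < 1/2 and decreases when p > 1/2.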
Taxonomy
Topics: Complex Network Analysis Techniques · Spatial and Panel Data Analysis · Customer Service Quality and Loyalty
