Identifying bias in cluster quality metrics
Martí Renedo-Mirambell, Argimiro Arratia

TL;DR
This paper investigates biases in popular cluster quality metrics, finds that most of them favor partitions into fewer, larger clusters, and introduces a new metric, the density ratio, to address these biases.
Contribution
The study analyzes biases in existing metrics using synthetic network models and proposes the density ratio as a less biased alternative.
Findings
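Most metrics favor partitions into fewer, larger clusters, even when relative internal and external connectivity are the same.
Modularity and the density ratio are less biased than the other metrics studied.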
Synthetic models effectively reveal metric biases.
Abstract
We study potential biases of popular cluster quality metrics, such as conductance or modularity. We propose a method that uses both stochastic block model and preferential attachment block model constructions to generate networks with preset community structures, to which the quality metrics are then applied. These models also allow us to generate multi-level structures of varying strength, which show whether metrics favour partitions into a larger or smaller number of clusters. Additionally, we propose another quality metric, the density ratio. We observed that most of the studied metrics tend to favour partitions into a smaller number of big clusters, even when their relative internal and external connectivity are the same. The metrics found to be less biased are modularity and density ratio.
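The procedure outlined in the abstract (generate a network with a preset multi-level community structure, then score competing partitions) can be illustrated with a minimal sketch. It is not the authors' code. The block sizes, the edge probabilities `p_in`, `p_mid`, `p_out`, the helper names, and the choice of mean per-cluster conductance are all assumptions made for illustration. It uses only a stochastic block model, and it omits the density ratio because this summary does not give its formula.

```python
import random
from itertools import combinations


def sample_sbm(sizes, probs, seed=0):
    """Sample an undirected stochastic block model graph.

    Returns (edges, block), where block[v] is the block index of node v and
    each pair (u, v) is linked with probability probs[block[u]][block[v]].
    """
    rng = random.Random(seed)
    block = [b for b, size in enumerate(sizes) for _ in range(size)]
    edges = [(u, v) for u, v in combinations(range(len(block)), 2)
             if rng.random() < probs[block[u]][block[v]]]
    return edges, block


def modularity(edges, labels):
    """Newman modularity: sum over clusters of e_c/m - (d_c / 2m)^2."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        for w in (u, v):
            degree_sum[labels[w]] = degree_sum.get(labels[w], 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def mean_conductance(edges, labels):
    """Average over clusters of cut(c) / min(vol(c), vol(rest)); lower is better.

    Assumes no cluster holds the whole edge volume (that would divide by zero).
    """
    m = len(edges)
    cut, vol = {}, {}
    for u, v in edges:
        for w in (u, v):
            vol[labels[w]] = vol.get(labels[w], 0) + 1
        if labels[u] != labels[v]:
            cut[labels[u]] = cut.get(labels[u], 0) + 1
            cut[labels[v]] = cut.get(labels[v], 0) + 1
    scores = [cut.get(c, 0) / min(vol_c, 2 * m - vol_c)
              for c, vol_c in vol.items()]
    return sum(scores) / len(scores)


# Two-level structure (illustrative values): four blocks of 20 nodes,
# paired into two super-blocks.
sizes = [20, 20, 20, 20]
super_of = [0, 0, 1, 1]                # block index -> super-block index
p_in, p_mid, p_out = 0.5, 0.2, 0.05    # same block / sibling block / other super-block
probs = [[p_in if a == b else p_mid if super_of[a] == super_of[b] else p_out
          for b in range(4)] for a in range(4)]

edges, block = sample_sbm(sizes, probs, seed=1)
fine = block                             # partition into the 4 blocks
coarse = [super_of[b] for b in block]    # partition into the 2 super-blocks

for name, labels in [("fine (4 clusters)", fine), ("coarse (2 clusters)", coarse)]:
    print(name,
          "modularity:", round(modularity(edges, labels), 3),
          "mean conductance:", round(mean_conductance(edges, labels), 3))
```

Changing `p_mid` relative to `p_in` and `p_out` varies the strength of the lower level of the hierarchy. Comparing each metric's scores for `fine` and `coarse` across such settings is one way to see which level of the structure that metric prefers.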
Taxonomy
Topics: Complex Network Analysis Techniques · Functional Brain Connectivity Studies
