Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Scott Emmons, Stephen Kobourov, Mike Gallant, Katy Börner

TL;DR
This study analyzes the relationship between different network clustering quality metrics and algorithms, revealing significant discrepancies and identifying conductance as a reliable indicator of information recovery performance across synthetic and real-world datasets.
Contribution
It provides a comprehensive comparison of four popular clustering algorithms and evaluates multiple quality metrics, highlighting their differences and practical implications.
Findings
Conductance best indicates information recovery performance.
Discrepancies exist between clustering metrics and actual algorithm performance.
Smart local moving outperforms other algorithms in most tests.
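To make the stand-alone metrics concrete, here is a minimal sketch of conductance alongside modularity and coverage, the other two stand-alone metrics the study considers. It assumes an unweighted, undirected graph without self-loops and uses common textbook definitions, which may differ in detail from the ones used in the paper. The function name `standalone_metrics` and the toy graph (two triangles joined by one edge) are illustrative only.

```python
from collections import defaultdict


def standalone_metrics(edges, labels):
    """Return (modularity, coverage, per-cluster conductance).

    edges  : list of (u, v) pairs, undirected, no self-loops (assumption)
    labels : dict mapping each node to its cluster id
    """
    m = len(edges)
    volume = defaultdict(int)    # sum of node degrees per cluster
    internal = defaultdict(int)  # edges with both endpoints in the cluster
    cut = defaultdict(int)       # edges with exactly one endpoint in the cluster

    for u, v in edges:
        cu, cv = labels[u], labels[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            internal[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1

    clusters = set(labels.values())

    # Coverage: fraction of all edges that fall inside some cluster.
    coverage = sum(internal[c] for c in clusters) / m

    # Modularity: intra-cluster edge fraction minus its expectation
    # under a degree-preserving random graph.
    modularity = sum(
        internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in clusters
    )

    # Conductance of a cluster: cut size over the smaller of the two volumes.
    conductance = {}
    for c in clusters:
        denom = min(volume[c], 2 * m - volume[c])
        conductance[c] = cut[c] / denom if denom else 0.0

    return modularity, coverage, conductance


# Toy graph: two triangles connected by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
LABELS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

modularity, coverage, conductance = standalone_metrics(EDGES, LABELS)
print(modularity, coverage, conductance)
```

Lower conductance means fewer edges leave a cluster relative to its volume, while higher modularity and coverage indicate denser clusters. None of the three needs ground-truth labels, which is what makes them stand-alone metrics.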
Abstract
Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences…
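The information recovery metrics named in the abstract compare a predicted clustering against a known ground truth. The sketch below computes the adjusted Rand score and normalized mutual information from a contingency table. It assumes non-overlapping clusters and normalizes mutual information by the arithmetic mean of the two entropies, which is one common convention and may not be the one used in the paper. The variant of normalized mutual information from previous work is not reproduced here. The function names and toy labels are illustrative.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_score(truth, pred):
    """Adjusted Rand score of two flat labelings of the same items."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to disagree on (convention assumed here)
    pairs = Counter(zip(truth, pred))  # contingency table cells
    a, b = Counter(truth), Counter(pred)

    sum_cells = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())

    expected = sum_a * sum_b / comb(n, 2)  # chance-level agreement
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (maximum - expected)


def normalized_mutual_info(truth, pred):
    """Mutual information divided by the mean of the two entropies."""
    n = len(truth)
    pairs = Counter(zip(truth, pred))
    a, b = Counter(truth), Counter(pred)

    mi = sum(
        (c / n) * log(c * n / (a[t] * b[p])) for (t, p), c in pairs.items()
    )
    h_a = -sum((c / n) * log(c / n) for c in a.values())
    h_b = -sum((c / n) * log(c / n) for c in b.values())
    if h_a + h_b == 0:
        return 1.0  # both labelings are a single cluster
    return 2 * mi / (h_a + h_b)


# Toy example: one node is assigned to the wrong cluster.
TRUTH = [0, 0, 0, 1, 1, 1]
PRED = [0, 0, 1, 1, 1, 1]

ari = adjusted_rand_score(TRUTH, PRED)
nmi = normalized_mutual_info(TRUTH, PRED)
print(ari, nmi)
```

Both scores depend only on how items are grouped, not on the label values, so relabeling the clusters of a perfect prediction still yields a score of 1.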
