TL;DR
This paper compares various methods for estimating the number of clusters in community detection, focusing on stochastic block models and evaluating multiple algorithms and criteria to understand overfitting and underfitting tendencies.
Contribution
It provides a comprehensive comparison of algorithms and assessment criteria for community detection, and introduces the alluvial diagram as a visualization tool for statistical inference results.
Findings
Each assessment criterion and algorithm shows a tendency to overfit or underfit the number of clusters.
Alluvial diagrams effectively visualize statistical inference outcomes.
Spectral methods and statistical inference show promising performance.
Abstract
We conduct a comparative analysis of various estimates of the number of clusters in community detection. An exhaustive comparison requires testing all possible combinations of frameworks, algorithms, and assessment criteria. In this paper, we focus on the framework based on a stochastic block model and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, the map equation, Bethe free energy, prediction errors, and isolated eigenvalues. The analysis reveals the tendencies of the assessment criteria and algorithms to overfit or underfit. In addition, we propose that the alluvial diagram is a suitable tool for visualizing statistical inference results and can be useful for determining the number of clusters.
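To make one of the listed assessment criteria concrete, here is a minimal sketch of how modularity could be used to pick the number of clusters. It is not the paper's procedure. The toy graph (two triangles joined by one edge), the hand-written candidate partitions, and the names `modularity`, `candidates`, and `best_q` are all assumptions made for illustration. The score is the standard Newman-Girvan modularity, computed per cluster as the fraction of internal edges minus the squared fraction of total degree.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges  -- list of (u, v) node-index pairs, each edge listed once
    labels -- labels[i] is the cluster label of node i
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the cluster
    degree = defaultdict(int)    # summed degree of the cluster's nodes
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Hypothetical candidate partitions, keyed by their number of clusters.
candidates = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    6: [0, 1, 2, 3, 4, 5],
}

scores = {q: modularity(edges, labels) for q, labels in candidates.items()}
best_q = max(scores, key=scores.get)  # number of clusters with the highest score

for q, score in sorted(scores.items()):
    print(f"q={q}: modularity={score:.4f}")
print("selected number of clusters:", best_q)
```

On this clean toy graph the two-triangle split scores highest. On harder graphs, a criterion of this kind can select too many or too few clusters, which is the overfitting and underfitting tendency the paper examines.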
