TL;DR
This paper evaluates how 16 community detection algorithms overfit or underfit on a structurally diverse corpus of 406 real-world networks, finds that their outputs vary widely, and proposes a new diagnostic for assessing over- and underfitting.
Contribution
It provides a comprehensive comparison of 16 algorithms on 406 networks, introduces a diagnostic for over/underfitting, and identifies Bayesian methods as generally effective.
Findings
Algorithms vary widely in community count and composition.
Distinct groups of algorithms exhibit similar output patterns.
Bayesian methods generally outperform others in accuracy.
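To make the first finding concrete, here is a minimal sketch of how two algorithms' outputs on the same network could be compared by community count and by composition. It is not the paper's diagnostic or its code. Normalized mutual information (NMI) is used here only as a familiar similarity measure between partitions, and the function names and the toy partitions are hypothetical.

```python
from collections import Counter
from math import log


def num_communities(partition):
    """Count distinct community labels in a {node: label} mapping."""
    return len(set(partition.values()))


def normalized_mutual_information(part_a, part_b):
    """NMI between two partitions of the same node set.

    Uses the arithmetic-mean normalization: MI / ((H_a + H_b) / 2).
    Illustrative only; this is an assumption, not the measure used in the paper.
    """
    if set(part_a) != set(part_b):
        raise ValueError("partitions must cover the same nodes")
    nodes = list(part_a)
    n = len(nodes)

    count_a = Counter(part_a[v] for v in nodes)
    count_b = Counter(part_b[v] for v in nodes)
    joint = Counter((part_a[v], part_b[v]) for v in nodes)

    # Shannon entropy of each partition's label distribution.
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())

    # Two single-community partitions are identical by construction.
    if h_a == 0 and h_b == 0:
        return 1.0

    # Mutual information between the two labelings.
    mi = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return mi / ((h_a + h_b) / 2)


# Hypothetical outputs of two algorithms on the same six-node network.
coarse = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
fine = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 5: 3}

print(num_communities(coarse), num_communities(fine))
print(normalized_mutual_information(coarse, fine))
```

In this toy case the second partition splits each community of the first, so the two differ in count (2 versus 4) and only partly agree in composition. Which of them, if either, overfits cannot be read off from this comparison alone.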
Abstract
A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i)…
