Likelihood-based model selection for stochastic block models
Y.X. Rachel Wang, Peter J. Bickel

TL;DR
This paper develops a likelihood-based criterion for selecting the number of communities in stochastic block models, analyzes its asymptotic properties under model misspecification, and extends the analysis to a degree-corrected block model. The results hold when the average degree grows at a polylog rate.
Contribution
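It introduces a likelihood-based model selection criterion for the number of blocks, with the order of the complexity penalty derived from the asymptotics of the log likelihood ratio statistic. The criterion is asymptotically consistent for SBMs, and the analysis extends to a degree-corrected block model.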
Findings
The log likelihood ratio statistic has a normal limiting distribution when the model underfits
The convergence rate of the statistic is obtained when the model overfits
These conclusions remain valid when the average degree grows at a polylog rate
Abstract
The stochastic block model (SBM) provides a popular framework for modeling community structures in networks. However, more attention has been devoted to problems concerning estimating the latent node labels and the model parameters than to the issue of choosing the number of blocks. We consider an approach based on the log likelihood ratio statistic and analyze its asymptotic properties under model misspecification. We show the limiting distribution of the statistic in the case of underfitting is normal and obtain its convergence rate in the case of overfitting. These conclusions remain valid when the average degree grows at a polylog rate. The results enable us to derive the correct order of the penalty term for model complexity and arrive at a likelihood-based model selection criterion that is asymptotically consistent. Our analysis can also be extended to a degree-corrected block model…
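To make the idea of a penalized likelihood criterion concrete, here is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: the likelihood is replaced by a Bernoulli profile likelihood evaluated at a fixed label vector (the paper's statistic concerns the full likelihood over latent labels), the penalty is taken to be of the form `lam * K(K+1)/2 * n log n`, and the names `profile_loglik`, `penalized_score`, and the tuning constant `lam` are hypothetical.

```python
import numpy as np


def profile_loglik(A, labels, K):
    """Bernoulli SBM log likelihood for fixed labels, with each block
    probability replaced by its MLE (edges / node pairs).

    A is a symmetric 0/1 adjacency matrix with a zero diagonal;
    labels is an integer array with values in 0..K-1.
    """
    Z = np.eye(K)[labels]                 # n x K one-hot membership matrix
    sizes = Z.sum(axis=0)                 # block sizes
    E = Z.T @ A @ Z                       # block edge counts (diagonal doubled)
    N = np.outer(sizes, sizes) - np.diag(sizes)  # block pair counts (diagonal doubled)

    rows, cols = np.triu_indices(K)
    edges = E[rows, cols].astype(float)
    pairs = N[rows, cols].astype(float)
    within = rows == cols
    edges[within] /= 2                    # undo double counting within blocks
    pairs[within] /= 2

    ll = 0.0
    for e, m in zip(edges, pairs):
        if m == 0:
            continue                      # empty block pair contributes nothing
        p = e / m
        if 0.0 < p < 1.0:                 # p in {0, 1} contributes exactly 0
            ll += e * np.log(p) + (m - e) * np.log(1.0 - p)
    return ll


def penalized_score(A, labels, K, lam=0.1):
    """Log likelihood minus a complexity penalty.

    ASSUMPTION: the penalty form lam * K(K+1)/2 * n log n and the
    constant lam are illustrative choices, not taken from the abstract.
    """
    n = A.shape[0]
    penalty = lam * K * (K + 1) / 2 * n * np.log(n)
    return profile_loglik(A, labels, K) - penalty
```

In use, one would fit labels for each candidate number of blocks with any community detection method, compute `penalized_score` for each, and pick the candidate with the largest score. Because refining a partition can never lower the profile likelihood, the penalty is what prevents the criterion from always preferring more blocks.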
