Multiple Hypothesis Testing To Estimate The Number of Communities in Sparse Stochastic Block Models
Chetkar Jha, Mingyao Li, Ian Barnett

TL;DR
This paper introduces a new sequential hypothesis testing method, based on spectral properties, for accurately estimating the number of communities in sparse stochastic block models (SBMs), especially when the average degree grows more slowly than the network size.
Contribution
The paper presents a novel spectral-based sequential testing procedure that is consistent for sparse SBMs and allows the number of communities to grow with the network size.
Findings
Method accurately estimates the number of communities in sparse networks.
Consistent for a broad range of sparsity parameters.
Performs well on real single-cell RNA sequencing datasets.
Abstract
Network-based clustering methods frequently require the number of communities to be specified a priori. Moreover, most existing methods for estimating the number of communities assume that this number is fixed and does not scale with the network size. The few methods that allow the number of communities to increase with the network size are valid only when the average degree of a network grows sufficiently fast (i.e., the dense case) or lies within a narrow range. This presents a challenge for clustering large-scale network data, particularly when the average degree of a network grows more slowly than that rate (i.e., the sparse case). To address this problem, we propose a new sequential procedure utilizing multiple hypothesis tests and the spectral properties of Erdős–Rényi graphs for estimating the number of communities in…
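The abstract describes estimating the number of communities by sequentially testing candidate values against the spectral behaviour of Erdős–Rényi graphs. The sketch below is not the authors' procedure. It is a generic goodness-of-fit illustration under stated assumptions: for k = 1, 2, ..., nodes are spectrally clustered into k groups, block edge probabilities are estimated, and the largest eigenvalue of the standardized residual matrix is compared with 2, the edge of the semicircle law that such residuals approximately follow when every block is Erdős–Rényi. The function names (`simulate_sbm`, `spectral_labels`, `residual_top_eigenvalue`, `estimate_num_communities`) and the fixed threshold of 2.5 are hypothetical. No multiple-testing correction is applied, and a fixed threshold is only a rough heuristic for fairly dense graphs; calibrating such tests in the sparse regime is exactly the problem the paper addresses.

```python
import numpy as np


def simulate_sbm(sizes, p_in, p_out, seed=0):
    """Hypothetical helper: symmetric 0/1 adjacency of a simple SBM, no self-loops."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random(P.shape) < P, 1)
    return (upper | upper.T).astype(float)


def spectral_labels(A, k):
    """Cluster nodes into k groups: top-k eigenvectors (by |eigenvalue|), then k-means."""
    n = A.shape[0]
    if k == 1:
        return np.zeros(n, dtype=int)
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(np.abs(vals))[::-1][:k]]
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    labels = np.full(n, -1)
    for _ in range(100):  # Lloyd iterations
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def residual_top_eigenvalue(A, labels, eps=1e-6):
    """Largest eigenvalue of (A - P_hat) / sqrt((n - 1) * P_hat * (1 - P_hat))."""
    n = A.shape[0]
    k = labels.max() + 1
    Z = np.eye(k)[labels]                          # n x k membership matrix
    sizes = Z.sum(axis=0)
    edges = Z.T @ A @ Z                            # within-block edges counted twice
    pairs = np.outer(sizes, sizes) - np.diag(sizes)  # ordered pairs, no self-loops
    B = np.clip(edges / np.maximum(pairs, 1), eps, 1 - eps)
    P = B[labels][:, labels]                       # fitted probability for each pair
    R = (A - P) / np.sqrt((n - 1) * P * (1 - P))
    np.fill_diagonal(R, 0.0)
    return np.linalg.eigvalsh(R)[-1]               # eigvalsh sorts ascending


def estimate_num_communities(A, k_max=10, threshold=2.5):
    """Return the first k whose residual looks Erdos-Renyi-like (hypothetical threshold)."""
    for k in range(1, k_max + 1):
        if residual_top_eigenvalue(A, spectral_labels(A, k)) < threshold:
            return k
    return None  # no candidate up to k_max was accepted
```

For an adjacency matrix from `simulate_sbm([100, 100, 100], 0.5, 0.1)`, the loop is expected to reject k = 1 and k = 2, because any coarser partition leaves block structure in the residual and pushes its top eigenvalue well above 2, and to stop at k = 3. The deterministic farthest-point initialisation is a design choice that keeps the sketch free of random restarts and of any clustering library dependency.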
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Single-cell and spatial transcriptomics
