Multiple Hypothesis Testing To Estimate The Number of Communities in Sparse Stochastic Block Models

Chetkar Jha; Mingyao Li; Ian Barnett

arXiv:2201.04722·stat.ME·January 14, 2022

Open Access

TL;DR

This paper introduces a new sequential hypothesis testing method based on spectral properties to accurately estimate the number of communities in sparse stochastic block models, especially when the average degree grows slower than the network size.

Contribution

The paper presents a novel spectral-based sequential testing procedure that is consistent for sparse SBMs and allows the number of communities to grow with the network size.

Findings

1. The method accurately estimates the number of communities in sparse networks.

2. It is consistent for a broad range of sparsity parameters.

3. It performs well on real single-cell RNA sequencing datasets.

Abstract

Network-based clustering methods frequently require the number of communities to be specified a priori. Moreover, most existing methods for estimating the number of communities assume that this number is fixed and does not scale with the network size $n$. The few methods that allow the number of communities to increase with $n$ are only valid when the average degree $d$ of the network grows at least as fast as $O(n)$ (i.e., the dense case) or lies within a narrow range. This presents a challenge in clustering large-scale network data, particularly when the average degree $d$ grows slower than $O(n)$ (i.e., the sparse case). To address this problem, we propose a new sequential procedure utilizing multiple hypothesis tests and the spectral properties of Erdős-Rényi graphs for estimating the number of communities in…
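
To make the sparse setting concrete, the following is a minimal sketch that samples a balanced stochastic block model and reports its average degree and leading adjacency eigenvalues. It is not the paper's procedure: the helper `sample_sbm`, the balanced labels, the $\log(n)/n$ scaling of the edge probabilities, and the constants 8 and 1 are all assumptions chosen for illustration.

```python
import numpy as np

def sample_sbm(n, k, p_in, p_out, seed=0):
    """Sample a symmetric, loop-free adjacency matrix from a balanced k-block SBM.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    labels = np.arange(n) % k  # balanced community labels (illustrative choice)
    # Edge probability per node pair: p_in within a block, p_out across blocks.
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    # Draw each pair once (strict upper triangle), then mirror to get symmetry.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels

n, k = 600, 3
# Assumed sparse scaling: edge probabilities shrink like log(n)/n,
# so the average degree grows much more slowly than n.
p_in, p_out = 8 * np.log(n) / n, 1 * np.log(n) / n
adj, labels = sample_sbm(n, k, p_in, p_out)

avg_degree = adj.sum() / n  # each edge contributes to two row sums
top_eigs = np.sort(np.linalg.eigvalsh(adj.astype(float)))[::-1][:5]

print(f"average degree: {avg_degree:.1f} (n = {n})")
print("five largest adjacency eigenvalues:", np.round(top_eigs, 1))
```

In a well-separated setting like this one, the leading eigenvalues tend to stand apart from the rest of the spectrum, which is the kind of spectral signal a sequential test on the number of communities could examine. How the paper constructs its test statistics is not specified in the text above.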

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Single-cell and spatial transcriptomics