Information-Theoretic Limits and Strong Consistency on Binary Non-uniform Hypergraph Stochastic Block Models
Hai-Xiao Wang

TL;DR
This paper establishes the fundamental limits and proposes an optimal algorithm for node classification in non-uniform hypergraph stochastic block models, achieving strong consistency across different regimes.
Contribution
It identifies the information-theoretic threshold for strong consistency and introduces a novel refinement algorithm that is provably optimal in all regimes.
Findings
Threshold for strong consistency identified in terms of the Generalized Hellinger distance.
Proposed refinement algorithm achieves optimality and strong consistency.
Aggregating all uniform layers improves clustering accuracy in diverging degree regimes.
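The simplest special case gives some intuition for the threshold. Suppose the model has only the 2-uniform (graph) layer. It is then the classical two-community SBM on n vertices, with within-community edge probability p = a log n / n and between-community edge probability q = b log n / n. In that case the known exact-recovery (strong consistency) threshold of Abbe, Bandeira and Hall is (√a − √b)²/2 > 1. The sketch below covers only this graph case. It does not implement the paper's Generalized Hellinger distance for non-uniform hypergraphs, and the function names are hypothetical.

```python
import math


def exact_recovery_margin(a: float, b: float) -> float:
    """Hypothetical helper: (sqrt(a) - sqrt(b))**2 / 2 for the graph SBM
    with p = a*log(n)/n and q = b*log(n)/n (graph special case only)."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 / 2


def is_above_threshold(a: float, b: float) -> bool:
    """Exact recovery is achievable when the margin exceeds 1 and impossible
    when it is below 1; the boundary case is not addressed here."""
    return exact_recovery_margin(a, b) > 1


def critical_a(b: float) -> float:
    """Value of a > b lying exactly on the threshold: sqrt(a) = sqrt(b) + sqrt(2)."""
    return (math.sqrt(b) + math.sqrt(2)) ** 2


print(is_above_threshold(9, 1), is_above_threshold(4, 1), round(critical_a(1), 3))
# True False 5.828
```

For b = 1 the critical within-community parameter is a ≈ 5.828. The strict inequality used here sets aside the boundary case.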
Abstract
We investigate the unsupervised node classification problem on random hypergraphs under the non-uniform Hypergraph Stochastic Block Model (HSBM) with two equal-sized communities. In this model, edges appear independently with probabilities that depend only on the labels of their vertices. We identify the threshold for strong consistency, expressed in terms of the Generalized Hellinger distance. Below this threshold, strong consistency is impossible, and we derive the Information-Theoretic (IT) lower bound on the expected mismatch ratio. Above the threshold, the parameter space is typically divided into two disjoint regions. When only the aggregated adjacency matrices are accessible, one-stage algorithms achieve strong consistency with high probability in the region far from the threshold but fail in the region closer to it. We propose a new refinement algorithm…
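A small simulation can illustrate the model and the quantities named in the abstract. This is a minimal sketch built on our own assumptions, not the paper's code. The helper names (`sample_nonuniform_hsbm`, `aggregated_adjacency`, `spectral_two_way`, `mismatch_ratio`) are hypothetical. Labels are ±1, and each m-uniform layer is specified by a function of r, the number of +1 vertices in a candidate hyperedge. This is one way to encode "probabilities that depend only on the labels" in the binary case. The aggregated adjacency matrix counts, for each pair (i, j), the sampled hyperedges that contain both vertices. The spectral step is a generic one-stage baseline, not the paper's refinement algorithm.

```python
import itertools

import numpy as np


def sample_nonuniform_hsbm(n, layer_probs, rng):
    """Hypothetical sampler for a binary non-uniform HSBM (toy sizes only).

    layer_probs maps a hyperedge size m to a function r -> probability,
    where r is the number of vertices labeled +1 in the m-subset.
    """
    if n % 2:
        raise ValueError("n must be even for two equal-sized communities")
    labels = np.array([1] * (n // 2) + [-1] * (n // 2))
    rng.shuffle(labels)  # random balanced assignment of the two communities
    edges = []
    for m, prob_of in layer_probs.items():
        # Each m-subset is included independently (brute force, O(n^m) subsets).
        for subset in itertools.combinations(range(n), m):
            r = int(np.sum(labels[list(subset)] == 1))
            if rng.random() < prob_of(r):
                edges.append(subset)
    return labels, edges


def aggregated_adjacency(n, edges):
    """A[i, j] = number of sampled hyperedges (of any size) containing both i and j."""
    A = np.zeros((n, n), dtype=int)
    for e in edges:
        for i, j in itertools.combinations(e, 2):
            A[i, j] += 1
            A[j, i] += 1
    return A


def spectral_two_way(A):
    """Generic one-stage baseline: signs of the eigenvector for the 2nd largest eigenvalue."""
    _, vecs = np.linalg.eigh(A.astype(float))  # eigenvalues returned in ascending order
    return np.where(vecs[:, -2] >= 0, 1, -1)


def mismatch_ratio(est, truth):
    """Fraction of misclassified vertices, minimized over the global label flip."""
    agree = float(np.mean(est == truth))
    return min(1.0 - agree, agree)


rng = np.random.default_rng(0)
layer_probs = {
    2: lambda r: 0.05 if r == 1 else 0.5,       # graph layer: r == 1 means a mixed pair
    3: lambda r: 0.1 if r in (0, 3) else 0.01,  # 3-uniform layer: homogeneous triples likelier
}
labels, edges = sample_nonuniform_hsbm(40, layer_probs, rng)
A = aggregated_adjacency(40, edges)
print(len(edges), mismatch_ratio(spectral_two_way(A), labels))
```

Enumerating every m-subset costs O(n^m) work, so this sampler suits only toy sizes. A realistic simulation would instead sample the number of hyperedges per label pattern and then draw them.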
Taxonomy
Topics: Complex Network Analysis Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
