Large Deviations of Semi-supervised Learning in the Stochastic Block Model
Hugo Cui, Luca Saglietti, Lenka Zdeborová

TL;DR
This paper uses statistical physics to analyze the rare events and fluctuations in semi-supervised community detection within the stochastic block model, revealing how label choices influence inference accuracy.
Contribution
It introduces a large deviation framework for semi-supervised learning in the dense stochastic block model, characterizing fluctuations around the typical behavior and the effect of correlated choices of revealed labels.
Findings
Characterizes fluctuations around typical behavior in community detection
Identifies a non-monotonic relationship between accuracy and free energy
Estimates how informative a subset of revealed labels is and how rare it is among subsets of the same size
Abstract
In community detection on graphs, the semi-supervised learning problem entails inferring the ground-truth membership of each node in a graph, given the connectivity structure and a limited number of revealed node labels. Different subsets of revealed labels can in principle lead to higher or lower information gains and induce different reconstruction accuracies. In the framework of the dense stochastic block model, we employ statistical physics methods to derive a large deviation analysis for this problem, in the high-dimensional limit. This analysis allows the characterization of the fluctuations around the typical behaviour, capturing the effect of correlated label choices and yielding an estimate of their informativeness and their rareness among subsets of the same size. We find theoretical evidence of a non-monotonic relationship between reconstruction accuracy and the free energy…
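As a purely illustrative sketch of the statement that different subsets of revealed labels can induce different reconstruction accuracies, the following Python snippet samples one dense two-community stochastic block model, reveals several random label subsets of the same size, and records the accuracy on the hidden nodes for each. It does not implement the paper's large deviation analysis or a Bayes-optimal estimator: the sign-based label propagation is a swapped-in stand-in, and all names and parameter values (`sample_sbm`, `propagate`, `accuracy_over_subsets`, `p_in`, `p_out`, the sizes) are assumptions chosen for the example.

```python
import numpy as np

def sample_sbm(n, p_in, p_out, rng):
    """Sample a two-community SBM with labels in {-1, +1} (hypothetical helper)."""
    labels = rng.choice([-1, 1], size=n)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Draw the upper triangle only, then symmetrize; no self-loops.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return labels, adj

def propagate(adj, labels, revealed, n_iter=20):
    """Sign-based label propagation with revealed labels clamped.

    Illustrative stand-in only, not the estimator analyzed in the paper.
    """
    centered = adj - adj.mean()  # remove the average edge density
    est = np.zeros(len(labels))
    est[revealed] = labels[revealed]
    for _ in range(n_iter):
        est = np.sign(centered @ est)
        est[est == 0] = 1.0               # break ties arbitrarily
        est[revealed] = labels[revealed]  # keep revealed labels fixed
    return est

def accuracy_over_subsets(n=200, n_revealed=10, n_subsets=50, seed=0):
    """Accuracy on hidden nodes for several random revealed subsets of one graph."""
    rng = np.random.default_rng(seed)
    labels, adj = sample_sbm(n, p_in=0.55, p_out=0.45, rng=rng)
    accs = []
    for _ in range(n_subsets):
        revealed = rng.choice(n, size=n_revealed, replace=False)
        est = propagate(adj, labels, revealed)
        hidden = np.setdiff1d(np.arange(n), revealed)
        accs.append(np.mean(est[hidden] == labels[hidden]))
    return np.array(accs)

if __name__ == "__main__":
    accs = accuracy_over_subsets()
    print("mean:", accs.mean(), "std:", accs.std())
    print("min:", accs.min(), "max:", accs.max())
```

The spread between the minimum and maximum accuracy across subsets is a crude, finite-size view of the subset-to-subset fluctuations; uniform random sampling of subsets only probes typical behaviour and will not reach the rare, atypically informative subsets that a large deviation analysis targets.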
