Mutual Information for the Stochastic Block Model by the Adaptive Interpolation Method
Jean Barbier, Chun Lam Chan, Nicolas Macris

TL;DR
This paper derives an exact formula for the mutual information in the asymmetric two-groups stochastic block model using a novel direct adaptive interpolation method, simplifying previous indirect approaches.
Contribution
It introduces a self-contained, direct proof for the mutual information of the stochastic block model, avoiding complex mappings to matrix estimation problems.
Findings
Provides a single-letter variational expression for mutual information.
Simplifies the proof technique using adaptive interpolation.
Eliminates the need for indirect mappings and multiple existing methods.
Abstract
We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.
Mutual information for the stochastic block model
by the adaptive interpolation method
Jean Barbier∗, Chun Lam Chan†, and Nicolas Macris†
∗ The Abdus Salam International Center for Theoretical Physics, Trieste, Italy.
† Communication Theory Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland.
1 Introduction
The stochastic block model (SBM) has a long history and has attracted the attention of many disciplines. It was first introduced as a model of community detection in the networks and statistics literature [1], as a problem of finding graph bisections in theoretical computer science [2], and has also been proposed as a model for inhomogeneous random graphs [3, 4]. Here we adopt the community detection interpretation and motivation [5]. A partition of nodes into labeled groups is hidden from an observer who is only given a random graph generated on the basis of the partition. The task of the observer is to recover the hidden partition from the observed graph. A simple setting that lends itself to mathematical analysis is the following. The labels of nodes are drawn i.i.d. from a prior distribution and, for the graph, the edges between pairs of nodes are placed independently according to a probability which depends only on the group labels. If the probability is slightly higher (resp. lower) when the pair of nodes have the same label the model is called assortative (resp. disassortative). Moreover we suppose that the parameters of the prior and edge probability distributions are all known so that we are working in the framework of Bayesian (optimal) inference. Note that the recovery task is non-trivial only when parameters are such that no information about the group label is revealed from the degrees of nodes. Much progress has been made in recent years within this simple mathematical setting and we refer to [6] for a recent comprehensive review and references.
In the limit of a large number of nodes the SBM displays interesting phase transitions for (partial) recovery of the hidden partition and much effort has been deployed to characterize the phase diagram, in terms of information theoretic as well as algorithmic phase transition thresholds, and to compute the algorithmic-to-statistical gaps. In this vein a fundamental quantity is the mutual information between the hidden labels of the nodes and the observed graph. Indeed from the asymptotic value of the mutual information per node one can compute information theoretic thresholds of recovery. In this paper we focus on the mutual information of the two-group SBM with possibly asymmetric group sizes, in dense regimes where the expected degree of the nodes diverges with the total number of nodes (and is independent of the group label). We rigorously determine a single-letter variational expression for the asymptotic mutual information by means of the recently developed adaptive interpolation method [7, 8].
Single-letter variational expressions for the mutual information of the SBM are not new. They were first analytically derived in heuristic ways by methods of statistical physics and in this context are often called replica or cavity formulas [9]. Rigorous proofs then appeared in [10, 11]. These approaches are indirect in the sense that the SBM is first mapped onto a rank-one matrix factorization problem, and then the matrix factorization problem is solved. In [10] the particular case of two equal size communities is considered and the analysis relies on the fact that in this case the information theoretic phase transition is of the second order type (i.e., continuous), which allows the use of message-passing arguments. The asymmetric case is more challenging because first order (discontinuous) phase transitions appear for large enough asymmetry. In [11] this case is tackled through a Guerra-Toninelli interpolation combined with a rigorous version of the cavity method or Aizenman-Sims-Starr scheme [12]. Strictly speaking the analysis of [11] does not cover the widest possible regime of dense graphs (see Section 2 for details). We note that the mutual information of rank-one matrix factorization had also been determined earlier in [13] for the symmetric case and more recently for the general case in [14, 15] using a spatial coupling method.
The proof presented here covers the asymmetric two-group SBM and has the virtue of being completely unified. It uses a single method, namely the adaptive interpolation, is conceptually simpler, and is direct as it does not make any detour through another model. The method is a powerful evolution of the classic Guerra-Toninelli interpolation [16] and allows one to derive tight upper and lower bounds for the mutual information, whereas the classic interpolation only yields a one-sided inequality. It has been successfully applied to a range of Bayesian inference problems, e.g., [17, 18]. Here, besides various new technical aspects, the main novelty is that we do not use Gaussian integration by parts, as is generally the case in interpolation methods. Instead, we develop a general approximate integration by parts formula and apply it to the Bernoulli random elements of the adjacency matrix of the graph. We note that related approximate integration by parts formulas have already been used by [19, 20] in the context of the Hopfield and Sherrington-Kirkpatrick models.
It would be desirable to extend the present method to the sparse regime of the SBM where the average degree of the nodes stays finite as the number of nodes diverges. This is much more challenging however, and the mutual information has so far been determined only for the disassortative case [21] while the assortative case remains open. The thresholds however have been successfully determined for both cases in [22, 23, 24, 25]. The adaptive interpolation method has been developed for the related censored block model in the sparse regime [26] and hopefully it can also be extended to the sparse SBM, which we leave for future work.
2 Setting and results: asymmetric two-groups SBM
We first formulate the SBM for two communities that may be of different sizes. Suppose we have nodes belonging to two communities where the partition is denoted by a vector . Labels are i.i.d. Bernoulli random variables with . The size of each community is and up to fluctuations of . The labels are hidden and instead one is given a random undirected graph constructed as follows (equivalently one is given an adjacency matrix). An edge between node and is present with probability and absent with the complementary probability. To specify , first we define such that
[TABLE]
We require these two constraints for the inference problem to be non-trivial, in the sense that no information about the labels stems from the nodes’ degrees. The two constraints imply
[TABLE]
so that we can interpret as the average degree of a node. Then we define where are the four possible matrix elements of
[TABLE]
Because of (1) and (2), we have the equations
[TABLE]
Solving this system imposes and . Therefore there are three independent parameters, namely , and . A more convenient re-parametrization is often used [10] instead of :
[TABLE]
Here is the average probability for the presence of an edge. We will look at the dense asymmetric SBM (the symmetric model corresponding to ) regimes where . In our analysis the growth of spans the whole spectrum from arbitrarily slow, at the verge of a sparse graph, to linear , , for fully dense graphs.
In this paper we rigorously determine the asymptotic mutual information for this problem in the dense graph regime wherein and satisfy:
- (h1) (Dense SBM) .
- (h2) (Appropriate scaling of signal-to-noise ratio) $\lambda_{n}\equiv n\Delta_{n}^{2}/\big(\bar{p}_{n}(1-\bar{p}_{n})\big)=d_{n}(1-b_{n})^{2}/(1-d_{n}/n)\xrightarrow{n\rightarrow\infty}\lambda$ finite.
The first condition ensures that the graph is dense in the sense that , still maintaining . The second ensures the mutual information has a well-defined non-trivial limit when . Note that the second condition requires as $\Delta_{n}/\big(\bar{p}_{n}(1-\bar{p}_{n})^{2}\big)=\sqrt{\lambda_{n}/(n\bar{p}_{n}(1-\bar{p}_{n})^{3})}\rightarrow 0$ as , hence and . The reader may wish to keep in mind two simple typical examples. The first example is a dense graph with , so and . The second example is with , so and . These are easily translated back to the matrix .
We note that in the sparse graph version of the model one would have a finite limit for but the second condition would be the same. The analysis of the sparse case is however more difficult and is not addressed in this paper.
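As a quick sanity check of the two expressions for the signal-to-noise ratio in (h2), the following sketch evaluates both forms numerically. It assumes the re-parametrization $\bar{p}_{n}=d_{n}/n$ and $\Delta_{n}=d_{n}(1-b_{n})/n$, which is what makes the two sides of (h2) agree algebraically; the function name and the sample values are illustrative only.

```python
def snr_two_ways(n: int, d: float, b: float) -> tuple[float, float]:
    """Return lambda_n computed from (p_bar, Delta) and from (d, b)."""
    p_bar = d / n                 # assumed: average edge probability d_n / n
    delta = d * (1.0 - b) / n     # assumed: Delta_n = d_n (1 - b_n) / n
    via_p = n * delta**2 / (p_bar * (1.0 - p_bar))
    via_d = d * (1.0 - b) ** 2 / (1.0 - d / n)
    return via_p, via_d


if __name__ == "__main__":
    n = 10_000
    d = n ** 0.5                  # average degree growing like sqrt(n)
    b = 1.0 - (2.0 / d) ** 0.5    # chosen so that d (1 - b)^2 = 2
    print(snr_two_ways(n, d, b))
```

Both returned values coincide up to rounding, and the factor $1/(1-d_{n}/n)$ only matters when the graph is close to fully dense.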
Instead of working with the Ising spin variables it is convenient to change the alphabet. We define with and . The hidden labels of the nodes now belong to the alphabet and . An edge is then present with conditional probability
[TABLE]
This can be viewed as an asymmetric binary-input binary-output channel and the inference problem is to recover the input (or ) from the channel output . Henceforth we adopt the notation
[TABLE]
for the probability distribution of the hidden labels . Note that .
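The generative model can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes the recentred alphabet takes the values $\sqrt{(1-r)/r}$ with probability $r$ and $-\sqrt{r/(1-r)}$ with probability $1-r$ (so that labels have zero mean and unit second moment), and that an edge is present with probability $\bar{p}_{n}+\Delta_{n}X_{i}X_{j}$. The names `sample_sbm` and `label_moments` are hypothetical.

```python
import math
import random


def label_values(r):
    """Assumed recentred alphabet: zero mean, unit second moment."""
    return math.sqrt((1.0 - r) / r), -math.sqrt(r / (1.0 - r))


def label_moments(r):
    """First and second moments of a label under the assumed prior."""
    x_plus, x_minus = label_values(r)
    mean = r * x_plus + (1.0 - r) * x_minus
    second = r * x_plus**2 + (1.0 - r) * x_minus**2
    return mean, second


def sample_sbm(n, r, p_bar, delta, seed=0):
    """Sample hidden labels and an undirected graph (set of edges i < j)."""
    rng = random.Random(seed)
    x_plus, x_minus = label_values(r)
    labels = [x_plus if rng.random() < r else x_minus for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed channel: edge probability p_bar + delta * X_i * X_j.
            prob = p_bar + delta * labels[i] * labels[j]
            if not 0.0 <= prob <= 1.0:
                raise ValueError("edge probability outside [0, 1]")
            if rng.random() < prob:
                edges.add((i, j))
    return labels, edges


if __name__ == "__main__":
    labels, edges = sample_sbm(n=200, r=0.3, p_bar=0.3, delta=0.05)
    print(len(edges), label_moments(0.3))
```

Because the labels are centred, the expected edge probability seen from any node equals `p_bar` whatever its label, which is the requirement that degrees reveal no information about the groups.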
We now formulate our results which provide a single-letter variational formula for the asymptotic mutual information. Let and independently, and set for :
[TABLE]
The so-called replica formula conjectures the identity
[TABLE]
We prove that (4) is correct, namely:
Theorem 2.1 (Upper bound).
For the SBM under concern in the regime (h1), (h2),
[TABLE]
Theorem 2.2 (Lower bound).
For the SBM under concern in the regime (h1), (h2),
[TABLE]
Remark 1: Of course we have and in the following we will work with where .
Remark 2: Elementary analysis shows that the minimum over of is attained for .
Remark 3: From (4) one can derive the information theoretic phase transition thresholds. Let . For "small" asymmetry between group sizes there is a continuous phase transition at while for "large" asymmetry the phase transition becomes discontinuous. An information theoretic-to-algorithmic gap occurs in the second situation as discussed in detail in [11].
Let us explain the relation of these theorems with previous works. In [10] they were obtained for the symmetric case by a mapping of the model onto a rank-one matrix estimation problem via an application of Lindeberg’s theorem. The regime treated is essentially the same as ours except that in place of [10] has . Note that the difference only matters if which is the complete graph limit. Still using the same mapping to matrix factorization, [11] treats the asymmetric case, however in a limit where first and after (in fact this analysis can accommodate any growth slower than ) but it is unclear whether this is possible for denser regimes. Our analysis covers this gap and the whole spectrum of growth for up to linear growth is allowed. Besides, we propose a self-contained and direct method using the adaptive interpolation method [7]. A technical limitation of interpolation methods has often been the need to use Gaussian integration by parts. We bypass this limitation using an (approximate) integration by parts formula for the edge binary variables .
Before we formulate the adaptive interpolation let us set up more explicitly the quantities that we compute. The distribution of given the hidden partition is the inhomogeneous Erdős-Rényi graph measure:
[TABLE]
Using this measure and Bayes rule, we find the posterior distribution of the SBM
[TABLE]
where . Therefore, the posterior distribution becomes
[TABLE]
We use the statistical mechanics terminology and therefore call this posterior distribution the Gibbs distribution. The normalizing factor
[TABLE]
is the partition function, and is the Hamiltonian. A straightforward computation, using the scaling regime (h1) and (h2), gives the following formula (see the proof in Appendix A):
Proposition 2.3 (Linking the mutual information and log-partition function).
For the SBM under concern we have
[TABLE]
where .
Thus the problem boils down to computing minus the expected log-partition function, or expected free energy, in the limit . This will be achieved via an interpolation towards the log-partition function of independent scalar Gaussian channels where the observations about the hidden labels are of the form
[TABLE]
with i.i.d. Gaussian random variables and the signal-to-noise ratio (SNR). An important feature of our technique is the freedom to adapt a suitable interpolation path to the problem at hand. This is explained in the next section.
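To make the decoupled scalar channel concrete, here is a minimal sketch of the posterior mean of a single label given one Gaussian observation. It assumes the standard form $Y=\sqrt{q}\,X+Z$ with $Z\sim\mathcal{N}(0,1)$ and the centred two-point prior with values $\sqrt{(1-r)/r}$ and $-\sqrt{r/(1-r)}$; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def two_point_prior(r):
    """Support and weights of the assumed centred label prior."""
    xs = (math.sqrt((1.0 - r) / r), -math.sqrt(r / (1.0 - r)))
    ws = (r, 1.0 - r)
    return xs, ws


def posterior_mean(y, q, r):
    """E[X | Y = y] for the assumed channel Y = sqrt(q) X + Z, Z ~ N(0, 1)."""
    xs, ws = two_point_prior(r)
    # Log-weights up to a constant in x: log w + sqrt(q) x y - q x^2 / 2.
    logs = [math.log(w) + math.sqrt(q) * x * y - 0.5 * q * x * x
            for x, w in zip(xs, ws)]
    top = max(logs)                      # subtract the max for stability
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return sum(x * u for x, u in zip(xs, unnorm)) / total


if __name__ == "__main__":
    for q in (0.0, 0.5, 2.0):
        print(q, posterior_mean(y=1.0, q=q, r=0.3))
```

At zero SNR the observation carries no information and the estimate falls back to the prior mean, which is zero for the centred alphabet.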
3 Adaptive path interpolation
We design an interpolating model parametrized by and s.t. at we recover the original SBM, while at we have a decoupled channel similar to (6). For the model is a mixture of the SBM with parameters and the extra decoupled Gaussian observations (6) with SNR replaced by
[TABLE]
with . The transition kernels for the channels and at time are
[TABLE]
We constrain where as at an appropriate rate to be fixed later on. The interpolating Hamiltonian is then defined to be
[TABLE]
where
[TABLE]
The posterior distribution expressed with the Hamiltonian then reads
[TABLE]
Therefore the Gibbs-bracket (i.e., the expectation operator w.r.t. the posterior distribution) for the interpolating model is
[TABLE]
with the partition function . The reader should keep in mind that Gibbs-brackets are therefore functions of the quenched random variables . The free energy for a given graph (that depends on the ground truth partition) and decoupled observation is
[TABLE]
and its expectation
[TABLE]
By construction,
[TABLE]
In particular, when we have
[TABLE]
Therefore
[TABLE]
where collects all contributions that tend to zero uniformly in when . Eventually, we reach the following fundamental sum rule (see section 4 for the derivation):
[TABLE]
where
[TABLE]
and the overlap is
[TABLE]
Two generic tools that we will widely use in our proof are the following:
- The Nishimori identity: Let be a couple of random variables with joint distribution and conditional distribution . Let and let be i.i.d. copies from the conditional distribution. Let us denote the expectation w.r.t. the product distribution over copies and the expectation w.r.t. the joint distribution. Then, for all continuous bounded functions we have
[TABLE]
The expectation is over .
Proof.
This is a simple consequence of Bayes formula. It is equivalent to sample the couple according to its joint distribution or to sample first according to its marginal distribution and then to sample conditionally on from the conditional distribution. Thus the two -tuples and have the same law. ∎
In the present case with joint law . Let us take i.i.d. copies drawn from the posterior distribution . Then for any continuous bounded function
[TABLE]
where is over . More precisely . Note that, by a slight abuse of notation, we continue to use the Gibbs-bracket notation for expressions depending on multiple i.i.d. copies from the posterior, so that corresponds to the expectation w.r.t. the product measure .
- Gaussian integration by parts: Integration by parts implies that for any bounded differentiable function of we have
[TABLE]
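Both tools can be checked numerically on toy examples. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: the Nishimori identity is verified by exact enumeration for a single two-valued label observed through one binary output with probability `p_bar + delta * x`, and Gaussian integration by parts is taken in its standard form $\mathbb{E}[Zg(Z)]=\mathbb{E}[g'(Z)]$ for $Z\sim\mathcal{N}(0,1)$, evaluated with the trapezoidal rule.

```python
import math


def nishimori_gap(r=0.3, p_bar=0.4, delta=0.1):
    """|E<g(X0, x)> - E<g(x1, x2)>| for a toy binary-output channel."""
    xs = (math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r)))
    prior = (r, 1 - r)

    def lik(y, x):
        p1 = p_bar + delta * x           # toy channel, an assumption
        return p1 if y == 1 else 1 - p1

    def g(a, b):
        return a * b + a * a * b         # arbitrary asymmetric test function

    lhs = rhs = 0.0
    for y in (0, 1):
        p_y = sum(w * lik(y, x) for x, w in zip(xs, prior))
        post = [w * lik(y, x) / p_y for x, w in zip(xs, prior)]
        # Ground truth paired with one posterior sample.
        lhs += sum(w * lik(y, x0) * post[k] * g(x0, xs[k])
                   for x0, w in zip(xs, prior) for k in range(2))
        # Two i.i.d. posterior samples.
        rhs += p_y * sum(post[a] * post[b] * g(xs[a], xs[b])
                         for a in range(2) for b in range(2))
    return abs(lhs - rhs)


def stein_gap(g, dg, half_width=12.0, steps=24_000):
    """|E[Z g(Z)] - E[g'(Z)]| for Z ~ N(0, 1), by the trapezoidal rule."""
    h = 2 * half_width / steps
    lhs = rhs = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        w = h * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        if k in (0, steps):
            w *= 0.5                     # endpoint weights of the rule
        lhs += w * z * g(z)
        rhs += w * dg(z)
    return abs(lhs - rhs)


if __name__ == "__main__":
    print(nishimori_gap())
    print(stein_gap(math.tanh, lambda z: 1.0 - math.tanh(z) ** 2))
```

Both gaps are zero up to floating-point and quadrature error. The Nishimori check works only because the posterior is computed with the true prior and channel, which is the Bayesian optimal setting assumed throughout.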
We are now ready to provide the proofs of the bounds on the mutual information.
3.1 The upper bound: proof of Theorem 2.1
Set and a non-negative constant. Then we have , . Since , (15) implies
[TABLE]
Since is continuous w.r.t its second argument . Optimizing over yields the bound (optimization over does not yield a sharper bound, see remark 2).
3.2 The lower bound: proof of Theorem 2.2
The basic idea is to “remove” from (15) by adapting . Then taking the limit and will provide the desired bound since and will disappear. To implement this idea we first decompose into
[TABLE]
and address each part with the following two lemmas. The proof of Lemma 3.2 can be found in section 5.
Lemma 3.1.
For every and there exists a (unique) bounded solution to the first order differential equation
[TABLE]
Furthermore
[TABLE]
Proof.
Let . Equation (19) is thus a first-order differential equation. Also note that, letting be the derivative w.r.t. the second argument,
[TABLE]
To get the last identity, we used Gaussian integration by parts, which reads when applied to Gibbs brackets,
[TABLE]
Indeed, one must be careful that in the definition of the Gibbs bracket both the Hamiltonian and partition function are functions of the quenched variable , thus the appearance of two terms when we differentiate w.r.t . Now, using the Nishimori identity to replace the hidden partition by a new independent sample from the posterior in (21) (which yields, e.g., or ) we reach
[TABLE]
The function is bounded and takes values in . Indeed by the Nishimori identity, thus again by the Nishimori identity, and finally . In addition to being bounded, is differentiable w.r.t. its second argument, with bounded derivative as seen from (22). The Cauchy-Lipschitz theorem then implies that (19) admits a unique global solution over . Finally Liouville’s formula (see Appendix B) gives
[TABLE]
The non-negativity of then implies . ∎
We now state a crucial concentration result for the overlap. Its validity is a consequence of the fact that the problem is analyzed in the so-called Bayesian optimal setting. This means that all hyper-parameters in the problem, namely , are assumed to be known, so that the posterior of the model can be written exactly. It implies the validity of the Nishimori identity which in turn allows us to prove the following result (see section 5):
Lemma 3.2 (Overlap concentration).
Let be the solution in Lemma 3.1. Then for any bounded positive sequence there exists a sequence converging to a constant and such that
[TABLE]
Now we average (15) over a small interval (note that is independent of ) and set to the solution of (19) in Lemma 3.1; therefore . This choice cancels the first term of in the decomposition (18). The second term in (18) is then upper bounded using Lemma 3.2. Finally . Combining all these observations we obtain
[TABLE]
where we used Fubini’s theorem to switch the and integrals when using Lemma 3.2. Using and , we see that is bounded uniformly in :
[TABLE]
Therefore the average of over has the same upper bound. Now, since
[TABLE]
and we have (we use large enough for the l.h.s inequality). Thus by remark 2 and the mean value theorem
[TABLE]
These remarks imply a relaxation of (24):
[TABLE]
Finally, setting with ensures the extra terms on the r.h.s. of (24) vanish as . Then taking the and using we finally reach the desired bound.
4 The fundamental sum rule: proof of (15)
In this section we use the notation for (11) without explicitly indicating the dependence in its arguments. When is set to zero for a specific pair all other , being fixed we write . Expectation with respect to the set of all , is denoted by .
The derivative of the averaged free energy can be decomposed into three terms:
[TABLE]
where
[TABLE]
4.1 Term .
Lemma 4.1.
We have $D_{1}=\frac{\lambda_{n}}{4}\mathbb{E}\langle Q^{2}\rangle_{t,\epsilon}+\mathcal{O}\big(\frac{1}{n}\big)+\mathcal{O}\big(\frac{\lambda_{n}^{3/2}}{\sqrt{n\bar{p}_{n}(1-\bar{p}_{n})^{3}}}\big)$.
Proof.
Note that by (7) we have
[TABLE]
This gives
[TABLE]
with the definitions
[TABLE]
where , and recalling
[TABLE]
Both and involve the term . In Section 6 we derive an approximate integration by parts formula that, when applied in the present case, yields
Lemma 4.2.
Fix and recall that with conditional mean . Let be the first partial derivative of with respect to . We have the approximate integration by parts formula
[TABLE]
where
[TABLE]
and is the evaluation of at all other variables , being fixed.
The approximate integration by parts formula (28) implies that the term of (27) can be written as (recall )
[TABLE]
Applying again the approximate integration by parts formula (28) the term of (27) can be written as (recall )
[TABLE]
where we define
[TABLE]
We show in Appendix C that in (30) the terms and approximately cancel so that
[TABLE]
Finally, substituting (29) and (31) into (27) gives
[TABLE]
where, in the last two equalities, we used and . With (h1) and (h2), all the error terms represented by the big-O notations tend to zero. ∎
4.2 Term .
Lemma 4.3.
We have .
Proof.
Recall (8). Using Gaussian integration by parts (17) we obtain
[TABLE]
where we used that , and then the definition of the overlap. ∎
4.3 Term .
Lemma 4.4.
We have .
Proof.
Using the Nishimori identity (16) we obtain
[TABLE]
by independence of the centered noise and the hidden partition .
Again the Nishimori identity (16) is used to obtain
[TABLE]
where the last line follows from . ∎
4.4 Final derivations of the sum rule.
The last missing term in order to simplify the sum rule (14) is:
Lemma 4.5.
We have .
Proof.
Using Gaussian integration by parts (17) and from (16) the specific Nishimori identity we have (recall also that )
[TABLE]
∎
Recall . Substituting (26), and Lemmas 4.1, 4.3 and 4.4 as well as 4.5 into (14) yields
[TABLE]
which is the sum rule (15).
5 Concentration of overlap: proof of Lemma 3.2
Concentration of overlap has been shown for various Bayesian inference problems, see, e.g., [18, 7, 8]. These proofs can be adapted to the present case. The idea is to bound the fluctuations of the overlap by those of another, easier to control, object defined below. This object is more natural to work with as it is directly related to derivatives of the free energy, which, itself concentrates. Let us present the main steps of the proof, and then provide the proof details afterwards.
Let
[TABLE]
As said previously, we can relate the fluctuations of the overlap to those of :
Lemma 5.1 (A fluctuation identity).
We have .
It therefore remains to show the concentration of . We divide the task into two parts:
[TABLE]
These two terms are controlled by the following lemmas:
Lemma 5.2 (Thermal fluctuations).
Let be such that . We then have
[TABLE]
Lemma 5.3 (Quenched fluctuations).
Let , with and taking values in , be such that . There exists a sequence converging to a constant such that
[TABLE]
The proofs of Lemma 5.2 and Lemma 5.3 employ some useful identities for the derivatives of the free energy (recall ):
[TABLE]
where we simply denote, when no confusion can arise, . Taking expectation on both sides of (35) and (36) we have
[TABLE]
The proof of Lemma 3.2 is ended by applying Lemmas 5.1, 5.2 and 5.3 in conjunction with (33):
[TABLE]
We now provide the proofs of Lemmas 5.1 to 5.4. For the sake of readability, we simply denote for the rest of this section.
5.1 Proof of Lemma 5.1
We start by proving
[TABLE]
Using the definitions and (32) gives
[TABLE]
Gaussian integration by parts then yields
[TABLE]
These two formulas simplify (41) to
[TABLE]
The Nishimori identity implies
[TABLE]
These formulas further simplify (42) to
[TABLE]
which is (40).
Identity (40) implies
[TABLE]
and application of the Cauchy-Schwarz inequality then gives
[TABLE]
This ends the proof of Lemma 5.1.
5.2 Proof of Lemma 5.2
First note that . Then, using (38), , , and the Nishimori identity ,
[TABLE]
From (37) , therefore . Integrating over then gives
[TABLE]
5.3 Proof of Lemma 5.3
Lemma 5.3 is based on the concentration of the free energy, a very general fact in "well behaved" statistical mechanics models. The proof of the following lemma uses more or less standard methods and can be found in Appendix D.
Lemma 5.4 (Free energy fluctuations).
There exists a sequence converging to a constant when , such that
[TABLE]
Recall . Let
[TABLE]
From (39) we see that is concave in . Furthermore, from (36) and for , we see that is also concave in . So that we can employ the following lemma (see the end of this section for a proof):
Lemma 5.5 (A bound on the difference of derivatives due to concavity).
Let and be concave functions. Let and define and . Then
[TABLE]
From (44) we have
[TABLE]
and from (35) and (37) we have
[TABLE]
Using Lemma 5.5 we then get
[TABLE]
where and . Then squaring this inequality, using , taking the expectation, and recalling that we reach
[TABLE]
Note that with . Recall from Lemma 3.1. We can upper bound by . These remarks with Lemma 5.4 simplify (45) to
[TABLE]
Recall (37) and that . We have
[TABLE]
and therefore . Using and we then have
[TABLE]
using the mean value theorem for the last step. Therefore upon integrating (46) over we have
[TABLE]
The bound is optimized choosing . This ends the proof.
Proof of Lemma 5.5.
Concavity implies that for any we have
[TABLE]
Combining these two inequalities ends the proof. ∎
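A numerical illustration of this type of concavity bound may help. The sketch assumes the form commonly used in adaptive interpolation proofs, namely $|G'(x)-g'(x)|\leq\delta^{-1}\sum_{u\in\{x-\delta,x,x+\delta\}}|G(u)-g(u)|+C_{\delta}^{+}(x)+C_{\delta}^{-}(x)$ with $C_{\delta}^{-}(x)=g'(x-\delta)-g'(x)$ and $C_{\delta}^{+}(x)=g'(x)-g'(x+\delta)$. This form, the test functions and all names are assumptions made for illustration.

```python
import math


def concavity_bound_holds(G, dG, g, dg, x, delta):
    """Check |G'(x) - g'(x)| <= sum|G - g| / delta + C_plus + C_minus."""
    c_minus = dg(x - delta) - dg(x)      # >= 0 since g' is non-increasing
    c_plus = dg(x) - dg(x + delta)       # >= 0 for the same reason
    gap = sum(abs(G(u) - g(u)) for u in (x - delta, x, x + delta))
    return abs(dG(x) - dg(x)) <= gap / delta + c_minus + c_plus + 1e-12


def demo():
    """Run the check on a grid for two concave toy functions."""
    g = lambda x: -x * x
    dg = lambda x: -2.0 * x
    # G'' = -2 + 0.45 cos(3x) < 0, so G is concave as well.
    G = lambda x: -x * x - 0.05 * math.cos(3.0 * x)
    dG = lambda x: -2.0 * x + 0.15 * math.sin(3.0 * x)
    points = [k / 50.0 for k in range(-100, 101)]
    deltas = (1e-3, 1e-2, 1e-1)
    return all(concavity_bound_holds(G, dG, g, dg, x, d)
               for x in points for d in deltas)


if __name__ == "__main__":
    print(demo())
```

The trade-off in the choice of `delta` is visible in the bound: a small value shrinks the two curvature terms but amplifies the difference of the functions through the factor `1 / delta`.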
6 Approximate integration by parts: proof of lemma 4.2
The following general formula follows from Taylor expansion with Lagrange remainder. When the r.h.s is small in specific applications, the formula can be seen as an approximate integration by parts formula generalizing Gaussian integration by parts.
Lemma 6.1.
Let be a function of a random variable such that for we have $\sup_{U}\big|g^{(k)}(U)\big|\leq C_{k}$ for some constants and . Suppose that the first four moments of are finite. Then
[TABLE]
Proof.
By Taylor’s theorem any function can be written as
[TABLE]
Taking the expectation on both sides:
[TABLE]
When (49) is applied to we have
[TABLE]
On the other hand when (49) is applied to , using we have
[TABLE]
Subtracting (50) and (51) we have the bound
[TABLE]
which is the right hand side of (48) after factorization. ∎
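The mechanism behind such a formula is easy to see for a Bernoulli variable. The sketch below is not the statement of Lemma 6.1 but a simpler illustration in the same spirit, under our own assumptions: for $U\sim\mathrm{Ber}(m)$ the identity $\mathbb{E}[(U-m)g(U)]=m(1-m)\big(g(1)-g(0)\big)$ is exact, and replacing the finite difference by $\mathbb{E}[g'(U)]$ costs at most $\tfrac{1}{2}m(1-m)\sup|g''|$ by Taylor's theorem. The test function $g(u)=e^{\epsilon u}$ and the function name are illustrative.

```python
import math


def bernoulli_ibp(m, eps):
    """Exact and derivative forms of E[(U - m) g(U)] for g(u) = exp(eps u)."""
    g = lambda u: math.exp(eps * u)
    dg = lambda u: eps * math.exp(eps * u)
    sup_d2g = eps * eps * math.exp(abs(eps))     # upper bound on |g''| over [0, 1]
    # E[(U - m) g(U)] written out over the two outcomes U = 0 and U = 1.
    centred = (1 - m) * (0 - m) * g(0) + m * (1 - m) * g(1)
    # Exact finite-difference form: Var(U) * (g(1) - g(0)).
    exact = m * (1 - m) * (g(1) - g(0))
    # Derivative form: Var(U) * E[g'(U)].
    approx = m * (1 - m) * ((1 - m) * dg(0) + m * dg(1))
    # Taylor remainder bound on the difference of the two forms.
    bound = 0.5 * m * (1 - m) * sup_d2g
    return centred, exact, approx, bound


if __name__ == "__main__":
    for eps in (0.5, 0.05, 0.005):
        centred, exact, approx, bound = bernoulli_ibp(0.2, eps)
        print(eps, abs(centred - approx), bound)
```

The error shrinks quadratically with `eps`, which mirrors why the formula becomes useful when the dependence on a single edge variable is weak.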
We now apply lemma 6.1 to our specific problem in order to derive the approximate integration by parts formula (28).
Proof of lemma 4.2. In order to apply lemma 6.1 to the SBM, consider and the free energy (11) seen as a function of (all other variables being fixed). For the expectation we take . At time and for any integer
[TABLE]
because . For the derivatives we note that using the Taylor expansion of the logarithm, one obtains for any and , , which also implies . (The reader should keep this fact in mind, as it is used again in the appendices whenever we need to expand the logarithm.) Now this fact implies
[TABLE]
To obtain these identities the reader has again to be careful in performing the derivatives: both the exponential of the Hamiltonian and the partition function appearing in the definition of the Gibbs-bracket depend on (see the derivation of (21) for similar computations). In general,
[TABLE]
Using Lemma 6.1 we have
[TABLE]
Then by the triangle inequality we extract
[TABLE]
and recognize formula (28).
Appendix A Mutual information and free energy: proof of Proposition 2.3
Using (3), we have the expression
[TABLE]
We divide both the numerator and denominator by the same factor, and then rewrite the denominator in exponential form:
[TABLE]
Recall . The first term in (53) equals
[TABLE]
Let . We can further write explicitly the expectation in (54) that leads us to conclude
[TABLE]
Using the Taylor expansion of the logarithm, (55) becomes
[TABLE]
where . This becomes the expression in (5) by noting that the last term is $\mathcal{O}\Big(n\Delta_{n}^{3}/\big(\bar{p}_{n}(1-\bar{p}_{n})\big)^{2}\Big)=\mathcal{O}\big(\lambda_{n}^{3/2}/\sqrt{n\bar{p}_{n}(1-\bar{p}_{n})}\big)$.
Appendix B Liouville formula
Consider the differential equation (19) with . Differentiating w.r.t and using the chain rule gives
[TABLE]
Therefore we have
[TABLE]
Integrating (56) over we have
[TABLE]
Using , (57) implies
[TABLE]
This is known as Liouville’s formula for one-dimensional ordinary differential equations.
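The formula can be checked numerically on a toy equation. The sketch below assumes the generic one-dimensional setting $\frac{dR}{dt}=F(t,R)$ with $R(0)=\epsilon$, for which Liouville's formula reads $\frac{\partial R}{\partial\epsilon}(t)=\exp\big(\int_{0}^{t}\partial_{R}F(s,R(s))\,ds\big)$. The right-hand side `f` is an arbitrary bounded example, not the function appearing in (19), and the function names are hypothetical.

```python
import math


def solve(eps, f, t_end=1.0, steps=2000):
    """Classical RK4 for dR/dt = f(t, R), R(0) = eps; returns the trajectory."""
    h = t_end / steps
    r, t, traj = eps, 0.0, [eps]
    for _ in range(steps):
        k1 = f(t, r)
        k2 = f(t + h / 2, r + h * k1 / 2)
        k3 = f(t + h / 2, r + h * k2 / 2)
        k4 = f(t + h, r + h * k3)
        r += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        traj.append(r)
    return traj


def liouville_gap(eps=0.2, t_end=1.0, steps=2000, bump=1e-5):
    """Difference between Liouville's prediction and a finite difference."""
    f = lambda t, r: math.cos(t) / (1.0 + r * r)          # toy right-hand side
    df_dr = lambda t, r: -2.0 * r * math.cos(t) / (1.0 + r * r) ** 2
    traj = solve(eps, f, t_end, steps)
    h = t_end / steps
    vals = [df_dr(k * h, rk) for k, rk in enumerate(traj)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
    predicted = math.exp(integral)
    up = solve(eps + bump, f, t_end, steps)[-1]
    down = solve(eps - bump, f, t_end, steps)[-1]
    measured = (up - down) / (2 * bump)                   # central difference
    return abs(predicted - measured)


if __name__ == "__main__":
    print(liouville_gap())
```

The exponential form makes clear why a non-negative $\partial_{R}F$ gives $\partial R/\partial\epsilon\geq 1$, which is the property used after Lemma 3.1.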
Appendix C Small error terms in the sum rule: proof of (31)
Recalling the definitions (9) and (10), let
[TABLE]
Also let , and be the Gibbs-bracket associated to the measure proportional to . The difference of free energy when changing one can be written in terms of this Gibbs-bracket:
[TABLE]
Using the Taylor expansion of the logarithms in (61), we have
[TABLE]
Therefore, replacing in the expression of , we find
[TABLE]
where
[TABLE]
We then observe that
[TABLE]
The difference between the Gibbs-brackets in (63) can be expanded as
[TABLE]
and we can evaluate by an interpolation:
[TABLE]
where is the Gibbs-bracket associated to the measure proportional to
[TABLE]
with defined in (59). By the Taylor expansion of the logarithms in (65) and using , we see that the first term of (64) is . The same kind of calculation is used to see that the second term of (64) is also . This implies for (63)
[TABLE]
which tends to zero. Now we conclude by noting that and using (62) and (66) to obtain (31).
Appendix D Concentration of free energy: proof of Lemma 5.4
The generation of quenched variables can be divided into two stages: firstly , then given , and independently the Gaussian noise . We expand the variance of free energy according to the two stages (recall ):
[TABLE]
In each stage the variables are all independently generated. This enables us to use Efron-Stein inequality to show the concentration of free energy.
Let be a vector such that differs from only at the -th which becomes drawn independently from the same distribution as the one of . We define and in a similar manner with respect to and . Efron-Stein’s inequality tells us that
[TABLE]
as well as
[TABLE]
By (67) it suffices to show that both (68) and (69) are upper bounded by for some large enough sequence that converges to a constant.
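The Efron-Stein inequality itself, $\mathrm{Var}\,f(X)\leq\frac{1}{2}\sum_{i}\mathbb{E}\big[(f(X)-f(X^{(i)}))^{2}\big]$ for independent coordinates, can be verified exactly on a small example. The sketch below enumerates three independent Bernoulli variables; the test function and the helper name are illustrative assumptions and are unrelated to the free energy of the model.

```python
import math
from itertools import product


def efron_stein_check(f, probs):
    """Return (Var f, Efron-Stein bound) for independent Bernoulli inputs."""
    n = len(probs)

    def weight(x):
        w = 1.0
        for xi, p in zip(x, probs):
            w *= p if xi == 1 else 1.0 - p
        return w

    states = list(product((0, 1), repeat=n))
    mean = sum(weight(x) * f(x) for x in states)
    var = sum(weight(x) * (f(x) - mean) ** 2 for x in states)
    bound = 0.0
    for i, p in enumerate(probs):
        for x in states:
            for new in (0, 1):           # independent redraw of coordinate i
                x_new = x[:i] + (new,) + x[i + 1:]
                w_new = p if new == 1 else 1.0 - p
                bound += 0.5 * weight(x) * w_new * (f(x) - f(x_new)) ** 2
    return var, bound


def toy_function(x):
    """Arbitrary nonlinear function of three bits, for illustration."""
    return math.log1p(x[0] + 2 * x[1] * x[2]) + x[2]


if __name__ == "__main__":
    print(efron_stein_check(toy_function, (0.2, 0.5, 0.7)))
    print(efron_stein_check(sum, (0.2, 0.5, 0.7)))
```

For a sum of the coordinates the bound is attained with equality, while for a nonlinear function it is in general strict.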
D.1 Bound on (68)
The bound obtained from Efron-Stein’s inequality is a sum of local variances of the free energy. The bound on the difference due to a local change can be estimated by interpolation. For the first one we have
[TABLE]
where the Gibbs-bracket is associated to the measure proportional to . This implies an upper bound on the first sum in (68):
[TABLE]
Another interpolation gives
[TABLE]
for some constant , and where is associated to the measure proportional to . This bounds the second sum in (68) as
[TABLE]
using that are [math], Bernoulli variables, and the variance
[TABLE]
as well as $\big(\Delta_{n}/\big(\bar{p}_{n}(1-\bar{p}_{n})\big)\big)^{2}=\lambda_{n}/(n\bar{p}_{n}(1-\bar{p}_{n}))$ in the last inequality.
D.2 Bound on (69)
We relax (69) with inequality so that
[TABLE]
The difference in the first sum is given by
[TABLE]
where is associated to the measure proportional to . Therefore the sum of squares is bounded by using .
For the second sum we use another interpolation:
[TABLE]
where
[TABLE]
As , we have various ways to write . A convenient way is using
[TABLE]
A compact formula for can then be derived:
[TABLE]
Let and be the marginal of this sub-graph. Using (72) we obtain
[TABLE]
Substituting (73) into (71) gives
[TABLE]
where corresponds to the expectation with respect to the distribution . To evaluate the difference of free energy in (74), first we define , and is associated to defined in (59). The same calculation as in (60) – (61) gives
[TABLE]
Expanding the logarithms we can see (75) is $\mathcal{O}\big(\Delta_{n}/(n\bar{p}_{n}(1-\bar{p}_{n}))\big)$. Using this fact and that all other terms inside the sum of (74) are upper bounded by constants, we see that (74) is $\mathcal{O}\big(\Delta_{n}^{2}/(\bar{p}_{n}(1-\bar{p}_{n}))\big)=\mathcal{O}(\lambda_{n}/n)$. We can then upper bound the second term of (70):
[TABLE]
Acknowledgments
This work was supported by the SNSF grant no. 200021-156672.
- [1] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels: First steps,” Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
- [2] T. N. Bui, S. Chaudhuri, F. T. Leighton, and M. Sipser, “Graph bisection algorithms with good average case behavior,” in 25th Annual Symposium FOCS, 1984, pp. 181–192.
- [3] B. Söderberg, “General formalism for inhomogeneous random graphs,” Phys. Rev. E, vol. 66, p. 066121, 2002.
- [4] B. Bollobás, S. Janson, and O. Riordan, “The phase transition in inhomogeneous random graphs,” Random Struct. Algorithms, 2007.
- [5] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
- [6] E. Abbe, “Community detection and stochastic block models: Recent developments,” Journal of Machine Learning Research, vol. 18, 2018.
- [7] J. Barbier and N. Macris, “The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference,” Probability Theory and Related Fields, Oct 2018.
- [8] J. Barbier and N. Macris, “The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models,” Journal of Physics A: Mathematical and General, vol. J Phys A-111295.R 1, 2019.
