Mutual Information for the Stochastic Block Model by the Adaptive Interpolation Method

Jean Barbier; Chun Lam Chan; Nicolas Macris

arXiv:1902.07273·cs.IT·July 17, 2019

TL;DR

This paper derives an exact formula for the mutual information in the asymmetric two-groups stochastic block model using a novel direct adaptive interpolation method, simplifying previous indirect approaches.

Contribution

It introduces a self-contained, direct proof for the mutual information of the stochastic block model, avoiding complex mappings to matrix estimation problems.

Findings

1. Provides a single-letter variational expression for mutual information.

2. Simplifies the proof technique using adaptive interpolation.

3. Eliminates the need for indirect mappings and multiple existing methods.

Abstract

We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.

Equations

$$\mathbb{E}[\deg(i)]=r\,\mathbb{E}[\deg(i)|X_{i}^{0}=1]+(1-r)\,\mathbb{E}[\deg(i)|X_{i}^{0}=-1]=\frac{(n-1)d_{n}}{n}\approx d_{n}$$

$$M=\frac{d_{n}}{n}\begin{bmatrix}a_{n}&b_{n}\\ b_{n}&c_{n}\end{bmatrix}$$

$$\bar{p}_{n}\equiv\frac{d_{n}}{n},\qquad\Delta_{n}\equiv\frac{d_{n}(1-b_{n})}{n}$$

$$\mathbb{P}(G_{ij}=1|X_{i},X_{j})=\bar{p}_{n}+\Delta_{n}X_{i}X_{j}$$

$$\mathbb{P}_{r}\equiv r\,\delta_{\mathcal{X}_{1}}+(1-r)\,\delta_{\mathcal{X}_{2}}$$

$$\lim_{n\to\infty}\frac{1}{n}I(\bm{X}^{0};\bm{G})=\min_{q\in[0,\lambda]}\Psi(q,\lambda,r)$$

$$\limsup_{n\to\infty}\frac{1}{n}I(\bm{X}^{0};\bm{G})\leq\min_{q\in[0,\lambda]}\Psi(q,\lambda,r)$$

$$\liminf_{n\to\infty}\frac{1}{n}I(\bm{X}^{0};\bm{G})\geq\min_{q\in[0,\lambda]}\Psi(q,\lambda,r)$$

$$\mathbb{P}(\bm{G}|\bm{X})=\prod_{i<j}(\bar{p}_{n}+\Delta_{n}X_{i}X_{j})^{G_{ij}}(1-\bar{p}_{n}-\Delta_{n}X_{i}X_{j})^{1-G_{ij}}$$

$$\begin{aligned}\mathbb{P}(\bm{X}=\bm{x}|\bm{G})&\propto\exp\Big\{\sum_{i<j}\Big(G_{ij}\ln(\bar{p}_{n}+\Delta_{n}x_{i}x_{j})+(1-G_{ij})\ln(1-\bar{p}_{n}-\Delta_{n}x_{i}x_{j})\Big)\Big\}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\\&=\exp\Big\{\sum_{i<j}\Big(G_{ij}\ln\Big(1+\frac{\Delta_{n}}{\bar{p}_{n}}x_{i}x_{j}\Big)+(1-G_{ij})\ln\Big(1-\frac{\Delta_{n}}{1-\bar{p}_{n}}x_{i}x_{j}\Big)\Big)+D_{n}(\bar{p}_{n},\bm{G})\Big\}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\end{aligned}$$

$$\mathcal{Z}(\bm{G})\equiv\sum_{\bm{x}\in\mathcal{X}^{n}}e^{-\mathcal{H}_{\rm SBM}(\bm{x};\bm{G})}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})$$

$$Y_{i}=\sqrt{q}\,X_{i}+Z_{i},\qquad 1\leq i\leq n$$

$$q\to R(t,\epsilon)\equiv\epsilon+\int_{0}^{t}ds\,q(s,\epsilon)$$

$$\mathbb{P}_{t}(\bm{G}|\bm{X})=\exp\Big\{\sum_{i<j}\Big(G_{ij}\ln(\bar{p}_{n}+\sqrt{1-t}\,\Delta_{n}X_{i}X_{j})+(1-G_{ij})\ln(1-\bar{p}_{n}-\sqrt{1-t}\,\Delta_{n}X_{i}X_{j})\Big)\Big\}$$

$$\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y})\equiv\mathcal{H}_{{\rm SBM};t}(\bm{x};\bm{G})+\mathcal{H}_{{\rm dec};t,\epsilon}(\bm{x};\bm{Y})$$

$$\mathcal{H}_{{\rm dec};t,\epsilon}(\bm{x};\bm{Y}(\bm{X},\bm{Z}))=-\sum_{i=1}^{n}\Big(R(t,\epsilon)X_{i}x_{i}+\sqrt{R(t,\epsilon)}\,Z_{i}x_{i}-R(t,\epsilon)\frac{x_{i}^{2}}{2}\Big)$$

$$\mathbb{P}_{t}(\bm{x}|\bm{G},\bm{Y})=\frac{\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\exp(-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y}))}{\sum_{\bm{x}\in\mathcal{X}^{n}}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\exp(-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y}))}$$

$$\langle A\rangle_{t,\epsilon}\equiv\sum_{\bm{x}\in\mathcal{X}^{n}}A(\bm{x})\,\mathbb{P}_{t}(\bm{x}|\bm{G},\bm{Y})=\frac{1}{\mathcal{Z}_{t,\epsilon}(\bm{G},\bm{Y})}\sum_{\bm{x}\in\mathcal{X}^{n}}A(\bm{x})\,e^{-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y})}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})$$

$$f_{t,\epsilon}\equiv\mathbb{E}_{\bm{X}}\mathbb{E}_{\bm{G}|\bm{X}}\mathbb{E}_{\bm{Y}|\bm{X}}F_{t,\epsilon}=\mathbb{E}_{\bm{X}}\mathbb{E}_{\bm{G}|\bm{X}}\mathbb{E}_{\bm{Z}}F_{t,\epsilon}$$

Expressions captured without their right-hand sides: $\Psi(q,\lambda,r)$, $\mathbb{P}(\bm{x}|\bm{G})$, $\mathcal{H}_{\rm SBM}(\bm{x};\bm{G})$, $\frac{1}{n}I(\bm{X};\bm{G})$, $\mathbb{P}_{t}(\bm{Y}|\bm{X})$, $\mathcal{H}_{{\rm SBM};t}(\bm{x};\bm{G})$, $F_{t,\epsilon}(\bm{G},\bm{Y})$, $f_{t=0,\epsilon}$.

Full text

Mutual information for the stochastic block model by the adaptive interpolation method

Jean Barbier∗, Chun Lam Chan†, and Nicolas Macris†

Abstract

We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained and direct proof using only the recently introduced adaptive interpolation method.

$*$ The Abdus Salam International Center for Theoretical Physics, Trieste, Italy.

$\dagger$ Communication Theory Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland.

1 Introduction

The stochastic block model (SBM) has a long history and has attracted the attention of many disciplines. It was first introduced as a model of community detection in the networks and statistics literature [1], as a problem of finding graph bisections in theoretical computer science [2], and has also been proposed as a model for inhomogeneous random graphs [3, 4]. Here we adopt the community detection interpretation and motivation [5]. A partition of nodes into labeled groups is hidden from an observer who is only given a random graph generated on the basis of the partition. The task of the observer is to recover the hidden partition from the observed graph. A simple setting that lends itself to mathematical analysis is the following. The labels of nodes are drawn i.i.d. from a prior distribution and, for the graph, the edges between pairs of nodes are placed independently according to a probability which depends only on the group labels. If the probability is slightly higher (resp. lower) when the pair of nodes has the same label the model is called assortative (resp. disassortative). Moreover we suppose that the parameters of the prior and edge probability distributions are all known so that we are working in the framework of Bayesian (optimal) inference. Note that the recovery task is non-trivial only when parameters are such that no information about the group label is revealed from the degrees of nodes. Much progress has been made in recent years within this simple mathematical setting and we refer to [6] for a recent comprehensive review and references.

In the limit of large number of nodes the SBM displays interesting phase transitions for (partial) recovery of the hidden partition and much effort has been deployed to characterize the phase diagram, in terms of information theoretic as well as algorithmic phase transition thresholds, and compute the algorithmic-to-statistical gaps. In this vein a fundamental quantity is the mutual information between the hidden labels of the nodes and the observed graph. Indeed from the asymptotic value of the mutual information per node one can compute information theoretic thresholds of recovery. In this paper we focus on the mutual information of the two-group SBM with possibly asymmetric group sizes, in dense regimes where the expected degree of the nodes diverges with the total number of nodes (and is independent of the group label). We rigorously determine a single-letter variational expression for the asymptotic mutual information by means of the recently developed adaptive interpolation method [7, 8].

Single-letter variational expressions for the mutual information of the SBM are not new. They were first analytically derived in heuristic ways by methods of statistical physics and in this context are often called replica or cavity formulas [9]. Rigorous proofs then appeared in [10, 11]. These approaches are indirect in the sense that the SBM is first mapped onto a rank-one matrix factorization problem, and then the matrix factorization problem is solved. In [10] the particular case of two equal size communities is considered and the analysis relies on the fact that in this case the information theoretic phase transition is of the second order type (i.e., continuous), which allows the use of message-passing arguments. The asymmetric case is more challenging because first order (discontinuous) phase transitions appear for large enough asymmetry. In [11] this case is tackled through a Guerra-Toninelli interpolation combined with a rigorous version of the cavity method or Aizenman-Sims-Starr scheme [12]. Strictly speaking the analysis of [11] does not cover the widest possible regime of dense graphs (see Section 2 for details). We note that the mutual information of rank-one matrix factorization had also been determined earlier in [13] for the symmetric case and more recently for the general case in [14, 15] using a spatial coupling method.

The proof presented here covers the asymmetric two-group SBM and has the virtue of being completely unified. It uses a single method, namely the adaptive interpolation, is conceptually simpler, and is direct as it does not make any detour through another model. The method is a powerful evolution of the classic Guerra-Toninelli interpolation [16] and allows one to derive tight upper and lower bounds for the mutual information, whereas the classic interpolation only yields a one-sided inequality. It has been successfully applied to a range of Bayesian inference problems, e.g., [17, 18]. Here, besides various new technical aspects, the main novelty is that we do not use Gaussian integration by parts, as is generally the case in interpolation methods. Instead, we develop a general approximate integration by parts formula and apply it to the Bernoulli random elements of the adjacency matrix of the graph. We note that related approximate integration by parts formulas have already been used by [19, 20] in the context of the Hopfield and Sherrington-Kirkpatrick models.

It would be desirable to extend the present method to the sparse regime of the SBM where the average degree of the nodes stays finite as the number of nodes diverges. This is much more challenging however, and the mutual information has so far been determined only for the disassortative case [21] while the assortative case remains open. The thresholds however have been successfully determined for both cases in [22, 23, 24, 25]. The adaptive interpolation method has been developed for the related censored block model in the sparse regime [26] and hopefully it can also be extended to the sparse SBM, which we leave for future work.

2 Setting and results: asymmetric two-groups SBM

We first formulate the SBM for two communities that may be of different sizes. Suppose we have $n$ nodes belonging to two communities where the partition is denoted by a vector $\bm{X}^{0}\in\{-1,1\}^{n}$ . Labels $X_{i}^{0}$ are i.i.d. Bernoulli random variables with $\mathbb{P}(X_{i}^{0}=1)=r\in(0,1/2]$ . The sizes of the two communities are $nr$ and $n(1-r)$ up to fluctuations of ${\cal O}(\sqrt{n})$ . The labels $\bm{X}^{0}$ are hidden and instead one is given a random undirected graph $\bm{G}$ constructed as follows (equivalently one is given an adjacency matrix). An edge between nodes $i$ and $j$ is present with probability $\mathbb{P}(G_{ij}=1|X_{i}^{0},X_{j}^{0})$ and absent with the complementary probability. To specify $\mathbb{P}(G_{ij}=1|X_{i}^{0},X_{j}^{0})$ , first we define $d_{n}$ such that

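$$\mathbb{E}[\deg(i)|X_{i}^{0}=1]=\frac{(n-1)d_{n}}{n},\tag{1}$$

$$\mathbb{E}[\deg(i)|X_{i}^{0}=-1]=\frac{(n-1)d_{n}}{n}.\tag{2}$$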

We require these two constraints for the inference problem to be non-trivial, in the sense that no information about the labels stems from the nodes’ degrees. The two constraints imply

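$$\mathbb{E}[\deg(i)]=r\,\mathbb{E}[\deg(i)|X_{i}^{0}=1]+(1-r)\,\mathbb{E}[\deg(i)|X_{i}^{0}=-1]=\frac{(n-1)d_{n}}{n}\approx d_{n}$$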

so that we can interpret $d_{n}$ as the average degree of a node. Then we define $\mathbb{P}(G_{ij}=1|X_{i}^{0},X_{j}^{0})=M_{X_{i}^{0},X_{j}^{0}}$ where $M_{X_{i}^{0},X_{j}^{0}}$ are the four possible matrix elements of

$$M=\frac{d_{n}}{n}\begin{bmatrix}a_{n}&b_{n}\\ b_{n}&c_{n}\end{bmatrix}.$$

Because of (1) and (2), we have the equations

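$$\mathbb{E}[\deg(i)|X_{i}^{0}=1]=(n-1)\frac{d_{n}}{n}\big(ra_{n}+(1-r)b_{n}\big)=\frac{(n-1)d_{n}}{n},$$

$$\mathbb{E}[\deg(i)|X_{i}^{0}=-1]=(n-1)\frac{d_{n}}{n}\big(rb_{n}+(1-r)c_{n}\big)=\frac{(n-1)d_{n}}{n}.$$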

Solving this system imposes $a_{n}=1-(1-1/r)(1-b_{n})$ and $c_{n}=1-(1-b_{n})/(1-1/r)$ . Therefore there are three independent parameters, namely $d_{n}$ , $b_{n}$ and $r$ . A more convenient re-parametrization is often used [10] instead of $b_{n},d_{n}$ :

$$\bar{p}_{n}\equiv\frac{d_{n}}{n},\qquad\text{and}\qquad\Delta_{n}\equiv\frac{d_{n}(1-b_{n})}{n}.$$

Here $\bar{p}_{n}\in(0,1)$ is the average probability for the presence of an edge. We will look at the dense asymmetric SBM (the symmetric model corresponding to $r=1/2$ ) regimes where $d_{n}=n\bar{p}_{n}\to+\infty$ . In our analysis the growth of $d_{n}$ spans the whole spectrum from arbitrarily slow, at the verge of a sparse graph, to linear $d_{n}=vn$ , $v\in(0,1)$ , for fully dense graphs.
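The parametrization above can be checked numerically. The following is a minimal sketch, assuming the relations $a_{n}=1-(1-1/r)(1-b_{n})$, $c_{n}=1-(1-b_{n})/(1-1/r)$, $\bar{p}_{n}=d_{n}/n$ and $\Delta_{n}=d_{n}(1-b_{n})/n$ stated in the text; the function name `sbm_parameters` and the numerical values are hypothetical choices made for illustration only.

```python
def sbm_parameters(d_n, b_n, r, n):
    """Return (M, p_bar, delta) for the two-group SBM (illustrative helper)."""
    # a_n and c_n are fixed by the two degree-balance constraints.
    a_n = 1 - (1 - 1 / r) * (1 - b_n)
    c_n = 1 - (1 - b_n) / (1 - 1 / r)
    scale = d_n / n
    M = [[scale * a_n, scale * b_n],
         [scale * b_n, scale * c_n]]
    p_bar = d_n / n                  # average edge probability
    delta = d_n * (1 - b_n) / n      # signal amplitude
    return M, p_bar, delta


# Hypothetical values, chosen only so that all entries of M lie in (0, 1).
n, r, d_n, b_n = 1000, 0.3, 50.0, 0.9
M, p_bar, delta = sbm_parameters(d_n, b_n, r, n)

# Expected degree of a node given its label: (n - 1) * sum_label' P(label') * M.
deg_plus = (n - 1) * (r * M[0][0] + (1 - r) * M[0][1])
deg_minus = (n - 1) * (r * M[1][0] + (1 - r) * M[1][1])
print(deg_plus, deg_minus, (n - 1) * d_n / n)
```

Both conditional expected degrees coincide with $(n-1)d_{n}/n$, so the degree of a node carries no information about its label.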

In this paper we rigorously determine the asymptotic mutual information for this problem $\lim_{n\to\infty}\frac{1}{n}I(\bm{X}^{0};\bm{G})$ in the dense graph regime wherein $\bar{p}_{n}$ and $\Delta_{n}$ satisfy:

(h1) (Dense SBM) $n\bar{p}_{n}(1-\bar{p}_{n})^{3}\xrightarrow{n\rightarrow\infty}\infty$ .

(h2) (Appropriate scaling of signal-to-noise ratio) $\lambda_{n}\equiv n\Delta_{n}^{2}/\big{(}\bar{p}_{n}(1-\bar{p}_{n})\big{)}=d_{n}(1-b_{n})^{2}/(1-d_{n}/n)\xrightarrow{n\rightarrow\infty}\lambda$ finite.

The first condition ensures that the graph is dense in the sense that $d_{n}\to+\infty$ , still maintaining $\bar{p}_{n}\in(0,1)$ . The second ensures the mutual information has a well defined non-trivial limit when $n\to+\infty$ . Note that, combined with the first condition, the second condition requires $\Delta_{n}\ll\bar{p}_{n}(1-\bar{p}_{n})^{2}$ since $\Delta_{n}/\big{(}\bar{p}_{n}(1-\bar{p}_{n})^{2}\big{)}=\sqrt{\lambda_{n}/(n\bar{p}_{n}(1-\bar{p}_{n})^{3})}\rightarrow 0$ as $n\rightarrow\infty$ , hence $\Delta_{n}\ll\bar{p}_{n}$ and $\Delta_{n}\ll(1-\bar{p}_{n})^{2}$ . The reader may wish to keep in mind two simple typical examples. The first example is a dense graph with $d_{n}=vn$ , $v\in(0,1)$ so $\bar{p}_{n}=v$ and $\Delta_{n}\approx\sqrt{\lambda v(1-v)/n}$ . The second example is $d_{n}=vn^{1-\theta}$ with $\theta\in(0,1)$ , so $\bar{p}_{n}=vn^{-\theta}$ and $\Delta_{n}\approx\sqrt{\lambda vn^{-1-\theta}}$ . These are easily translated back to the matrix $M$ .
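The two example regimes can be tabulated numerically. The sketch below assumes $\bar{p}_{n}=vn^{-\theta}$ (with $\theta=0$ giving the fully dense case) and picks $\Delta_{n}$ so that $\lambda_{n}=\lambda$ exactly, which is a simplifying assumption rather than something the text requires; the helper name `scaling_example` is hypothetical.

```python
import math


def scaling_example(n, v, theta, lam):
    """Quantities entering (h1) and (h2) for p_bar = v * n**(-theta)."""
    p_bar = v * n ** (-theta)
    # Assumption: Delta_n is chosen so that lambda_n equals lam exactly.
    delta = math.sqrt(lam * p_bar * (1 - p_bar) / n)
    lam_n = n * delta ** 2 / (p_bar * (1 - p_bar))
    h1 = n * p_bar * (1 - p_bar) ** 3            # must diverge under (h1)
    ratio = delta / (p_bar * (1 - p_bar) ** 2)   # must vanish
    return {"p_bar": p_bar, "delta": delta, "lam_n": lam_n,
            "h1": h1, "ratio": ratio}


rows = [scaling_example(n, v=0.5, theta=0.5, lam=2.0)
        for n in (10 ** 2, 10 ** 4, 10 ** 6)]
for row in rows:
    print(row)
```

As $n$ grows the quantity in (h1) increases while the ratio $\Delta_{n}/(\bar{p}_{n}(1-\bar{p}_{n})^{2})$ decreases, in line with the identity quoted above.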

We note that in the sparse graph version of the model one would have a finite limit for $d_{n}$ but the second condition would be the same. The analysis of the sparse case is however more difficult and is not addressed in this paper.

Instead of working with the Ising spin $\pm 1$ variables it is convenient to change the alphabet. We define ${X}_{i}\equiv\phi_{r}(X_{i}^{0})$ with $\phi_{r}(1)=\sqrt{(1-r)/r}$ and $\phi_{r}(-1)=-\sqrt{r/(1-r)}$ . The hidden labels of the nodes now belong to the alphabet $\mathcal{X}\equiv\{\mathcal{X}_{1}=\sqrt{(1-r)/r},\mathcal{X}_{2}=-\sqrt{r/(1-r)}\}$ and ${\bm{X}}\in\mathcal{X}^{n}$ . An edge is then present with conditional probability

$$\mathbb{P}(G_{ij}=1|X_{i},X_{j})=\bar{p}_{n}+\Delta_{n}X_{i}X_{j}.$$

This can be viewed as an asymmetric binary-input binary-output channel $\bm{X}\to\bm{G}$ and the inference problem is to recover the input $\bm{X}$ (or $\bm{X}^{0}$ ) from the channel output $\bm{G}$ . Henceforth we adopt the notation

$$\mathbb{P}_{r}\equiv r\,\delta_{\mathcal{X}_{1}}+(1-r)\,\delta_{\mathcal{X}_{2}}$$

for the probability distribution of the hidden labels $X_{i}\in\mathcal{X}$ . Note that $\mathbb{E}[X^{2}]=1$ .
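The change of alphabet can be sanity-checked in a few lines. This is a sketch under the definitions given in the text ($\phi_{r}$, $\bar{p}_{n}$, $\Delta_{n}$, $a_{n}$, $c_{n}$); the numerical values are hypothetical. It verifies that the rescaled labels are centered with unit second moment and that $\bar{p}_{n}+\Delta_{n}X_{i}X_{j}$ reproduces the entries of $M$.

```python
import math


def alphabet(r):
    """The two rescaled label values phi_r(+1) and phi_r(-1)."""
    return math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))


r, n, d_n, b_n = 0.3, 1000, 50.0, 0.9   # hypothetical values
x1, x2 = alphabet(r)

mean = r * x1 + (1 - r) * x2               # first moment under P_r
second = r * x1 ** 2 + (1 - r) * x2 ** 2   # second moment under P_r

p_bar = d_n / n
delta = d_n * (1 - b_n) / n
a_n = 1 - (1 - 1 / r) * (1 - b_n)
c_n = 1 - (1 - b_n) / (1 - 1 / r)

# Edge probabilities in the new alphabet versus the entries of M.
edge_same_plus = p_bar + delta * x1 * x1
edge_mixed = p_bar + delta * x1 * x2
edge_same_minus = p_bar + delta * x2 * x2
print(mean, second, edge_same_plus, edge_mixed, edge_same_minus)
```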

We now formulate our results which provide a single-letter variational formula for the asymptotic mutual information. Let $Z\sim\mathcal{N}(0,1)$ and $X\sim\mathbb{P}_{r}$ independently, and set for $q>0$ :

[TABLE]

The so-called replica formula conjectures the identity

$$\lim_{n\to\infty}\frac{1}{n}I(\bm{X}^{0};\bm{G})=\min_{q\in[0,\lambda]}\Psi(q,\lambda,r).\tag{4}$$

We prove that (4) is correct, namely:

Theorem 2.1 (Upper bound).

For the SBM under concern in the regime (h1), (h2),

$$\limsup_{n\to\infty}\frac{1}{n}I(\bm{X}^{0};\bm{G})\leq\min_{q\in[0,\lambda]}\Psi(q,\lambda,r).$$

Theorem 2.2 (Lower bound).

For the SBM under concern in the regime (h1), (h2),

$$\liminf_{n\to\infty}\frac{1}{n}I(\bm{X}^{0};\bm{G})\geq\min_{q\in[0,\lambda]}\Psi(q,\lambda,r).$$

Remark 1: Of course we have $I(\bm{X}^{0};\bm{G})=I(\bm{X};\bm{G})$ and in the following we will work with $I(\bm{X};\bm{G})$ where $\bm{X}\in\mathcal{X}=\{\mathcal{X}_{1}=\sqrt{(1-r)/r},\mathcal{X}_{2}=-\sqrt{r/(1-r)}\}$ .

Remark 2: Elementary analysis shows that the minimum over $q\geq 0$ of $\Psi(q,\lambda,r)$ is attained for $q\in[0,\lambda]$ .

Remark 3: From (4) one can derive the information theoretic phase transition thresholds. Let $r_{*}\equiv(1-1/\sqrt{3})/2$ . For "small" asymmetry between group sizes $r\in[r_{*},1/2]$ there is a continuous phase transition at $\lambda_{c}=1$ while for "large" asymmetry $r\in(0,r_{*})$ the phase transition becomes discontinuous. An information theoretic-to-algorithmic gap occurs in the second situation as discussed in detail in [11].
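The continuous transition at $\lambda_{c}=1$ can be illustrated numerically in the symmetric case. The sketch below rests on an assumption: it takes the potential to have the form commonly used for rank-one estimation with a unit-variance prior, $\Psi(q,\lambda,r)=\frac{\lambda}{4}+\frac{q^{2}}{4\lambda}-\mathbb{E}\ln\sum_{x\in\mathcal{X}}\mathbb{P}_{r}(x)\,e^{\sqrt{q}Zx+qXx-qx^{2}/2}$, which should be checked against the paper's own definition of $\Psi$. The Gaussian expectation is approximated by a normalized grid sum and the minimization is a plain grid search; all function names are hypothetical.

```python
import math


def psi(q, lam, r, z_max=8.0, dz=0.05):
    """Assumed replica potential, evaluated by grid quadrature over Z."""
    x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
    pts = [(x1, r), (x2, 1 - r)]
    n_z = int(round(2 * z_max / dz)) + 1
    zs = [-z_max + k * dz for k in range(n_z)]
    ws = [math.exp(-0.5 * z * z) for z in zs]
    norm = sum(ws)                      # normalizes the Gaussian weights
    sq = math.sqrt(q)
    acc = 0.0
    for X, p_X in pts:                  # average over the hidden label
        for z, w in zip(zs, ws):        # average over the Gaussian noise
            s = sum(p * math.exp(sq * z * x + q * X * x - 0.5 * q * x * x)
                    for x, p in pts)
            acc += p_X * (w / norm) * math.log(s)
    return lam / 4 + q * q / (4 * lam) - acc


def minimize_psi(lam, r, grid=200):
    """Grid search for the minimizer of psi over q in [0, lam]."""
    qs = [lam * k / grid for k in range(grid + 1)]
    vals = [psi(q, lam, r) for q in qs]
    k = min(range(grid + 1), key=vals.__getitem__)
    return qs[k], vals[k]


q_low, psi_low = minimize_psi(lam=0.5, r=0.5)    # below lambda_c = 1
q_high, psi_high = minimize_psi(lam=2.0, r=0.5)  # above lambda_c = 1
print(q_low, psi_low, q_high, psi_high)
```

Under this assumed form, the minimizer stays at $q=0$ (value $\lambda/4$) for $\lambda<1$ and moves to a strictly positive $q$ for $\lambda>1$ when $r=1/2$.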

Let us explain the relation of these theorems with previous works. In [10] they were obtained for the symmetric case $r=1/2$ by a mapping of the model onto a rank-one matrix estimation problem via an application of Lindeberg’s theorem. The regime treated is essentially the same as ours except that in place of (h1) [10] has $n\bar{p}_{n}(1-\bar{p}_{n})\to+\infty$ . Note that the difference only matters if $\bar{p}_{n}\to 1$ which is the complete graph limit. Still using the same mapping to matrix factorization, [11] treats the asymmetric case, however in a limit where $n\to+\infty$ first and $d_{n}\to+\infty$ after (in fact this analysis can accommodate any growth slower than $d_{n}\approx n^{1/2}$ ) but it is unclear whether this is possible for denser regimes. Our analysis covers this gap and the whole spectrum of growth for $d_{n}$ up to linear growth is allowed. Besides, we propose a self-contained and direct method using the adaptive interpolation method [7]. A technical limitation of interpolation methods has often been the need to use Gaussian integration by parts. We bypass this limitation using an (approximate) integration by parts formula for the edge binary variables $G_{ij}\in\{0,1\}$ .

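Before we formulate the adaptive interpolation let us set up more explicitly the quantities that we compute. The distribution of $\bm{G}$ given the hidden partition $\bm{X}$ is the inhomogeneous Erdős-Rényi graph measure:

$$\mathbb{P}(\bm{G}|\bm{X})=\prod_{i<j}(\bar{p}_{n}+\Delta_{n}X_{i}X_{j})^{G_{ij}}(1-\bar{p}_{n}-\Delta_{n}X_{i}X_{j})^{1-G_{ij}}.$$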
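As an illustration of this graph measure, the following sketch samples labels and edges and evaluates the log-likelihood $\ln\mathbb{P}(\bm{G}|\bm{X})$. The function names, the seed and the parameter values are hypothetical; the only modelling input is the edge probability $\bar{p}_{n}+\Delta_{n}X_{i}X_{j}$ from the text.

```python
import math
import random


def sample_sbm(n, r, p_bar, delta, rng):
    """Draw rescaled labels X and independent edges G_ij for i < j."""
    x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
    X = [x1 if rng.random() < r else x2 for _ in range(n)]
    G = {}
    for i in range(n):
        for j in range(i + 1, n):
            p_edge = p_bar + delta * X[i] * X[j]
            G[(i, j)] = 1 if rng.random() < p_edge else 0
    return X, G


def log_likelihood(G, X, p_bar, delta):
    """ln P(G | X) for the inhomogeneous Erdos-Renyi measure."""
    total = 0.0
    for (i, j), g in G.items():
        p_edge = p_bar + delta * X[i] * X[j]
        total += math.log(p_edge) if g else math.log(1 - p_edge)
    return total


n, r, p_bar, delta = 300, 0.3, 0.2, 0.02   # hypothetical values
X, G = sample_sbm(n, r, p_bar, delta, random.Random(0))

deg = [0] * n
for (i, j), g in G.items():
    deg[i] += g
    deg[j] += g
plus = [deg[i] for i in range(n) if X[i] > 0]
minus = [deg[i] for i in range(n) if X[i] < 0]
mean_plus = sum(plus) / len(plus)
mean_minus = sum(minus) / len(minus)
print(mean_plus, mean_minus, (n - 1) * p_bar, log_likelihood(G, X, p_bar, delta))
```

The empirical mean degrees of the two groups should both be close to $(n-1)\bar{p}_{n}$, reflecting the degree-balance constraints.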

Using this measure and Bayes rule, we find the posterior distribution of the SBM

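$$\begin{aligned}\mathbb{P}(\bm{X}=\bm{x}|\bm{G})&\propto\exp\Big\{\sum_{i<j}\Big(G_{ij}\ln(\bar{p}_{n}+\Delta_{n}x_{i}x_{j})+(1-G_{ij})\ln(1-\bar{p}_{n}-\Delta_{n}x_{i}x_{j})\Big)\Big\}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\\&=\exp\Big\{\sum_{i<j}\Big(G_{ij}\ln\Big(1+\frac{\Delta_{n}}{\bar{p}_{n}}x_{i}x_{j}\Big)+(1-G_{ij})\ln\Big(1-\frac{\Delta_{n}}{1-\bar{p}_{n}}x_{i}x_{j}\Big)\Big)+D_{n}(\bar{p}_{n},\bm{G})\Big\}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\end{aligned}$$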

where $D_{n}(\bar{p}_{n},\bm{G})\equiv\sum_{i<j}G_{ij}\ln\bar{p}_{n}+(1-G_{ij})\ln(1-\bar{p}_{n})$ . Therefore, the posterior distribution becomes

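$$\mathbb{P}(\bm{x}|\bm{G})=\frac{1}{\mathcal{Z}(\bm{G})}\,e^{-\mathcal{H}_{\rm SBM}(\bm{x};\bm{G})}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i}),$$

$$\mathcal{H}_{\rm SBM}(\bm{x};\bm{G})\equiv-\sum_{i<j}\Big(G_{ij}\ln\Big(1+\frac{\Delta_{n}}{\bar{p}_{n}}x_{i}x_{j}\Big)+(1-G_{ij})\ln\Big(1-\frac{\Delta_{n}}{1-\bar{p}_{n}}x_{i}x_{j}\Big)\Big).$$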

We use the statistical mechanics terminology and therefore call this posterior distribution the Gibbs distribution. The normalizing factor

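$$\mathcal{Z}(\bm{G})\equiv\sum_{\bm{x}\in\mathcal{X}^{n}}e^{-\mathcal{H}_{\rm SBM}(\bm{x};\bm{G})}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})$$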

is the partition function, and $\mathcal{H}_{\rm SBM}$ is the Hamiltonian. A straightforward computation, using the scaling regime (h1) and (h2), gives the following formula (see the proof in Appendix A):

Proposition 2.3 (Linking the mutual information and log-partition function).

For the SBM under concern we have

[TABLE]

where $\lim_{n\rightarrow\infty}o_{n}(1)=0$ .
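The link between mutual information and the log-partition function can be checked by brute force at very small $n$. The sketch below does not reproduce the asymptotic statement of Proposition 2.3; it only verifies the exact finite-$n$ identity $I(\bm{X};\bm{G})=-\mathbb{E}[\mathcal{H}_{\rm SBM}(\bm{X};\bm{G})]-\mathbb{E}\ln\mathcal{Z}(\bm{G})$, which follows from $\mathbb{P}(\bm{G}|\bm{X})=e^{-\mathcal{H}_{\rm SBM}(\bm{X};\bm{G})+D_{n}}$ and $\mathbb{P}(\bm{G})=e^{D_{n}}\mathcal{Z}(\bm{G})$. The Hamiltonian is taken in the form suggested by the second expression of the posterior above, and the function name and parameter values are hypothetical.

```python
import itertools
import math


def exact_check(n=3, r=0.3, p_bar=0.4, delta=0.1):
    """Return (I(X;G), -E[H_SBM] - E[ln Z]) by exhaustive enumeration."""
    x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
    prior = {x1: r, x2: 1 - r}
    pairs = list(itertools.combinations(range(n), 2))

    def hamiltonian(x, g):
        return -sum(
            gij * math.log(1 + delta / p_bar * x[i] * x[j])
            + (1 - gij) * math.log(1 - delta / (1 - p_bar) * x[i] * x[j])
            for (i, j), gij in zip(pairs, g))

    def likelihood(x, g):
        out = 1.0
        for (i, j), gij in zip(pairs, g):
            p_edge = p_bar + delta * x[i] * x[j]
            out *= p_edge if gij else 1 - p_edge
        return out

    def prior_prob(x):
        return math.prod(prior[xi] for xi in x)

    configs = list(itertools.product((x1, x2), repeat=n))
    graphs = list(itertools.product((0, 1), repeat=len(pairs)))
    mutual_info = avg_h = avg_ln_z = 0.0
    for g in graphs:
        p_g = sum(prior_prob(x) * likelihood(x, g) for x in configs)
        z_g = sum(prior_prob(x) * math.exp(-hamiltonian(x, g)) for x in configs)
        avg_ln_z += p_g * math.log(z_g)
        for x in configs:
            joint = prior_prob(x) * likelihood(x, g)
            mutual_info += joint * math.log(likelihood(x, g) / p_g)
            avg_h += joint * hamiltonian(x, g)
    return mutual_info, -avg_h - avg_ln_z


direct, via_partition = exact_check()
print(direct, via_partition)
```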

Thus the problem boils down to computing minus the expected log-partition function, or expected free energy, in the limit $n\to+\infty$ . This will be achieved via an interpolation towards the log-partition function of $n$ independent scalar Gaussian channels where the observations about the hidden labels are of the form

$$Y_{i}=\sqrt{q}\,X_{i}+Z_{i},\qquad 1\leq i\leq n,\tag{6}$$

with $Z_{i}\sim\mathcal{N}(0,1)$ i.i.d. Gaussian random variables and $q>0$ the signal-to-noise ratio (SNR). An important feature of our technique is the freedom to adapt a suitable interpolation path to the problem at hand. This is explained in the next section.

3 Adaptive path interpolation

We design an interpolating model parametrized by $t\in[0,1]$ and $\epsilon\geq 0$ s.t. at $t=\epsilon=0$ we recover the original SBM, while at $t=1$ we have a decoupled channel similar to (6). For $t\in(0,1)$ the model is a mixture of the SBM with parameters $(\bar{p}_{n},\sqrt{1-t}\,\Delta_{n})$ and the extra decoupled Gaussian observations (6) with SNR replaced by

$$q\to R(t,\epsilon)\equiv\epsilon+\int_{0}^{t}ds\,q(s,\epsilon)$$

with $q(s,\epsilon)\geq 0$ . The transition kernels for the channels $\bm{X}\rightarrow\bm{G}$ and $\bm{X}\rightarrow\bm{Y}$ at time $t\in[0,1]$ are

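$$\mathbb{P}_{t}(\bm{G}|\bm{X})=\exp\Big\{\sum_{i<j}\Big(G_{ij}\ln(\bar{p}_{n}+\sqrt{1-t}\,\Delta_{n}X_{i}X_{j})+(1-G_{ij})\ln(1-\bar{p}_{n}-\sqrt{1-t}\,\Delta_{n}X_{i}X_{j})\Big)\Big\},\tag{7}$$

$$\mathbb{P}_{t}(\bm{Y}|\bm{X})=\frac{1}{(2\pi)^{n/2}}\exp\Big\{-\frac{1}{2}\sum_{i=1}^{n}\big(Y_{i}-\sqrt{R(t,\epsilon)}\,X_{i}\big)^{2}\Big\}.$$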

We constrain $\epsilon\in[s_{n},2s_{n}]$ where $s_{n}\to 0_{+}$ as $n\to+\infty$ at an appropriate rate to be fixed later on. The interpolating Hamiltonian is then defined to be

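$$\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y})\equiv\mathcal{H}_{{\rm SBM};t}(\bm{x};\bm{G})+\mathcal{H}_{{\rm dec};t,\epsilon}(\bm{x};\bm{Y})$$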

where

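$$\mathcal{H}_{{\rm SBM};t}(\bm{x};\bm{G})\equiv-\sum_{i<j}\Big(G_{ij}\ln\Big(1+\frac{\sqrt{1-t}\,\Delta_{n}}{\bar{p}_{n}}x_{i}x_{j}\Big)+(1-G_{ij})\ln\Big(1-\frac{\sqrt{1-t}\,\Delta_{n}}{1-\bar{p}_{n}}x_{i}x_{j}\Big)\Big),$$

$$\mathcal{H}_{{\rm dec};t,\epsilon}(\bm{x};\bm{Y}(\bm{X},\bm{Z}))\equiv-\sum_{i=1}^{n}\Big(R(t,\epsilon)X_{i}x_{i}+\sqrt{R(t,\epsilon)}\,Z_{i}x_{i}-R(t,\epsilon)\frac{x_{i}^{2}}{2}\Big).$$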

The posterior distribution expressed with the Hamiltonian $\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y})$ then reads

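$$\mathbb{P}_{t}(\bm{x}|\bm{G},\bm{Y})=\frac{\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\exp(-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y}))}{\sum_{\bm{x}\in\mathcal{X}^{n}}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})\exp(-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y}))}.$$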

Therefore the Gibbs-bracket (i.e., the expectation operator w.r.t. the posterior distribution) for the interpolating model is

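$$\langle A\rangle_{t,\epsilon}\equiv\sum_{\bm{x}\in\mathcal{X}^{n}}A(\bm{x})\,\mathbb{P}_{t}(\bm{x}|\bm{G},\bm{Y})=\frac{1}{\mathcal{Z}_{t,\epsilon}(\bm{G},\bm{Y})}\sum_{\bm{x}\in\mathcal{X}^{n}}A(\bm{x})\,e^{-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{Y})}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})$$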

with the partition function ${\cal Z}_{t,\epsilon}({\bm{G}},{\bm{Y}})\equiv\sum_{{\bm{x}}\in\mathcal{X}^{n}}e^{-\mathcal{H}_{t,\epsilon}({\bm{x}};{\bm{G}},{\bm{Y}})}\prod_{i=1}^{n}\mathbb{P}_{r}(x_{i})$ . The reader should keep in mind that Gibbs-brackets are therefore functions of the quenched random variables $(\bm{Y}(\bm{X},\bm{Z}),\bm{G}(\bm{X}))$ . The free energy for a given graph $\bm{G}=\bm{G}(\bm{X})$ (that depends on the ground truth partition) and decoupled observation $\bm{Y}(\bm{X},\bm{Z})$ is

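$$F_{t,\epsilon}(\bm{G},\bm{Y})=F_{t,\epsilon}\equiv-\frac{1}{n}\ln\mathcal{Z}_{t,\epsilon}(\bm{G},\bm{Y})\tag{11}$$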

and its expectation

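$$f_{t,\epsilon}\equiv\mathbb{E}_{\bm{X}}\mathbb{E}_{\bm{G}|\bm{X}}\mathbb{E}_{\bm{Y}|\bm{X}}F_{t,\epsilon}=\mathbb{E}_{\bm{X}}\mathbb{E}_{\bm{G}|\bm{X}}\mathbb{E}_{\bm{Z}}F_{t,\epsilon}.$$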
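The interpolating model can be made concrete at tiny $n$ by enumerating all configurations. The sketch below is illustrative only: it assumes the Hamiltonian pieces written in terms of $\sqrt{1-t}\,\Delta_{n}$ and $R(t,\epsilon)$ as described above, takes the free energy to be $-\frac{1}{n}\ln\mathcal{Z}_{t,\epsilon}$, and uses hypothetical function names and numerical values. It shows that at $t=1$ the graph no longer enters and the free energy splits into a sum of single-node terms.

```python
import itertools
import math


def h_sbm_t(x, G, t, p_bar, delta):
    s = math.sqrt(1 - t) * delta   # signal amplitude at time t
    return -sum(g * math.log(1 + s / p_bar * x[i] * x[j])
                + (1 - g) * math.log(1 - s / (1 - p_bar) * x[i] * x[j])
                for (i, j), g in G.items())


def h_dec(x, X, Z, R):
    return -sum(R * Xi * xi + math.sqrt(R) * Zi * xi - R * xi * xi / 2
                for xi, Xi, Zi in zip(x, X, Z))


def free_energy(G, X, Z, t, R, r, p_bar, delta):
    """-(1/n) ln Z_{t,eps} by exhaustive enumeration (tiny n only)."""
    n = len(X)
    x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
    z_total = 0.0
    for x in itertools.product((x1, x2), repeat=n):
        weight = math.prod(r if xi == x1 else 1 - r for xi in x)
        energy = h_sbm_t(x, G, t, p_bar, delta) + h_dec(x, X, Z, R)
        z_total += weight * math.exp(-energy)
    return -math.log(z_total) / n


def single_node_free_energy(Xi, Zi, R, r):
    x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
    return -math.log(sum(p * math.exp(R * Xi * x + math.sqrt(R) * Zi * x - R * x * x / 2)
                         for x, p in ((x1, r), (x2, 1 - r))))


r, p_bar, delta = 0.3, 0.4, 0.1   # hypothetical values
x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
X = [x1, x2, x2]                  # a fixed ground-truth configuration
Z = [0.3, -1.1, 0.7]              # a fixed noise realization
G = {(0, 1): 1, (0, 2): 0, (1, 2): 1}

f_end = free_energy(G, X, Z, t=1.0, R=0.7, r=r, p_bar=p_bar, delta=delta)
f_sum = sum(single_node_free_energy(Xi, Zi, 0.7, r) for Xi, Zi in zip(X, Z)) / len(X)
print(f_end, f_sum)
```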

By construction,

[TABLE]

In particular, when $t=\epsilon=0$ we have

[TABLE]

Therefore

[TABLE]

where $o_{n}(1)$ collects all contributions that tend to zero uniformly in $\epsilon$ when $n\rightarrow\infty$ . Eventually, we reach the following fundamental sum rule (see section 4 for the derivation):

[TABLE]

where

[TABLE]

and the overlap is

[TABLE]

Two generic tools that we will widely use in our proof are the following:

•

The Nishimori identity: Let $(X,Y)$ be a couple of random variables with joint distribution $P(X,Y)$ and conditional distribution $P(\cdot|Y)$ . Let $k\geq 1$ and let $x^{(1)},\dots,x^{(k)}$ be i.i.d. copies from the conditional distribution. Let us denote $\langle-\rangle$ the expectation w.r.t. the product distribution $P(\cdot|Y)^{\otimes\infty}$ over copies and $\mathbb{E}$ the expectation w.r.t. the joint distribution. Then, for all continuous bounded functions $g$ we have

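$$\mathbb{E}\big\langle g\big(Y,x^{(1)},\dots,x^{(k)}\big)\big\rangle=\mathbb{E}\big\langle g\big(Y,X,x^{(2)},\dots,x^{(k)}\big)\big\rangle.$$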

The expectation $\mathbb{E}$ is over $(X,Y)$ .

Proof.

This is a simple consequence of Bayes formula. It is equivalent to sample the couple $(X,Y)$ according to its joint distribution or to sample first $Y$ according to its marginal distribution and then to sample $x$ conditionally on $Y$ from the conditional distribution. Thus the two $(k+1)$ -tuples $(Y,x^{(1)},\dots,x^{(k)})$ and $(Y,X,x^{(2)},\dots,x^{(k)})$ have the same law. ∎

In the present case $(X,Y)\to(\bm{X},\bm{G},\bm{Y})$ with joint law $\mathbb{P}_{t}(\bm{G}|\bm{X})\mathbb{P}_{t}(\bm{Y}|\bm{X})\prod_{i=1}^{n}\mathbb{P}_{r}(X_{i})$ . Let us take $k$ i.i.d. copies $\bm{x}^{(1)},\dots,\bm{x}^{(k)}$ drawn from the posterior distribution $\mathbb{P}_{t}(\cdot|\bm{G},\bm{Y})$ . Then for any continuous bounded function $g$

$$\mathbb{E}\big\langle g\big(\bm{G},\bm{Y},\bm{x}^{(1)},\dots,\bm{x}^{(k)}\big)\big\rangle_{t,\epsilon}=\mathbb{E}\big\langle g\big(\bm{G},\bm{Y},\bm{X},\bm{x}^{(2)},\dots,\bm{x}^{(k)}\big)\big\rangle_{t,\epsilon}$$

where $\mathbb{E}$ is over $(\bm{G},\bm{Y})$ . More precisely $\mathbb{E}=\mathbb{E}_{\prod_{i=1}^{n}\mathbb{P}_{r}(X_{i})}\mathbb{E}_{\mathbb{P}_{t}(\bm{G}|\bm{X})}\mathbb{E}_{\mathbb{P}_{t}(\bm{Y}|\bm{X})}$ . Note that, by a slight abuse of notation, we continue to use the Gibbs-bracket notation for expressions depending on multiple i.i.d. copies from the posterior, so that $\langle-\rangle_{t,\epsilon}$ corresponds to the expectation w.r.t. the product measure $\mathbb{P}_{t}(\cdot|\bm{G},\bm{Y})^{\otimes\infty}$ .

•

Gaussian integration by parts: Integration by parts implies that for any bounded differentiable function $g$ of $Z\sim\mathcal{N}(0,1)$ we have

$$\mathbb{E}[Zg(Z)]=\mathbb{E}[g^{\prime}(Z)].\tag{17}$$
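Both tools can be checked numerically in the simplest setting. The sketch below uses the scalar Gaussian channel $Y=\sqrt{q}X+Z$ with $X\sim\mathbb{P}_{r}$ as a stand-in for the full model (an assumption made to keep the computation one-dimensional) and verifies the Nishimori-type identity $\mathbb{E}[X\langle x\rangle]=\mathbb{E}[\langle x\rangle^{2}]$, as well as Gaussian integration by parts for the test function $g=\tanh$. Expectations over $Z$ are approximated by a normalized grid sum; all names and values are hypothetical.

```python
import math


def gauss_expect(f, z_max=10.0, dz=0.01):
    """Approximate E[f(Z)], Z ~ N(0,1), by a normalized grid sum."""
    n_z = int(round(2 * z_max / dz)) + 1
    num = den = 0.0
    for k in range(n_z):
        z = -z_max + k * dz
        w = math.exp(-0.5 * z * z)
        num += w * f(z)
        den += w
    return num / den


def posterior_mean(y, q, r):
    """<x> for the scalar channel Y = sqrt(q) X + Z with prior P_r."""
    x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
    w1 = r * math.exp(math.sqrt(q) * y * x1 - q * x1 * x1 / 2)
    w2 = (1 - r) * math.exp(math.sqrt(q) * y * x2 - q * x2 * x2 / 2)
    return (w1 * x1 + w2 * x2) / (w1 + w2)


def nishimori_sides(q, r):
    """Return (E[X <x>], E[<x>^2]) for the scalar channel."""
    x1, x2 = math.sqrt((1 - r) / r), -math.sqrt(r / (1 - r))
    lhs = rhs = 0.0
    for X, p_X in ((x1, r), (x2, 1 - r)):
        lhs += p_X * gauss_expect(
            lambda z: X * posterior_mean(math.sqrt(q) * X + z, q, r))
        rhs += p_X * gauss_expect(
            lambda z: posterior_mean(math.sqrt(q) * X + z, q, r) ** 2)
    return lhs, rhs


nishi_lhs, nishi_rhs = nishimori_sides(q=1.5, r=0.3)

# Gaussian integration by parts with g = tanh, g' = 1 - tanh^2.
ibp_lhs = gauss_expect(lambda z: z * math.tanh(z))
ibp_rhs = gauss_expect(lambda z: 1 - math.tanh(z) ** 2)
print(nishi_lhs, nishi_rhs, ibp_lhs, ibp_rhs)
```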

We are now ready to provide the proofs of the bounds on the mutual information.

3.1 The upper bound: proof of Theorem 2.1

Set $\epsilon=0$ and $q(t,\epsilon)=q$ a non-negative constant. Then we have $\mathcal{R}_{1}=0$ , $\mathcal{R}_{3}=o_{n}(1)$ . Since $\mathcal{R}_{2}\geq 0$ , (15) implies

[TABLE]

Since $\Psi$ is continuous w.r.t. its second argument and $\lambda_{n}\to\lambda$ , we obtain $\limsup_{n\to+\infty}\frac{1}{n}I(\bm{X};\bm{G})\leq\Psi(q,\lambda,r)$ . Optimizing over $q\in[0,\lambda]$ yields the bound (optimization over $q\in[0,+\infty)$ does not yield a sharper bound, see Remark 2).

3.2 The lower bound: proof of Theorem 2.2

The basic idea is to “remove” $\mathcal{R}_{2}$ from (15) by adapting $q(t,\epsilon)$ . Then taking the limit $n\rightarrow\infty$ and $\epsilon\rightarrow 0_{+}$ will provide the desired bound since $\mathcal{R}_{1}\geq 0$ and $\mathcal{R}_{3}\to 0$ will disappear. To implement this idea we first decompose $\mathcal{R}_{2}$ into

[TABLE]

and address each part with the following two lemmas. The proof of Lemma 3.2 can be found in section 5.

Lemma 3.1.

For every $\epsilon\in[0,1]$ and $t\in[0,1]$ there exists a (unique) bounded solution $R_{n}^{*}(t,\epsilon)=\epsilon+\int_{0}^{t}ds\,q_{n}^{*}(s,\epsilon)$ to the first order differential equation

[TABLE]

Furthermore

[TABLE]

Proof.

Let $G_{n}(t,R(t,\epsilon))\equiv\lambda_{n}\mathbb{E}\langle Q\rangle_{t,\epsilon}$ . Equation (19) is thus a first-order differential equation. Also note that, letting $dG_{n}/dR$ be the derivative w.r.t. the second argument,

[TABLE]

To get the last identity, we used Gaussian integration by parts, which reads when applied to Gibbs brackets,

[TABLE]

Indeed, one must be careful that in the definition of the Gibbs bracket both the Hamiltonian and partition function are functions of the quenched variable $\bm{Z}$ , thus the appearance of two terms when we differentiate w.r.t $Z$ . Now, using the Nishimori identity to replace the hidden partition $\bm{X}$ by a new independent sample from the posterior in (21) (which yields, e.g., $\mathbb{E}[X_{i}X_{j}\langle x_{i}x_{j}\rangle_{t,\epsilon}]=\mathbb{E}[\langle x_{i}x_{j}\rangle_{t,\epsilon}^{2}]$ or $\mathbb{E}[X_{i}\langle x_{i}x_{j}\rangle_{t,\epsilon}\langle x_{j}\rangle_{t,\epsilon}]=\mathbb{E}[\langle x_{i}\rangle_{t,\epsilon}\langle x_{i}x_{j}\rangle_{t,\epsilon}\langle x_{j}\rangle_{t,\epsilon}]$ ) we reach

[TABLE]

The function $G_{n}$ is bounded and takes values in $[0,\lambda_{n}]$ . Indeed $\mathbb{E}\langle Q\rangle_{t,\epsilon}=\mathbb{E}[X_{1}\langle x_{1}\rangle_{t,\epsilon}]=\mathbb{E}[\langle x_{1}\rangle_{t,\epsilon}^{2}]\geq 0$ by the Nishimori identity, thus by Jensen's inequality $\mathbb{E}\langle Q\rangle_{t,\epsilon}\leq\mathbb{E}\langle x_{1}^{2}\rangle_{t,\epsilon}=\mathbb{E}[X_{1}^{2}]$ again by the Nishimori identity, and finally $\mathbb{E}[X_{1}^{2}]=1$ . In addition to being bounded, $G_{n}$ is differentiable w.r.t. its second argument, with bounded derivative as seen from (22). The Cauchy-Lipschitz theorem then implies that (19) admits a unique global solution over $t\in[0,1]$ . Finally Liouville’s formula (see Appendix B) gives

[TABLE]

The non-negativity of $dG_{n}/dR$ then implies $dR_{n}^{*}/d\epsilon\geq 1$ . ∎
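The mechanism of Lemma 3.1 can be illustrated on a toy equation. In the sketch below the function `toy_G` is a hypothetical stand-in for $G_{n}(t,R)=\lambda_{n}\mathbb{E}\langle Q\rangle_{t,\epsilon}$: it is bounded in $[0,\lambda]$ and non-decreasing in $R$, but it is not derived from the model. The code integrates $dR/dt=G(t,R)$ with $R(0)=\epsilon$ by the explicit Euler scheme and compares a finite-difference estimate of $dR(1,\epsilon)/d\epsilon$ with the Liouville-type expression $\exp\big(\int_{0}^{1}\partial_{R}G(s,R(s,\epsilon))\,ds\big)$.

```python
import math

LAM = 2.0  # hypothetical signal-to-noise ratio


def toy_G(t, R):
    """Toy right-hand side: bounded in [0, LAM], non-decreasing in R."""
    return LAM * R / (1 + R)


def toy_dG_dR(t, R):
    return LAM / (1 + R) ** 2


def solve_path(eps, steps=2000):
    """Explicit Euler integration of dR/dt = G(t, R), R(0) = eps."""
    h = 1.0 / steps
    R = eps
    for k in range(steps):
        R += h * toy_G(k * h, R)
    return R


def liouville_factor(eps, steps=2000):
    """exp of the integral of dG/dR along the Euler trajectory."""
    h = 1.0 / steps
    R = eps
    integral = 0.0
    for k in range(steps):
        integral += h * toy_dG_dR(k * h, R)
        R += h * toy_G(k * h, R)
    return math.exp(integral)


eps, d_eps = 0.1, 1e-6
fd_slope = (solve_path(eps + d_eps) - solve_path(eps)) / d_eps
liouville = liouville_factor(eps)
print(fd_slope, liouville)
```

Because $\partial_{R}G\geq 0$, the exponent is non-negative and the slope is at least one, which is the property $dR_{n}^{*}/d\epsilon\geq 1$ used later for the change of variables in the $\epsilon$ average.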

We now state a crucial concentration result for the overlap. Its validity is a consequence of the fact that the problem is analyzed in the so-called Bayesian optimal setting. This means that all hyper-parameters in the problem, namely $(\mathbb{P}_{r},r,\bar{p}_{n},\Delta_{n})$ , are assumed to be known, so that the posterior of the model can be written exactly. It implies the validity of the Nishimori identity which in turn allows one to prove the following result (see section 5):

Lemma 3.2 (Overlap concentration).

Let $R$ be the solution $R_{n}^{*}$ in Lemma 3.1. Then for any bounded positive sequence $s_{n}$ there exists a sequence $C_{n}(r,\lambda_{n})>0$ converging to a constant and such that

[TABLE]

Now we average (15) over a small interval $\epsilon\in[s_{n},2s_{n}]$ (note that $I(\bm{X};\bm{G})$ is independent of $\epsilon$ ) and set $R$ to the solution $R_{n}^{*}$ of (19) in Lemma 3.1; therefore $q_{n}^{*}(t,\epsilon)=\lambda_{n}\mathbb{E}\langle Q\rangle_{t,\epsilon}$ . This choice cancels the first term of $\mathcal{R}_{2}$ in the decomposition (18). The second term in (18) is then upper bounded using Lemma 3.2. Finally $\mathcal{R}_{1}\geq 0$ . Combining all these observations we obtain

[TABLE]

where we used Fubini’s theorem to switch the $t$ and $\epsilon$ integrals when using Lemma 3.2. Using $q_{n}^{*}\in[0,\lambda_{n}]$ and $\epsilon\in[s_{n},2s_{n}]$ , we see that $\mathcal{R}_{3}$ is bounded uniformly in $\epsilon$ :

[TABLE]

Therefore the average of $\mathcal{R}_{3}$ over $\epsilon$ has the same upper bound. Now, since

[TABLE]

and $R_{n}^{*}(1,\epsilon)\in[s_{n},2s_{n}+\lambda_{n}]$ we have $-\frac{1}{4}\leq\frac{d}{d\lambda}\Psi(R_{n}^{*}(1,\epsilon),\lambda)\leq\frac{1}{4}$ (we use $n$ large enough for the l.h.s inequality). Thus by remark 2 and the mean value theorem

[TABLE]

These remarks imply a relaxation of (24):

[TABLE]

Finally, setting $s_{n}=n^{-\theta}$ with $\theta\in(0,1/4)$ ensures the extra terms on the r.h.s. of (24) vanish as $n\to+\infty$ . Then taking the $\liminf_{n\to+\infty}$ and using $\lambda_{n}\to\lambda$ we finally reach the desired bound.

4 The fundamental sum rule: proof of (15)

In this section we use the notation $F_{t,\epsilon}$ for (11) without explicitly indicating the dependence in its arguments. When $G_{ij}$ is set to zero for a specific pair $(i,j)$ all other $G_{k,l}$ , $(k,l)\neq(i,j)$ being fixed we write $F_{t,\epsilon}(G_{ij}=0)$ . Expectation with respect to the set of all $G_{k,l}$ , $(k,l)\neq(i,j)$ is denoted by $\mathbb{E}_{\sim G_{ij}}$ .

The derivative of the averaged free energy can be decomposed into three terms:

[TABLE]

where

[TABLE]

4.1 Term $D_{1}$ .

Lemma 4.1.

We have $D_{1}=\frac{\lambda_{n}}{4}\mathbb{E}\langle Q^{2}\rangle_{t,\epsilon}+\mathcal{O}(\frac{1}{n})+\mathcal{O}\big{(}\frac{\lambda_{n}^{3/2}}{\sqrt{n\bar{p}_{n}(1-\bar{p}_{n})^{3}}}\big{)}$ .

Proof.

Note that by (7) we have

[TABLE]

This gives

[TABLE]

with the definitions

[TABLE]

where $\mathbb{E}_{\sim G_{ij}}\equiv\mathbb{E}_{\bm{X}}\mathbb{E}_{\bm{Y}|\bm{X}}\mathbb{E}_{\bm{G}\setminus G_{ij}|\bm{X}}$ , and recalling

[TABLE]

Both $D_{1}^{(a)}$ and $D_{1}^{(b)}$ involve the term $\mathbb{E}_{G_{ij}|{\bm{X}}}[G_{ij}F_{t,\epsilon}]$ . In Section 6 we derive an approximate integration by parts formula that, when applied in the present case, yields

Lemma 4.2.

Fix $(i,j)\in\{1,\dots,n\}^{2}$ and recall that $G_{ij}\in\{0,1\}$ with conditional mean $\mathbb{E}_{G_{ij}|X_{i},X_{j}}[G_{ij}]=\bar{p}_{n}+\sqrt{1-t}\Delta_{n}X_{i}X_{j}$ . Let $F_{t,\epsilon}^{(1)}(G_{ij})$ be the first partial derivative of $F_{t,\epsilon}$ with respect to $G_{ij}$ . We have the approximate integration by parts formula

[TABLE]

where

[TABLE]

and $F_{t,\epsilon}(G_{ij}=0)$ is the evaluation of $F_{t,\epsilon}$ at $G_{ij}=0$ all other variables $G_{kl}$ , $(k,l)\neq(i,j)$ being fixed.

The approximate integration by parts formula (28) implies that the term $D_{1}^{(b)}$ of (27) can be written as (recall $\bar{p}_{n}(1-\bar{p}_{n})\gg\Delta_{n}$)

[TABLE]

Applying again the approximate integration by parts formula (28) the term $D_{1}^{(a)}$ of (27) can be written as (recall $(1-\bar{p}_{n})^{2}\gg\Delta_{n}$ )

[TABLE]

where we define

[TABLE]

We show in Appendix C that in (30) the terms $E_{1}$ and $E_{2}$ approximately cancel so that

[TABLE]

Finally, substituting (29) and (31) into (27) gives

[TABLE]

where, in the last two equalities, we used $\lambda_{n}=n\Delta_{n}^{2}/(\bar{p}_{n}(1-\bar{p}_{n}))$ and $Q=\frac{1}{n}\sum_{i=1}^{n}X_{i}x_{i}$ . With (h1) and (h2), all the error terms represented by the big-O notations tend to zero. ∎

4.2 Term $D_{2}$ .

Lemma 4.3.

We have $D_{2}=-\frac{1}{2}q(t,\epsilon)\mathbb{E}\langle Q\rangle_{t,\epsilon}$ .

Proof.

Recall (8). Using Gaussian integration by parts (17) we obtain

[TABLE]

where we used that $\frac{dF_{t,\epsilon}}{dZ}=-\frac{1}{n}\langle\sqrt{R(t,\epsilon)}x_{i}\rangle_{t,\epsilon}$ , and then the definition of the overlap. ∎
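Gaussian integration by parts, in its standard scalar form $\mathbb{E}[Zg(Z)]=\mathbb{E}[g^{\prime}(Z)]$ for $Z\sim\mathcal{N}(0,1)$, can be checked numerically. The following is a minimal sketch, not part of the proof: the test function `g` and the quadrature helper `gaussian_expectation` are hypothetical choices made only for illustration.

```python
import math

def gaussian_expectation(func, half_width=10.0, steps=20001):
    """Approximate E[func(Z)] for Z ~ N(0,1) with the trapezoidal rule."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * func(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def g(z):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.tanh(0.7 * z + 0.3)

def g_prime(z):
    # Derivative of g, computed by hand.
    return 0.7 * (1.0 - math.tanh(0.7 * z + 0.3) ** 2)

lhs = gaussian_expectation(lambda z: z * g(z))  # E[Z g(Z)]
rhs = gaussian_expectation(g_prime)             # E[g'(Z)]
print(lhs, rhs)
```

The two printed values should agree up to quadrature error, which is the mechanism that turns the explicit factor $Z$ in the derivative of the free energy into a Gibbs average.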

4.3 Term $D_{3}$ .

Lemma 4.4.

We have $D_{3}=0$ .

Proof.

Using the Nishimori identity (16) we obtain

[TABLE]

by independence of the centered noise $\bm{Z}$ and the hidden partition $\bm{X}$ .

Again the Nishimori identity (16) is used to obtain

[TABLE]

where the last line follows from $\mathbb{E}_{G_{ij}|X_{i},X_{j}}G_{ij}=\bar{p}_{n}+\sqrt{1-t}\Delta_{n}X_{i}X_{j}$ . ∎

4.4 Final derivations of the sum rule.

The last missing term in order to simplify the sum rule (14) is:

Lemma 4.5.

We have $f_{0,0}-f_{0,\epsilon}=\frac{1}{2}\int_{0}^{\epsilon}d\epsilon^{\prime}\,\mathbb{E}\langle Q\rangle_{0,\epsilon^{\prime}}$ .

Proof.

Using Gaussian integration by parts (17) and from (16) the specific Nishimori identity $\mathbb{E}[\langle x_{i}\rangle_{0,\epsilon^{\prime}}^{2}]=\mathbb{E}[X_{i}\langle x_{i}\rangle_{0,\epsilon^{\prime}}]$ we have (recall also that $R(0,\epsilon^{\prime})=\epsilon^{\prime}$ )

[TABLE]

∎
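The Nishimori identity $\mathbb{E}[\langle x_{i}\rangle^{2}]=\mathbb{E}[X_{i}\langle x_{i}\rangle]$ used above can be illustrated on a single scalar Gaussian channel $Y=\sqrt{R}X+Z$. The sketch below assumes that $\mathbb{P}_{r}$ is the two-point prior with values $\sqrt{(1-r)/r}$ (probability $r$) and $-\sqrt{r/(1-r)}$ (probability $1-r$), and it ignores the graph part of the model entirely. The function names and the values of $R$ and $r$ are hypothetical.

```python
import math

def prior(r):
    """Two-point prior with mean 0 and variance 1 (assumed form of P_r)."""
    return [(math.sqrt((1 - r) / r), r), (-math.sqrt(r / (1 - r)), 1 - r)]

def posterior_mean(y, R, r):
    """Posterior mean <x> for the scalar channel Y = sqrt(R) X + Z."""
    num = den = 0.0
    for x, p in prior(r):
        w = p * math.exp(math.sqrt(R) * y * x - 0.5 * R * x * x)
        num += x * w
        den += w
    return num / den

def nishimori_sides(R, r, half_width=10.0, steps=4001):
    """Return (E[<x>^2], E[X <x>]) by trapezoidal quadrature over the noise Z."""
    h = 2.0 * half_width / (steps - 1)
    lhs = rhs = 0.0
    for X, p in prior(r):
        for k in range(steps):
            z = -half_width + k * h
            wt = (0.5 if k in (0, steps - 1) else 1.0) * h
            density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            m = posterior_mean(math.sqrt(R) * X + z, R, r)
            lhs += p * wt * density * m * m
            rhs += p * wt * density * X * m
    return lhs, rhs

print(nishimori_sides(R=1.5, r=0.3))
```

Both entries of the printed pair should coincide up to quadrature error, because $\langle x\rangle$ is the conditional expectation of $X$ given $Y$.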

Recall $R(1,\epsilon)=\epsilon+\int_{0}^{1}q(t,\epsilon)dt$ . Substituting (26), and Lemmas 4.1, 4.3 and 4.4 as well as 4.5 into (14) yields

[TABLE]

which is the sum rule (15).

5 Concentration of overlap: proof of Lemma 3.2

Concentration of overlap has been shown for various Bayesian inference problems, see, e.g., [18, 7, 8]. These proofs can be adapted to the present case. The idea is to bound the fluctuations of the overlap by those of another, easier to control, object $\mathcal{L}$ defined below. This object is more natural to work with as it is directly related to derivatives of the free energy, which itself concentrates. We first present the main steps of the proof and then provide the details.

Let

[TABLE]

As said previously, we can relate the fluctuations of the overlap to those of $\cal L$ :

Lemma 5.1 (A fluctuation identity).

We have $\mathbb{E}\langle(Q-\mathbb{E}\langle Q\rangle_{t,\epsilon})^{2}\rangle_{t,\epsilon}\leq 4\,\mathbb{E}\langle(\mathcal{L}-\mathbb{E}\langle\mathcal{L}\rangle_{t,\epsilon})^{2}\rangle_{t,\epsilon}$ .

It therefore remains to show the concentration of $\mathcal{L}$ . We divide the task into two parts:

[TABLE]

These two terms are controlled by the following lemmas:

Lemma 5.2 (Thermal fluctuations).

Let $R(t,\epsilon)=\epsilon+\int_{0}^{t}ds\,q(s,\epsilon)\geq\epsilon$ be such that $dR/d\epsilon\geq 1$ . We then have

[TABLE]

Lemma 5.3 (Quenched fluctuations).

Let $R(t,\epsilon)=\epsilon+\int_{0}^{t}ds\,q(s,\epsilon)$ , with $\epsilon\in[s_{n},2s_{n}]$ and $q$ taking values in $[0,\lambda_{n}]$ , be such that $dR/d\epsilon\geq 1$ . There exists a sequence $C_{n}(r,\lambda_{n})>0$ converging to a constant such that

[TABLE]

The proofs of Lemma 5.2 and Lemma 5.3 employ some useful identities for the derivatives of the free energy (recall $F_{t,\epsilon}\equiv-\frac{1}{n}\ln{\cal Z}_{t,\epsilon}(\bm{G},\bm{Y})$):

[TABLE]

where we simply denote, when no confusion can arise, $R=R(t,\epsilon)$ . Taking expectation on both sides of (35) and (36) we have

[TABLE]

The proof of Lemma 3.2 is ended by applying Lemmas 5.1, 5.2 and 5.3 in conjunction with (33):

[TABLE]

We now provide the proofs of Lemmas 5.1 to 5.4. For the sake of readability, we simply denote $\langle-\rangle\equiv\langle-\rangle_{t,\epsilon}$ for the rest of this section.

5.1 Proof of Lemma 5.1

We start by proving

[TABLE]

Using the definitions $Q\equiv\frac{1}{n}\sum_{i=1}^{n}x_{i}X_{i}$ and (32) gives

[TABLE]

Gaussian integration by parts then yields

[TABLE]

These two formulas simplify (41) to

[TABLE]

The Nishimori identity implies

[TABLE]

These formulas further simplify (42) to

[TABLE]

which is (40).

Identity (40) implies

[TABLE]

and application of the Cauchy-Schwarz inequality then gives

[TABLE]

This ends the proof of Lemma 5.1.

5.2 Proof of Lemma 5.2

First note that $\frac{d^{2}f_{t,\epsilon}}{dR^{2}}\leq 0$ . Then, using (38), $dR/d\epsilon\geq 1$ , $R(t,\epsilon)\geq\epsilon$ , and the Nishimori identity $\mathbb{E}\langle x_{i}^{2}\rangle=\mathbb{E}[X_{i}^{2}]=1$ ,

[TABLE]

From (37) $df_{t,\epsilon}/dR\in[-1/2,0]$ , therefore $[df_{t,\epsilon}/dR]_{\epsilon=s_{n}}^{\epsilon=2s_{n}}\geq-1/2$ . Integrating over $\epsilon$ then gives

[TABLE]

5.3 Proof of Lemma 5.3

Lemma 5.3 is based on the concentration of the free energy, a very general fact in "well behaved" statistical mechanics models. The proof of the following lemma uses more or less standard methods and can be found in Appendix D.

Lemma 5.4 (Free energy fluctuations).

There exists a sequence $C_{n}(r,\lambda_{n})>0$ converging to a constant when $n\to+\infty$ , such that

[TABLE]

Recall $R=R(t,\epsilon)$ . Let

[TABLE]

From (39) we see that $\tilde{f}_{t,\epsilon}(R)$ is concave in $R$ . Furthermore, from (36) and $|x_{i}|\leq\sqrt{\frac{1-r}{r}}$ for $0\leq r\leq 1/2$ , we see that $\tilde{F}_{t,\epsilon}(R)$ is also concave in $R$ . So that we can employ the following lemma (see the end of this section for a proof):

Lemma 5.5 (A bound on the difference of derivatives due to concavity).

Let $G(x)$ and $g(x)$ be concave functions. Let $\delta>0$ and define $C^{+}_{\delta}(x)\equiv g^{\prime}(x)-g^{\prime}(x+\delta)\geq 0$ and $C^{-}_{\delta}(x)\equiv g^{\prime}(x-\delta)-g^{\prime}(x)\geq 0$ . Then

[TABLE]

From (44) we have

[TABLE]

and from (35) and (37) we have

[TABLE]

Using Lemma 5.5 we then get

[TABLE]

where $C_{\delta}^{+}(R)\equiv\tilde{f}_{t,\epsilon}^{\prime}(R)-\tilde{f}_{t,\epsilon}^{\prime}(R+\delta)\geq 0$ and $C_{\delta}^{-}(R)\equiv\tilde{f}^{\prime}_{t,\epsilon}(R-\delta)-\tilde{f}^{\prime}_{t,\epsilon}(R)\geq 0$ . Then squaring this inequality, using $(\sum_{i=1}^{p}v_{i})^{2}\leq p\sum_{i=1}^{p}v_{i}^{2}$ , taking the expectation, and recalling that $R=R(t,\epsilon)\geq\epsilon$ we reach

[TABLE]

Note that $\mathbb{E}[A_{n}^{2}]=a/n$ with $a=1-2/\pi$ . Recall $q^{*}(t,\epsilon)\in[0,\lambda_{n}]$ from Lemma 3.1. We can upper bound $u$ by $\lambda_{n}+2s_{n}+\delta$ . These remarks with Lemma 5.4 simplify (45) to

[TABLE]

Recall (37) and that $\mathbb{E}[\langle x_{i}\rangle^{2}]\leq\mathbb{E}\langle x_{i}^{2}\rangle=\mathbb{E}[X_{i}^{2}]=1$ . We have

[TABLE]

and therefore $0\leq C_{\delta}^{\pm}(R)\leq 1+\sqrt{\frac{1-r}{r(R-\delta)}}$ . Using $dR/d\epsilon\geq 1$ and $R\geq s_{n}$ we then have

[TABLE]

using the mean value theorem for the last step. Therefore upon integrating (46) over $\epsilon\in(s_{n},2s_{n})$ we have

[TABLE]

The bound is optimized choosing $\delta=(s_{n}^{2}/n)^{1/3}$ . This ends the proof.

Proof of Lemma 5.5.

Concavity implies that for any $\delta>0$ we have

[TABLE]

Combining these two inequalities ends the proof. ∎
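The concavity argument can be tested on explicit functions. The sketch below assumes that the bound of Lemma 5.5 takes the standard form $|G^{\prime}(x)-g^{\prime}(x)|\leq\delta^{-1}\sum_{u\in\{x-\delta,x,x+\delta\}}|G(u)-g(u)|+C_{\delta}^{+}(x)+C_{\delta}^{-}(x)$, which follows from the two concavity inequalities. The pair of concave functions `g` and `G` is a hypothetical example.

```python
import math

def g(x):
    # Concave reference function (hypothetical choice).
    return -x * x

def g_prime(x):
    return -2.0 * x

def G(x):
    # Concave perturbation of g: G''(x) = -2 - 0.45 sin(3x) < 0.
    return -x * x + 0.05 * math.sin(3.0 * x)

def G_prime(x):
    return -2.0 * x + 0.15 * math.cos(3.0 * x)

def derivative_gap_and_bound(x, delta):
    """Return (|G'(x) - g'(x)|, assumed right-hand side of Lemma 5.5)."""
    gap = abs(G_prime(x) - g_prime(x))
    c_plus = g_prime(x) - g_prime(x + delta)
    c_minus = g_prime(x - delta) - g_prime(x)
    closeness = sum(abs(G(u) - g(u)) for u in (x - delta, x, x + delta))
    return gap, closeness / delta + c_plus + c_minus

for delta in (0.5, 0.1, 0.02):
    pairs = [derivative_gap_and_bound(0.1 * k, delta) for k in range(-20, 21)]
    # Smallest slack (bound minus gap) over the grid; it should be non-negative.
    print(delta, min(b - a for a, b in pairs))
```

The trade-off visible here is the one exploited in the proof of Lemma 5.3: a small $\delta$ shrinks $C_{\delta}^{\pm}$ but amplifies the difference $|G-g|$ through the factor $\delta^{-1}$.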

6 Approximate integration by parts: proof of Lemma 4.2

The following general formula follows from Taylor expansion with Lagrange remainder. When the r.h.s is small in specific applications, the formula can be seen as an approximate integration by parts formula generalizing Gaussian integration by parts.

Lemma 6.1.

Let $g(U)$ be a $\mathcal{C}^{4}$ function of a random variable $U$ such that for $k=1,2,3,4$ we have $\sup_{U}\big{|}g^{(k)}(U)\big{|}\leq C_{k}$ for some constants $C_{k}\geq 0$ and $g^{(k)}(U)\equiv d^{k}g(U)/dU^{k}$ . Suppose that the first four moments of $U$ are finite. Then

[TABLE]

Proof.

By Taylor’s theorem any $\mathcal{C}^{4}$ function $h(U)$ can be written as

[TABLE]

Taking the expectation on both sides:

[TABLE]

When (49) is applied to $h(U)=g^{(1)}(U)$ we have

[TABLE]

On the other hand when (49) is applied to $h(U)=Ug(U)$ , using $(Ug(U))^{(k)}=Ug^{(k)}(U)+kg^{(k-1)}(U)$ we have

[TABLE]

Subtracting (50) and (51) we have the bound

[TABLE]

which is the right hand side of (48) after factorization. ∎

We now apply Lemma 6.1 to our specific problem in order to derive the approximate integration by parts formula (28).

Proof of Lemma 4.2. In order to apply Lemma 6.1 to the SBM, consider $U=G_{ij}$ and $g(U)=F_{t,\epsilon}(G_{ij})$ the free energy (11) seen as a function of $G_{ij}$ (all other variables being fixed). For the expectation we take $\mathbb{E}=\mathbb{E}_{G_{ij}|{X}_{i},{X}_{j}}$ . At time $t$ and for any integer $k$

[TABLE]

because $G_{ij}\in\{0,1\}$ . For the derivatives we note that using the Taylor expansion of the logarithm, one obtains for any $v_{n}\in\mathbb{R}$ and $v_{n}\rightarrow 0$ , $\ln(1+v_{n})-v_{n}=\mathcal{O}(|v_{n}|^{2})$ , which also implies $\ln(1+v_{n})=\mathcal{O}(|v_{n}|)$ . (The reader should keep this fact in mind, as it is used again in the appendices whenever we need to expand the logarithm.) Now this fact implies

[TABLE]

To obtain these identities the reader has again to be careful in performing the derivatives: both the exponential of the Hamiltonian and the partition function appearing in the definition of the Gibbs-bracket depend on $(G_{ij})$ (see the derivation of (21) for similar computations). In general,

[TABLE]

Using Lemma 6.1 we have

[TABLE]

Then by the triangle inequality we extract

[TABLE]

and recognize formula (28).
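To see why a $\{0,1\}$-valued variable admits an approximate integration by parts, one can check an elementary analogue directly. The sketch below does not reproduce formula (28); it verifies the simpler bound $|\mathbb{E}[Ug(U)]-\mathbb{E}[U]g(0)-\mathbb{E}[U^{2}]\mathbb{E}[g^{(1)}(U)]|\leq pC_{2}(1/2+p)$ for $U$ Bernoulli with mean $p$, which follows from a second-order Taylor expansion around $0$. The function `g` and the scale `C` are hypothetical.

```python
import math

def bernoulli_mean(func, p):
    """E[func(U)] for U in {0, 1} with P(U = 1) = p."""
    return (1.0 - p) * func(0.0) + p * func(1.0)

def ibp_gap(g, g_prime, p):
    """|E[U g(U)] - E[U] g(0) - E[U^2] E[g'(U)]|, using E[U] = E[U^2] = p."""
    exact = bernoulli_mean(lambda u: u * g(u), p)
    approx = p * g(0.0) + p * bernoulli_mean(g_prime, p)
    return abs(exact - approx)

C = 0.05  # hypothetical scale; |g''| <= C**2 on [0, 1] for the g below

def g(u):
    return math.log(1.0 + C * u)

def g_prime(u):
    return C / (1.0 + C * u)

for p in (0.05, 0.3, 0.5):
    # Second column: actual gap. Third column: Taylor bound p * C2 * (1/2 + p).
    print(p, ibp_gap(g, g_prime, p), p * C**2 * (0.5 + p))
```

The gap is controlled by the second derivative of $g$ only, so it is small whenever $g$ depends weakly on $U$, as is the case for the free energy viewed as a function of a single $G_{ij}$.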

Appendix A Mutual information and free energy: proof of Proposition 2.3

Using (3), we have the expression

[TABLE]

We divide both the numerator and denominator by the same factor, and then rewrite the denominator in exponential form:

[TABLE]

Recall $\mathbb{E}_{G_{ij}|X_{i},X_{j}}G_{ij}=\bar{p}_{n}+\Delta_{n}X_{i}X_{j}$ . The first term in (53) equals

[TABLE]

Let $X\sim\mathbb{P}_{r}$ . We can further write explicitly the expectation in (54) that leads us to conclude

[TABLE]

Using the Taylor expansion of the logarithm, (55) becomes

[TABLE]

where $\mathbb{E}[X^{k}]^{2}=r^{2}(\frac{1-r}{r})^{k}+(1-r)^{2}(\frac{r}{1-r})^{k}+(-1)^{k}2r(1-r)$ . This becomes the expression in (5) by noting that the last term is $\mathcal{O}\Big{(}n\Delta_{n}^{3}/\big{(}\bar{p}_{n}(1-\bar{p}_{n})\big{)}^{2}\Big{)}=\mathcal{O}(\lambda_{n}^{3/2}/\sqrt{n\bar{p}_{n}(1-\bar{p}_{n})})$ .
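The closed form for $\mathbb{E}[X^{k}]^{2}$ can be confirmed numerically. This is a small sketch assuming that $\mathbb{P}_{r}$ puts mass $r$ on $\sqrt{(1-r)/r}$ and mass $1-r$ on $-\sqrt{r/(1-r)}$, which is the prior consistent with the formula above. The function names are hypothetical.

```python
import math

def moment(k, r):
    """E[X^k] under the assumed two-point prior P_r."""
    a = math.sqrt((1 - r) / r)    # value taken with probability r
    b = -math.sqrt(r / (1 - r))   # value taken with probability 1 - r
    return r * a**k + (1 - r) * b**k

def squared_moment_closed_form(k, r):
    """Closed form for E[X^k]^2 quoted in the text."""
    return (r**2 * ((1 - r) / r)**k
            + (1 - r)**2 * (r / (1 - r))**k
            + (-1)**k * 2 * r * (1 - r))

for r in (0.1, 0.3, 0.5):
    for k in range(1, 7):
        print(r, k, moment(k, r)**2, squared_moment_closed_form(k, r))
```

The cross term is $(-1)^{k}2r(1-r)$ because the product of the two support points equals $-1$. The cases $k=1$ and $k=2$ recover $\mathbb{E}[X]=0$ and $\mathbb{E}[X^{2}]=1$.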

Appendix B Liouville formula

Consider the differential equation (19) with $G_{n}(t,R(t,\epsilon))=\lambda_{n}\mathbb{E}\langle Q\rangle_{t,\epsilon}$ . Differentiating w.r.t $\epsilon$ and using the chain rule gives

[TABLE]

Therefore we have

[TABLE]

Integrating (56) over $t\in[0,t^{\prime}]$ we have

[TABLE]

Using $R(0,\epsilon)=\epsilon$ , (57) implies

[TABLE]

This is known as Liouville’s formula for one-dimensional ordinary differential equations.
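Liouville's formula, $\partial R/\partial\epsilon=\exp\{\int_{0}^{t}\partial_{R}G(s,R(s,\epsilon))ds\}$, can be checked on a toy equation $dR/dt=G(t,R)$ with $R(0,\epsilon)=\epsilon$. In the sketch below the function `G` is a hypothetical stand-in, nondecreasing in $R$, and is not the $G_{n}$ of the paper. The constant `LAM`, the solver `solve` and the step sizes are likewise assumptions.

```python
import math

LAM = 2.0  # hypothetical constant playing the role of lambda_n

def G(t, R):
    # Toy right-hand side, nondecreasing in R.
    return LAM * R / (1.0 + R)

def dG_dR(t, R):
    return LAM / (1.0 + R) ** 2

def solve(eps, t_end=1.0, steps=2000):
    """RK4 for dR/dt = G(t, R), dI/dt = dG_dR(t, R), with R(0) = eps, I(0) = 0."""
    h = t_end / steps
    R, I, t = eps, 0.0, 0.0
    for _ in range(steps):
        k1 = (G(t, R), dG_dR(t, R))
        R2 = R + 0.5 * h * k1[0]
        k2 = (G(t + 0.5 * h, R2), dG_dR(t + 0.5 * h, R2))
        R3 = R + 0.5 * h * k2[0]
        k3 = (G(t + 0.5 * h, R3), dG_dR(t + 0.5 * h, R3))
        R4 = R + h * k3[0]
        k4 = (G(t + h, R4), dG_dR(t + h, R4))
        R += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        I += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return R, I

eps, d = 0.2, 1e-5
finite_difference = (solve(eps + d)[0] - solve(eps - d)[0]) / (2 * d)
liouville = math.exp(solve(eps)[1])
print(finite_difference, liouville)
```

Since $\partial_{R}G\geq 0$ in this toy example, the exponential is at least $1$, which mirrors the property $dR/d\epsilon\geq 1$ required in Lemmas 5.2 and 5.3.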

Appendix C Small error terms in the sum rule: proof of (31)

Recalling the definitions (9) and (10), let

[TABLE]

Also let $F_{t,\epsilon;\sim G_{ij}}\equiv n^{-1}\ln\sum_{{\bm{x}}\in\mathcal{X}^{n}}e^{-\mathcal{H}_{t,\epsilon}({\bm{x}};{\bm{G}\setminus G_{ij}},{\bm{Y}})}\mathbb{P}_{r}(\bm{x})$ , and $\langle-\rangle_{t,\epsilon;\sim G_{ij}}$ be the Gibbs-bracket associated to the measure proportional to $e^{-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G}\setminus G_{ij},\bm{Y})}\mathbb{P}_{r}(\bm{x})$ . The difference of free energy when changing one $G_{ij}$ can be written in terms of this Gibbs-bracket:

[TABLE]

Using the Taylor expansion of the logarithms in (61), we have

[TABLE]

Therefore, replacing in the expression of $E_{1}$ , we find

[TABLE]

where

[TABLE]

We then observe that

[TABLE]

The difference between the Gibbs-brackets in (63) can be expanded as

[TABLE]

and we can evaluate $\langle x_{i}x_{j}\rangle_{t,\epsilon;G_{ij}=1}-\langle x_{i}x_{j}\rangle_{t,\epsilon;\sim G_{ij}}$ by an interpolation:

[TABLE]

where $\langle-\rangle_{t,\epsilon;s}$ is the Gibbs-bracket associated to the measure proportional to

[TABLE]

with $\mathcal{H}_{t,\epsilon}({\bm{x}};{\bm{G}\setminus G_{ij}},{\bm{Y}})$ defined in (59). By the Taylor expansion of the logarithms in (65) and using $\mathbb{P}_{t}(G_{ij}=1|X_{i},X_{j})=\mathcal{O}(\bar{p}_{n})$ , we see that the first term of (64) is $\mathcal{O}(\Delta_{n})$ . The same kind of calculation is used to see that the second term of (64) is also $\mathcal{O}(\Delta_{n})$ . This implies for (63)

[TABLE]

which tends to zero. Now we conclude by noting that $E_{1}+E_{2}=E_{1}^{(a)}+E_{1}^{(b)}+E_{2}$ and using (62) and (66) to obtain (31).

Appendix D Concentration of free energy: proof of Lemma 5.4

The generation of quenched variables can be divided into two stages: firstly $\bm{X}$ , then $\bm{G}$ given $\bm{X}$ , and independently the Gaussian noise $\bm{Z}$ . We expand the variance of free energy according to the two stages (recall $f_{t,\epsilon}=\mathbb{E}_{\bm{X}}\mathbb{E}_{\bm{G}|\bm{X}}\mathbb{E}_{\bm{Z}}F_{t,\epsilon}$ ):

[TABLE]

In each stage the variables are all independently generated. This enables us to use the Efron-Stein inequality to show the concentration of the free energy.

Let $\bm{Z}^{(i)}$ be a vector that differs from $\bm{Z}$ only at the $i$-th component, which becomes $Z_{i}^{\prime}$, drawn independently from the same distribution as $Z_{i}\sim{\cal N}(0,1)$. We define $\bm{G}^{(ij)}$ and $\bm{X}^{(i)}$ in a similar manner with respect to $\bm{G}$ and $\bm{X}$. The Efron-Stein inequality tells us that

[TABLE]

as well as

[TABLE]

By (67) it suffices to show that both (68) and (69) are upper bounded by $C_{n}(r,\lambda_{n})/n$ for some large enough sequence $C_{n}(r,\lambda_{n})$ that converges to a constant.
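The Efron-Stein inequality, $\mathrm{Var}(f)\leq\frac{1}{2}\sum_{i}\mathbb{E}[(f(\bm{U})-f(\bm{U}^{(i)}))^{2}]$ for independent coordinates, can be verified exactly on a small example by enumeration. The sketch below uses a handful of independent Bernoulli bits and a hypothetical function `f`. It is not the free energy of the model, only an illustration of the inequality itself.

```python
import itertools
import math

def efron_stein_check(f, p, n):
    """Exact variance and Efron-Stein bound for f of n i.i.d. Bernoulli(p) bits."""
    def prob(bits):
        return math.prod(p if b else 1 - p for b in bits)

    configs = list(itertools.product((0, 1), repeat=n))
    mean = sum(prob(x) * f(x) for x in configs)
    var = sum(prob(x) * (f(x) - mean) ** 2 for x in configs)

    bound = 0.0
    for x in configs:
        for i in range(n):
            for new_bit in (0, 1):
                # Resample coordinate i independently, all others kept fixed.
                y = x[:i] + (new_bit,) + x[i + 1:]
                q = p if new_bit else 1 - p
                bound += 0.5 * prob(x) * q * (f(x) - f(y)) ** 2
    return var, bound

def f(bits):
    # Hypothetical nonlinear function of the bits, for illustration only.
    return math.log(1.0 + sum((i + 1) * b for i, b in enumerate(bits))) / len(bits)

print(efron_stein_check(f, p=0.3, n=5))
```

The first printed number (the variance) should not exceed the second (the sum of local variances). Bounding each local term by interpolation is what Sections D.1 and D.2 carry out for the free energy.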

D.1 Bound on (68)

The bound obtained from Efron-Stein’s inequality is a sum of local variances of the free energy. The bound on the difference due to a local change can be estimated by interpolation. For the first one we have

[TABLE]

where the Gibbs-bracket $\langle-\rangle_{s}$ is associated to the measure proportional to $\exp\{-s\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{X},\bm{Z})-(1-s)\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{X},\bm{Z}^{(i)})\}$ . This implies an upper bound on the first sum in (68):

[TABLE]

Another interpolation gives

[TABLE]

for some constant $C(r)$ , and where $\langle-\rangle_{s}$ is associated to the measure proportional to $\exp\{-s\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G},\bm{X},\bm{Z})-(1-s)\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G}^{(ij)},\bm{X},\bm{Z})\}$ . This bounds the second sum in (68) as

[TABLE]

using that the $(G_{ij})$ are $\{0,1\}$-valued Bernoulli variables, and the variance

[TABLE]

as well as $\big{(}\Delta_{n}/\big{(}\bar{p}_{n}(1-\bar{p}_{n})\big{)}\big{)}^{2}=\lambda_{n}/(n\bar{p}_{n}(1-\bar{p}_{n}))$ in the last inequality.

D.2 Bound on (69)

We relax (69) using the inequality $((a-c)+(c-b))^{2}\leq 2(a-c)^{2}+2(c-b)^{2}$ so that

[TABLE]

The difference in the first sum is given by

[TABLE]

where $\langle-\rangle_{s}$ is associated to the measure proportional to $\exp\{-s\mathcal{H}_{t,\epsilon}(\bm{G},\bm{X},\bm{Z},\bm{x})-(1-s)\mathcal{H}_{t,\epsilon}(\bm{G},\bm{X}^{(i)},\bm{Z},\bm{x})\}$ . Therefore the sum of squares is bounded by $C_{n}(r,\lambda_{n})/n$ using $R(t,\epsilon)\in[0,\lambda_{n}]$ .

For the second sum we use another interpolation:

[TABLE]

where

[TABLE]

As $G_{ij}\in\{0,1\}$ , we have various ways to write $\mathbb{P}_{t,s}(\bm{G}|\bm{X},X^{\prime}_{i})$ . A convenient way is using

[TABLE]

A compact formula for $dP_{ij}/ds$ can then be derived:

[TABLE]

Let $\bm{G}_{\sim(i,j)}\equiv\bm{G}\setminus G_{ij}$ and $\mathbb{P}_{t,s}(\bm{G}_{\sim(i,j)}|\bm{X},X^{\prime}_{i})\equiv\sum_{G_{ij}\in\{0,1\}}\mathbb{P}_{t,s}(\bm{G}|\bm{X},X^{\prime}_{i})$ be the marginal of this sub-graph. Using (72) we obtain

[TABLE]

Substituting (73) into (71) gives

[TABLE]

where $\mathbb{E}_{\bm{G}_{\sim(i,j)}|\bm{X},X^{\prime}_{i}}$ corresponds to the expectation with respect to the distribution $\mathbb{P}_{t,s}(\bm{G}_{\sim(i,j)}|\bm{X},X^{\prime}_{i})$ . To evaluate the difference of free energy in (74), first we define $\bm{Y}^{(i)}=\sqrt{R(t,\epsilon)}\bm{X}^{(i)}+\bm{Z}$ , and $\langle-\rangle_{t,\epsilon;\bm{X}^{(i)},\sim G_{ij}}$ is associated to $\exp\{-\mathcal{H}_{t,\epsilon}(\bm{x};\bm{G}\setminus G_{ij},\bm{Y}^{(i)})\}$ defined in (59). The same calculation as in (60) – (61) gives

[TABLE]

Expanding the logarithms we can see that (75) is $\mathcal{O}\big{(}\Delta_{n}/(n\bar{p}_{n}(1-\bar{p}_{n}))\big{)}$ . Using this fact and that all other terms inside the sum of (74) are upper bounded by constants, we see that (74) is $\mathcal{O}\big{(}\Delta_{n}^{2}/(\bar{p}_{n}(1-\bar{p}_{n}))\big{)}=\mathcal{O}(\lambda_{n}/n)$ . We can then upper bound the second term of (70):

[TABLE]

Acknowledgments

This work was supported by the SNSF grant no. 200021-156672.

Bibliography

[1] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels: First steps,” Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[2] T. N. Bui, S. Chaudhuri, F. T. Leighton, and M. Sipser, “Graph bisection algorithms with good average case behavior,” in 25th Annual Symposium FOCS, 1984, pp. 181–192.
[3] B. Söderberg, “General formalism for inhomogeneous random graphs,” Phys. Rev. E, vol. 66, p. 066121, 2002.
[4] B. Bollobás, S. Janson, and O. Riordan, “The phase transition in inhomogeneous random graphs,” Random Struct. Algorithms, 2007.
[5] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[6] E. Abbe, “Community detection and stochastic block models: Recent developments,” Journal of Machine Learning Research, vol. 18, 2018.
[7] J. Barbier and N. Macris, “The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference,” Probability Theory and Related Fields, Oct 2018.
[8] J. Barbier and N. Macris, “The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models,” Journal of Physics A: Mathematical and General, vol. J Phys A-111295.R 1, 2019.
