Charting the replica symmetric phase

Amin Coja-Oghlan; Charilaos Efthymiou; Nor Jaafari; Mihyun Kang; Tobias Kapetanopoulos

arXiv:1704.01043·cs.DM·March 14, 2018


TL;DR

This paper rigorously confirms physicists' predictions about the replica symmetric phase in diluted mean-field models such as the Potts antiferromagnet on the random graph, the k-XORSAT model and the diluted k-spin model for even k. It also proves a conjecture about the detection problem in the stochastic block model.

Contribution

It gives a rigorous proof of the replica symmetric phase picture and of the condensation phase transition for a broad class of diluted mean-field models. These predictions were previously supported only by the non-rigorous cavity method.

Findings

01. Confirmed the existence of a replica symmetry breaking phase transition.

02. Validated the detailed evolution of the Gibbs measure within the replica symmetric phase.

03. Proved a conjecture on the detection problem in the stochastic block model.


Equations

$\mu_{\mathbb{H},\boldsymbol{J},\beta}(\sigma)$

$\mathcal{B}_{k-\mathrm{spin}}(d,\beta,\pi)=\frac{1}{2}\mathbb{E}\left[\cdots\right]-\frac{d}{k}\mathbb{E}\left[\Lambda\left(1+\sum_{\sigma_{1},\ldots,\sigma_{k}\in\{\pm 1\}}\tanh(\beta\boldsymbol{I}_{1}\sigma_{1}\cdots\sigma_{k})\prod_{h=1}^{k}\boldsymbol{\rho}_{h}^{\pi}(\sigma_{h})\right)\right].$

$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z_{\beta}(\mathbb{H},\boldsymbol{J})]\begin{cases}=\ln 2+\frac{d}{2\pi k}\int_{-\infty}^{\infty}\ln(\cosh(z))\exp(-z^{2}/2)\,\mathrm{d}z&\text{if }d\leq d_{\mathrm{cond}}(k,\beta),\\<\ln 2+\frac{d}{2\pi k}\int_{-\infty}^{\infty}\ln(\cosh(z))\exp(-z^{2}/2)\,\mathrm{d}z&\text{if }d>d_{\mathrm{cond}}(k,\beta).\end{cases}$

$\mu_{\mathbb{H},\boldsymbol{J},\beta,x,y}(s,t)=\left\langle{\boldsymbol{1}\{\boldsymbol{\sigma}_{1}(x)=s,\,\boldsymbol{\sigma}_{1}(y)=t\}}\right\rangle_{\mathbb{H},\boldsymbol{J},\beta}$

$\lim_{n\to\infty}\mathbb{E}\left\langle{\varrho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}^{2}}\right\rangle_{\mathbb{H},\boldsymbol{J},\beta}=0\quad\text{iff}\quad\lim_{n\to\infty}\frac{1}{n^{2}}\sum_{x,y\in V_{n}}\mathbb{E}\left\|{\mu_{\mathbb{H},\boldsymbol{J},\beta,x,y}-\bar{\rho}}\right\|_{\mathrm{TV}}=0.$

$\mu_{\mathbb{G},q,\beta}(\sigma)$

$\mathcal{B}_{\mathrm{Potts}}(q,\beta,d)$

$d_{\mathrm{cond}}(q,\beta)$

$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z_{q,\beta}(\mathbb{G})]$

$d_{\mathrm{cond}}(q,\beta)$

$\mathcal{K}=\sum_{l=3}^{\infty}K_{l}\ln(1+\delta_{l})-\frac{d^{l}\delta_{l}}{2l}\qquad\text{where}\quad\delta_{l}=(q-1)\left(\frac{\mathrm{e}^{-\beta}-1}{q-1+\mathrm{e}^{-\beta}}\right)^{l}.$

$\ln Z_{q,\beta}(\mathbb{G})-\left({n+\frac{1}{2}}\right)\ln q-|E(\mathbb{G})|\ln\left({1-\frac{1-\mathrm{e}^{-\beta}}{q}}\right)+\frac{q-1}{2}\ln\left({1+\frac{d(1-\mathrm{e}^{-\beta})}{q-1+\mathrm{e}^{-\beta}}}\right)+\frac{d\delta_{1}}{2}+\frac{d^{2}\delta_{2}}{4}\quad{\stackrel{{\scriptstyle\mbox{\scriptsize$n\to\infty$}}}{{\to}}}\quad\mathcal{K}.$

$\mathrm{corr}_{q,\beta}(d)$

$\mathrm{corr}_{q,\beta}^{\star}(d)$

$d_{\mathrm{in}}$

$\lim_{n\to\infty}\mathbb{P}\left[{\mathbb{G}\in\mathcal{A}_{n}}\right]=0\quad\text{iff}\quad\lim_{n\to\infty}\mathbb{P}\left[{\mathbb{G}^{*}\in\mathcal{A}_{n}}\right]=0.$

$\psi_{G}(\sigma)$

$q=|\Omega|\quad\text{and}\quad\xi=q^{-k}\sum_{\sigma\in\Omega^{k}}\mathbb{E}[\boldsymbol{\psi}(\sigma)].$

$\mathbb{E}\left[{\ln^{8}\left({1-\max\{|1-\boldsymbol{\psi}(\tau)|:\tau\in\Omega^{k}\}}\right)}\right]$

$\sum_{\tau\in\Omega^{k}}\boldsymbol{1}\{\tau_{i}=\omega\}\psi(\tau)=q^{k-1}\xi$

$\phi:\mu\in\mathcal{P}(\Omega)\mapsto\sum_{\tau\in\Omega^{k}}\mathbb{E}[\boldsymbol{\psi}(\tau)]\prod_{i=1}^{k}\mu(\tau_{i})$

$\rho\in\mathcal{R}(\Omega)\mapsto\sum_{\sigma,\tau\in\Omega^{k}}\mathbb{E}[\boldsymbol{\psi}(\sigma)\boldsymbol{\psi}(\tau)]\prod_{i=1}^{k}\rho(\sigma_{i},\tau_{i})$

$\mathbb{E}\left[{\Lambda\left({\sum_{\tau\in\Omega^{k}}\boldsymbol{\psi}(\tau)\prod_{i=1}^{k}\rho_{i}(\tau_{i})}\right)+(k-1)\Lambda\left({\sum_{\tau\in\Omega^{k}}\boldsymbol{\psi}(\tau)\prod_{i=1}^{k}\rho_{i}^{\prime}(\tau_{i})}\right)-k\Lambda\left({\sum_{\tau\in\Omega^{k}}\boldsymbol{\psi}(\tau)\rho_{1}(\tau_{1})\prod_{i=2}^{k}\rho_{i}^{\prime}(\tau_{i})}\right)}\right]\geq 0.$

$\mathcal{B}(d,P,\pi)$

$d_{\mathrm{cond}}$

$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z(G)]$

$\limsup_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z(G)]$

$\Phi_{\psi}(\omega,\omega^{\prime})=q^{1-k}\xi^{-1}\sum_{\tau\in\Omega^{k}}\boldsymbol{1}\{\tau_{1}=\omega,\tau_{2}=\omega^{\prime}\}\psi(\tau)\qquad(\omega,\omega^{\prime}\in\Omega)$

$\Xi$

$\mathcal{E}=\left\{{z\in\mathbb{R}^{q}\otimes\mathbb{R}^{q}:\forall y\in\mathbb{R}^{q}:\langle z,\boldsymbol{1}\otimes y\rangle=\langle z,y\otimes\boldsymbol{1}\rangle=0}\right\}.$

$d_{\mathrm{KS}}=\left({(k-1)\max_{x\in\mathcal{E}:\|x\|=1}\langle\Xi x,x\rangle}\right)^{-1},$


Full text

Charting the replica symmetric phase

Amin Coja-Oghlan∗, Charilaos Efthymiou∗∗, Nor Jaafari, Mihyun Kang∗∗∗, Tobias Kapetanopoulos∗∗∗∗

Amin Coja-Oghlan, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.

Charilaos Efthymiou, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.

Nor Jaafari, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.

Mihyun Kang, Technische Universität Graz, Institute of Discrete Mathematics, Steyrergasse 30, 8010 Graz, Austria.

Tobias Kapetanopoulos, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.

Abstract.

Diluted mean-field models are spin systems whose geometry of interactions is induced by a sparse random graph or hypergraph. Such models play an eminent role in the statistical mechanics of disordered systems as well as in combinatorics and computer science. In a path-breaking paper based on the non-rigorous ‘cavity method’, physicists predicted not only the existence of a replica symmetry breaking phase transition in such models but also sketched a detailed picture of the evolution of the Gibbs measure within the replica symmetric phase and its impact on important problems in combinatorics, computer science and physics [Krzakala et al.: PNAS 2007]. In this paper we rigorise this picture completely for a broad class of models, encompassing the Potts antiferromagnet on the random graph, the $k$ -XORSAT model and the diluted $k$ -spin model for even $k$ . We also prove a conjecture about the detection problem in the stochastic block model that has received considerable attention [Decelle et al.: Phys. Rev. E 2011].

∗ The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement n. 278857–PTCC.

∗∗ Supported by DFG grant EF 103/1-1.

∗∗∗ Supported by Austrian Science Fund (FWF): P26826.

∗∗∗∗ Supported by Stiftung Polytechnische Gesellschaft PhD grant.

1. Introduction

1.1. The cavity method

Contrasting the awe-inspiring arsenal of techniques at the disposal of modern combinatorics and probability with the utter simplicity of terms in which, say, the Erdős-Rényi random graph model is defined, one might expect that after a half-century of study everything ought to be known about this and similar models. Yet beneath the surface lurks a picture of mesmerizing complexity. Its unexpected intricacy was brought out most clearly by a line of research that commenced in the statistical physics community with the study of diluted mean-field models, spin systems whose geometry of interactions is induced by a sparse random graph or hypergraph. Such models were put forward in physics as models of disordered systems [47]. Prominent examples include the diluted $k$-spin model or the Potts antiferromagnet on a random graph [25, 37, 60]. The graph structure, convergent locally to the Bethe lattice or a Galton-Watson tree, induces a non-trivial metric, which is why such models have been argued to evince a closer semblance of physical reality than fully connected ones such as the Sherrington-Kirkpatrick model [48, 50]. But perhaps even more importantly, apart from and beyond the disordered systems thread, in the course of the past half-century models based on random graphs have come to play a role in combinatorics, probability, statistics and computer science that can hardly be overstated. For example, the random $k$-SAT model is of fundamental interest in computer science [9], the stochastic block model has gained prominence in statistics [1, 38, 56], low-density parity check codes are the bread and butter of modern coding theory [63], and problems such as random graph coloring have been the lodestars of probabilistic combinatorics ever since the days of Erdős and Rényi [9, 21, 29].

In the course of the past 20 years physicists developed an analytic but non-rigorous technique for the study of such models called the ‘cavity method’. It has been brought to bear on all of the aforementioned and very many other models in an impressive and ongoing line of work that has led to numerous predictions that bear on an astounding variety of problems (e.g., [26, 47, 51, 67]). The task of putting the cavity method on a rigorous foundation has therefore gained substantial importance, and despite recent successes (e.g., [23, 28, 35, 56]) much remains to be done. In particular, while the cavity method can be applied to a given model almost mechanically, most rigorous arguments are still based on ad hoc, model-specific deliberations. This leads to the question of whether we can come up with abstract arguments that rigorise the cavity method wholesale, which is the thrust of the present paper.

One of the most important predictions of the cavity method is that the Gibbs measures induced by random graph models undergo a replica symmetry breaking or condensation phase transition [43]. Physically this phase transition resembles the Kauzmann transition from the study of glasses [40]. The fact that a phase transition occurs at the location predicted by the cavity method was recently proved for a fairly broad family of models [23]. However, that result fell short of establishing that the condensation phase transition does indeed mark the point where the nature of correlations under the Gibbs measure changes as predicted by the cavity method.

Here we prove that this is indeed the case. In fact, we rigorise the entire “map” of the replica symmetric phase as predicted in [32, 43, 44], including its boundary, the evolution of the nature of correlations within and an important contiguity result. More specifically, first and foremost we prove that the condensation phase transition does indeed separate a “replica symmetric” phase without extensive long-range correlations from a phase where long-range correlations prevail, arguably the key feature of the physics picture. Further, we verify the physics prediction on the threshold for the onset of point-to-set correlations, called the reconstruction threshold. Additionally, we derive the precise limiting distribution of the free energy within the replica symmetric phase, thereby vindicating a prediction that the free energy exhibits remarkably small fluctuations [32, 44]. Finally, verifying a prominent prediction from [26], we prove a contiguity statement that has an impact on statistical inference problems such as the stochastic block model.

The results of this paper cover a wide class of random graph models, even broader than the family of models for which the condensation threshold was previously derived in [23]. Indeed, as a testimony to the power of the present general approach we may point out that even the specializations of the main results to prominent examples such as the Potts antiferromagnet on the random graph or the $k$ -spin model were not previously known, even though these models received considerable attention in their own right. Before presenting the general results in Section 2, we illustrate their impact on three important examples: the diluted $k$ -spin model, the Potts antiferromagnet on the random graph and the stochastic block model.

1.2. The diluted $k$ -spin model

For integers $k\geq 2$ , $n\geq 1$ and a real $p\in[0,1]$ let $\mathbb{H}=\mathbb{H}_{k}(n,p)$ be the random $k$ -uniform hypergraph on $V_{n}=\{x_{1},\ldots,x_{n}\}$ whose edge set $E(\mathbb{H})$ is obtained by including each of the ${\binom{n}{k}}$ possible $k$ -subsets of $V_{n}$ with probability $p$ independently. Additionally, let $\boldsymbol{J}=(\boldsymbol{J}_{e})_{e\in E(\mathbb{H})}$ be a family of independent standard Gaussians. The $k$ -spin model on $\mathbb{H}$ at inverse temperature $\beta>0$ is the distribution on the set $\{-1,1\}^{V_{n}}$ defined by

[TABLE]

Arguably the most interesting and at the same time most challenging scenario arises in the case of a sparse random hypergraph [48]. Specifically, set $p=d/{\binom{n-1}{k-1}}$ for a fixed $d>0$ so that in the limit $n\to\infty$ the average vertex degree of $\mathbb{H}$ converges to $d$ in probability. How does the model change as we vary $d$ ?
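The sparse regime can be explored numerically on small instances. The following is a minimal Python sketch, not part of the paper, that samples $\mathbb{H}_{k}(n,p)$ with $p=d/{\binom{n-1}{k-1}}$ together with Gaussian couplings. The function names `edge_probability`, `sample_hypergraph` and `average_degree` are hypothetical, and the sampler enumerates all $k$-subsets, so it is only suitable for small $n$.

```python
import itertools
import math
import random


def edge_probability(n: int, k: int, d: float) -> float:
    """Sparse-regime edge probability p = d / C(n-1, k-1)."""
    return d / math.comb(n - 1, k - 1)


def sample_hypergraph(n: int, k: int, d: float, rng: random.Random):
    """Sample H_k(n, p) on vertices 0, ..., n-1 with standard Gaussian couplings.

    Every k-subset is kept independently with probability p.
    Returns a list of (edge, coupling) pairs.
    """
    p = edge_probability(n, k, d)
    return [
        (edge, rng.gauss(0.0, 1.0))
        for edge in itertools.combinations(range(n), k)
        if rng.random() < p
    ]


def average_degree(n: int, k: int, edges) -> float:
    """Each hyperedge contributes k to the sum of the vertex degrees."""
    return k * len(edges) / n
```

Each vertex lies in ${\binom{n-1}{k-1}}$ potential edges, so its expected degree is exactly $p{\binom{n-1}{k-1}}=d$, which is the quantity `average_degree` estimates on a sample.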

According to the physics predictions, for any $k$, $\beta$ there exists a condensation threshold $d_{\mathrm{cond}}(k,\beta)$ where the function $d\mapsto\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z_{\beta}(\mathbb{H},\boldsymbol{J})]$ is non-analytic [33]. This conjecture was proved in the case $k=2$ by Guerra and Toninelli [37]. However, their technique does not give the precise condensation phase transition for $k>2$ [37, Section 9], nor does the $k$-spin model belong to the class of models for which the condensation threshold was determined in [23]. The following theorem pinpoints the precise condensation threshold for all $k\geq 3$, proving the prediction from [33].

As is the case with most results inspired by the cavity method, the precise value $d_{\mathrm{cond}}(k,\beta)$ comes in terms of a stochastic optimization problem. Specifically, write $\mathcal{P}(\mathcal{X})$ for the set of all probability distributions on a finite set $\mathcal{X}$ and identify $\mathcal{P}(\mathcal{X})$ with the standard simplex in $\mathbb{R}^{\mathcal{X}}$. Moreover, let $\mathcal{P}^{2}(\mathcal{X})$ be the space of all probability measures on $\mathcal{P}(\mathcal{X})$ and let $\mathcal{P}^{2}_{*}(\mathcal{X})$ be the space of all $\pi\in\mathcal{P}^{2}(\mathcal{X})$ whose barycenter $\int_{\mathcal{P}(\mathcal{X})}\mu\,{\mathrm{d}}\pi(\mu)$ is the uniform distribution on $\mathcal{X}$. Further, let $\Lambda(x)=x\ln x$.
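To make the barycenter condition concrete, here is a small Python sketch restricted to finitely supported $\pi$, represented as a list of (weight, distribution) pairs. This representation and the names `barycenter`, `has_uniform_barycenter` and `Lambda` are illustrative assumptions, as is the convention $\Lambda(0)=0$.

```python
import math
from fractions import Fraction


def barycenter(pi):
    """Barycenter of a finitely supported pi, given as (weight, mu) pairs.

    Each mu is a dict mapping the points of X to probabilities.
    """
    points = pi[0][1].keys()
    return {x: sum(w * mu[x] for w, mu in pi) for x in points}


def has_uniform_barycenter(pi) -> bool:
    """Membership test for P^2_*(X), for finitely supported pi only."""
    center = barycenter(pi)
    return all(value == Fraction(1, len(center)) for value in center.values())


def Lambda(x: float) -> float:
    """Lambda(x) = x ln x, extended by Lambda(0) = 0 (an assumed convention)."""
    return 0.0 if x == 0 else x * math.log(x)
```

For instance, the equal mixture of the two point masses on $\{\pm 1\}$ has uniform barycenter, whereas a single biased distribution does not.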

Theorem 1.1.

Suppose that $d>0,\beta>0$ and that $k\geq 3$ . Let ${\boldsymbol{\gamma}}$ be a Poisson variable with mean $d$ , let $\boldsymbol{I}_{1},\boldsymbol{I}_{2},\ldots$ be standard Gaussians and for $\pi\in\mathcal{P}_{*}^{2}(\{\pm 1\})$ let ${\boldsymbol{\rho}}_{1}^{\pi},{\boldsymbol{\rho}}_{2}^{\pi},\ldots\in\mathcal{P}(\{\pm 1\})$ be random variables with distribution $\pi$ , all mutually independent. Define

[TABLE]

and $d_{\mathrm{cond}}(k,\beta)=\inf\{d>0:\sup_{\pi\in\mathcal{P}_{*}^{2}(\{1,-1\})}\mathcal{B}_{k-\mathrm{spin}}(d,\beta,\pi)>\ln 2\}.$ Then $0<d_{\mathrm{cond}}(k,\beta)<\infty$ and

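$$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z_{\beta}(\mathbb{H},\boldsymbol{J})]\begin{cases}=\ln 2+\frac{d}{2\pi k}\int_{-\infty}^{\infty}\ln(\cosh(z))\exp(-z^{2}/2)\,\mathrm{d}z&\text{if }d\leq d_{\mathrm{cond}}(k,\beta),\\<\ln 2+\frac{d}{2\pi k}\int_{-\infty}^{\infty}\ln(\cosh(z))\exp(-z^{2}/2)\,\mathrm{d}z&\text{if }d>d_{\mathrm{cond}}(k,\beta).\end{cases}$$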

From now on we assume that $k\geq 4$ is even. The regime $d<d_{\mathrm{cond}}(k,\beta)$ is called the replica symmetric phase. According to the cavity method, its key feature is that with probability tending to $1$ in the limit $n\to\infty$ , two independent samples $\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}$ (‘replicas’) chosen from the Gibbs measure $\mu_{\mathbb{H},\boldsymbol{J},\beta}$ are “essentially perpendicular”. To formalize this define for $\sigma,\tau:V_{n}\to\{\pm 1\}$ the overlap as $\varrho_{\sigma,\tau}=\sum_{x\in V_{n}}\sigma(x)\tau(x)/n.$ We write $\left\langle{\,\cdot\,}\right\rangle_{\mathbb{H},\boldsymbol{J},\beta}$ for the average on $\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}$ chosen independently from $\mu_{\mathbb{H},\boldsymbol{J},\beta}$ and denote the expectation over the choice of $\mathbb{H}$ and $\boldsymbol{J}$ by $\mathbb{E}\left[{\,\cdot\,}\right]$ .
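The overlap is straightforward to compute. The sketch below is an illustration only; the function name `overlap` and the representation of configurations as sequences of $\pm 1$ are assumptions.

```python
def overlap(sigma, tau) -> float:
    """Overlap of two +/-1 configurations on the same vertex set.

    Computes sum_x sigma(x) * tau(x) / n for equal-length sequences.
    """
    if len(sigma) != len(tau):
        raise ValueError("configurations must live on the same vertex set")
    return sum(s * t for s, t in zip(sigma, tau)) / len(sigma)
```

Identical configurations have overlap $1$, a configuration and its negation have overlap $-1$, and "essentially perpendicular" replicas have overlap close to $0$.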

Theorem 1.2.

For all $\beta>0$ and $k\geq 4$ even we have $d_{\mathrm{cond}}(k,\beta)=\inf\left\{{d>0:\limsup_{n\to\infty}\mathbb{E}\left\langle{\varrho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}^{2}}\right\rangle_{\mathbb{H},\boldsymbol{J},\beta}>0}\right\}.$

The corresponding statement for $k=2$ was proved by Guerra and Toninelli, but as they point out their argument does not extend to larger $k$ [37].

Theorem 1.2 implies the absence of extensive long-range correlations in the replica symmetric phase. Indeed, for two vertices $x,y\in V_{n}$ and $s,t\in\{+1,-1\}$ let

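$$\mu_{\mathbb{H},\boldsymbol{J},\beta,x,y}(s,t)=\left\langle{\boldsymbol{1}\{\boldsymbol{\sigma}_{1}(x)=s,\,\boldsymbol{\sigma}_{1}(y)=t\}}\right\rangle_{\mathbb{H},\boldsymbol{J},\beta}$$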

be the joint distribution of the spins assigned to $x,y$ . Further, let $\bar{\rho}$ be the uniform distribution on $\{\pm 1\}\times\{\pm 1\}$ . Then the total variation distance $\|\mu_{\mathbb{H},\boldsymbol{J},\beta,x,y}-\bar{\rho}\|_{\mathrm{TV}}$ is a measure of how correlated the spins of $x,y$ are. Indeed, in the case that $k$ is even for every $x\in V_{n}$ the Gibbs marginals satisfy $\mu_{\mathbb{H},\boldsymbol{J},\beta,x}(\pm 1)=\left\langle{\boldsymbol{1}\{\boldsymbol{\sigma}_{1}(x)=\pm 1\}}\right\rangle_{\mathbb{H},\boldsymbol{J},\beta}=1/2$ because $\mu_{\mathbb{H},\boldsymbol{J},\beta}(\sigma)=\mu_{\mathbb{H},\boldsymbol{J},\beta}(-\sigma)$ for every $\sigma\in\{-1,+1\}^{n}$ . Therefore, if the spins at $x,y$ were independent, then $\mu_{\mathbb{H},\boldsymbol{J},\beta,x,y}=\mu_{\mathbb{H},\boldsymbol{J},\beta,x}\otimes\mu_{\mathbb{H},\boldsymbol{J},\beta,y}=\bar{\rho}$ . Furthermore, it is well known (e.g., [13, Section 2]) that

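$$\lim_{n\to\infty}\mathbb{E}\left\langle{\varrho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}^{2}}\right\rangle_{\mathbb{H},\boldsymbol{J},\beta}=0\quad\text{iff}\quad\lim_{n\to\infty}\frac{1}{n^{2}}\sum_{x,y\in V_{n}}\mathbb{E}\left\|{\mu_{\mathbb{H},\boldsymbol{J},\beta,x,y}-\bar{\rho}}\right\|_{\mathrm{TV}}=0.$$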

Thus, Theorem 1.2 implies that for $d<d_{\mathrm{cond}}(k,\beta)$ , with probability tending to $1$ , the spins assigned to two random vertices $x,y$ of $\mathbb{H}$ are asymptotically independent. By contrast, Theorem 1.2 and (1.2) show that extensive long-range dependencies occur beyond but arbitrarily close to $d_{\mathrm{cond}}(k,\beta)$ .
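As a small illustration of the total variation distance used above, the following Python sketch compares a joint distribution of two spins with the uniform distribution $\bar{\rho}$ on $\{\pm 1\}\times\{\pm 1\}$. The names `UNIFORM_PAIR` and `total_variation` are hypothetical, and the distance is taken with the usual normalization of half the $\ell_{1}$ distance.

```python
import itertools

SPINS = (+1, -1)
# The uniform distribution on {+1, -1} x {+1, -1}.
UNIFORM_PAIR = {pair: 0.25 for pair in itertools.product(SPINS, repeat=2)}


def total_variation(mu, nu) -> float:
    """Total variation distance between two distributions given as dicts."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(key, 0.0) - nu.get(key, 0.0)) for key in support)
```

Two perfectly aligned spins, with mass $1/2$ on each of $(+1,+1)$ and $(-1,-1)$, are at distance $1/2$ from $\bar{\rho}$, while independent unbiased spins are at distance $0$.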

1.3. The Potts antiferromagnet

Let $q\geq 2$ be an integer, let $\Omega=\{1,\ldots,q\}$ be a set of $q$ “colors” and let $\beta>0$ . The antiferromagnetic $q$ -spin Potts model on a graph $G=(V(G),E(G))$ at inverse temperature $\beta$ is the probability distribution on $\Omega^{V_{n}}$ defined by

[TABLE]

The Potts model on the random graph $\mathbb{G}=\mathbb{G}(n,p)$ with vertex set $V_{n}=\{x_{1},\ldots,x_{n}\}$ whose edge set $E(\mathbb{G})$ is obtained by including each of the ${\binom{n}{2}}$ possible pairs $\{v,w\}$ , $v,w\in V_{n}$ , $v\neq w$ , with probability $p\in[0,1]$ independently, has received considerable attention (e.g. [12, 22, 25]). As in the $k$ -spin model, the most challenging case is that $p=d/n$ for a fixed real $d>0$ , so that the average degree converges to $d$ in probability.

The condensation phase transition in this model was pinpointed recently [23]. As in the $k$ -spin model, the answer comes as a stochastic optimization problem. To be precise, let ${\boldsymbol{\gamma}}$ be a ${\rm Po}(d)$ -random variable, let ${\boldsymbol{\rho}}_{1}^{\pi},{\boldsymbol{\rho}}_{2}^{\pi},\ldots$ denote samples from $\pi\in\mathcal{P}^{2}_{*}(\Omega)$ , mutually independent and independent of ${\boldsymbol{\gamma}}$ , and set

[TABLE]

Then [23, Theorem 1.1] shows that $0<d_{\mathrm{cond}}(q,\beta)<\infty$ and

[TABLE]

While it may be difficult to calculate $d_{\mathrm{cond}}(q,\beta)$ numerically, there is the explicit Kesten-Stigum bound [3]

[TABLE]

which is known to be tight for $q=2$ for all $\beta$ [45, 58, 59], conjectured to be tight for $q=3$ for all $\beta$ [26, 46], and known not to be tight for $q\geq 5$ [66].

What can we say about the nature of the Gibbs measure in the ‘replica symmetric phase’ $0<d<d_{\mathrm{cond}}(q,\beta)$ ? Azuma’s inequality shows that $\frac{1}{n}\ln Z_{q,\beta}(\mathbb{G})$ converges to $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z_{q,\beta}(\mathbb{G})]$ in probability, i.e., the free energy $\ln Z_{q,\beta}(\mathbb{G})$ has fluctuations of order $o(n)$ . On the other hand, given that key parameters such as the size of the largest connected component of $\mathbb{G}$ exhibit fluctuations of order $\sqrt{n}$ even once we condition on the number $|E(\mathbb{G})|$ of edges, one might expect that so does $\ln Z_{q,\beta}(\mathbb{G})$ . Yet remarkably, the following theorem shows that throughout the replica symmetric phase the free energy merely has bounded fluctuations given $|E(\mathbb{G})|$ . In fact, we know the precise limiting distribution.

Theorem 1.3.

Let $q\geq 2$ , $\beta>0$ and $0<d<d_{\mathrm{cond}}(q,\beta)$ . With $(K_{l})_{l\geq 3}$ a sequence of independent Poisson variables with mean $\mathbb{E}[K_{l}]=d^{l}/(2l)$ , let

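$$\mathcal{K}=\sum_{l=3}^{\infty}K_{l}\ln(1+\delta_{l})-\frac{d^{l}\delta_{l}}{2l}\qquad\text{where}\quad\delta_{l}=(q-1)\left(\frac{\mathrm{e}^{-\beta}-1}{q-1+\mathrm{e}^{-\beta}}\right)^{l}.$$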

Then $\mathbb{E}|\mathcal{K}|<\infty$ and, in distribution,

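$$\ln Z_{q,\beta}(\mathbb{G})-\left({n+\frac{1}{2}}\right)\ln q-|E(\mathbb{G})|\ln\left({1-\frac{1-\mathrm{e}^{-\beta}}{q}}\right)+\frac{q-1}{2}\ln\left({1+\frac{d(1-\mathrm{e}^{-\beta})}{q-1+\mathrm{e}^{-\beta}}}\right)+\frac{d\delta_{1}}{2}+\frac{d^{2}\delta_{2}}{4}\quad{\stackrel{{\scriptstyle\mbox{\scriptsize$n\to\infty$}}}{{\to}}}\quad\mathcal{K}.$$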
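The quantities entering Theorem 1.3 can be evaluated numerically. The sketch below is an illustration under stated assumptions: it takes $\delta_{l}=(q-1)\big((\mathrm{e}^{-\beta}-1)/(q-1+\mathrm{e}^{-\beta})\big)^{l}$, reads the sum defining $\mathcal{K}$ as extending over both terms, and uses $\mathbb{E}[K_{l}]=d^{l}/(2l)$ to obtain the mean $\sum_{l\geq 3}\frac{d^{l}}{2l}(\ln(1+\delta_{l})-\delta_{l})$. The function names and the truncation parameter `max_len` are hypothetical.

```python
import math


def delta(l: int, q: int, beta: float) -> float:
    """delta_l = (q - 1) * ((e^{-beta} - 1) / (q - 1 + e^{-beta}))^l."""
    lam = (math.exp(-beta) - 1.0) / (q - 1.0 + math.exp(-beta))
    return (q - 1) * lam ** l


def mean_of_K(q: int, beta: float, d: float, max_len: int = 200) -> float:
    """Mean of K, truncated to cycle lengths 3 <= l <= max_len.

    Since E[K_l] = d^l / (2l), the l-th term has mean
    (d^l / (2l)) * (ln(1 + delta_l) - delta_l).
    """
    total = 0.0
    for l in range(3, max_len + 1):
        dl = delta(l, q, beta)
        total += d ** l / (2 * l) * (math.log1p(dl) - dl)
    return total
```

For $q=2$ the base of the power reduces to $-\tanh(\beta/2)$, which gives a quick sanity check. Because $\ln(1+x)\leq x$, every term of the mean is non-positive.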

Further, as in the $k$ -spin model the replica symmetric phase can be characterized in terms of the overlap. Formally, define the overlap of two colorings $\sigma,\tau:V_{n}\to\Omega$ as the probability distribution $\rho_{\sigma,\tau}=(\rho_{\sigma,\tau}(s,t))_{s,t\in\Omega}$ on $\Omega\times\Omega$ where $\rho_{\sigma,\tau}(s,t)=|\sigma^{-1}(s)\cap\tau^{-1}(t)|/n$ is the probability that a random vertex $v$ is colored $s$ under $\sigma$ and $t$ under $\tau$ . Let $\bar{\rho}$ denote the uniform distribution on $\Omega\times\Omega$ , write $\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}$ for two independent samples from $\mu_{\mathbb{G},q,\beta}$ , denote the expectation with respect to $\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}$ by $\left\langle{\,\cdot\,}\right\rangle_{\mathbb{G},q,\beta}$ and the expectation over the choice of $\mathbb{G}$ by $\mathbb{E}\left[{\,\cdot\,}\right]$ .

Theorem 1.4.

For all $q\geq 2,\beta>0$ we have $d_{\mathrm{cond}}(q,\beta)=\inf\left\{{d>0:\limsup_{n\to\infty}\mathbb{E}\left\langle{\|\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}\|_{\mathrm{TV}}}\right\rangle_{\mathbb{G},q,\beta}>0}\right\}.$

As in the case of the $k$ -spin model it is easy to see that $\mathbb{E}\langle\|\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}\|_{\mathrm{TV}}\rangle_{\mathbb{G},q,\beta}=o(1)$ iff the colors assigned to two randomly chosen vertices of $\mathbb{G}$ are asymptotically independent with probability tending to one. Hence, $d_{\mathrm{cond}}(q,\beta)$ marks the onset of long-range correlations.
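The overlap of two colorings and its distance from the uniform distribution can be computed directly from the definition. The following Python sketch is illustrative; the names `overlap_distribution` and `distance_to_uniform` are assumptions, colors are encoded as the integers $1,\ldots,q$, and total variation is taken as half the $\ell_{1}$ distance.

```python
from collections import Counter


def overlap_distribution(sigma, tau, q: int):
    """rho_{sigma,tau}(s, t) = |sigma^{-1}(s) & tau^{-1}(t)| / n for colors 1..q."""
    n = len(sigma)
    counts = Counter(zip(sigma, tau))
    return {
        (s, t): counts[(s, t)] / n
        for s in range(1, q + 1)
        for t in range(1, q + 1)
    }


def distance_to_uniform(rho, q: int) -> float:
    """Total variation distance between rho and the uniform law on pairs of colors."""
    return 0.5 * sum(abs(value - 1.0 / q ** 2) for value in rho.values())
```

With $q=2$, a balanced coloring compared with itself is at distance $1/2$ from $\bar{\rho}$, whereas two balanced colorings that agree on exactly half of the vertices have a uniform overlap.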

In many diluted models, and in particular in the Potts antiferromagnet, the condensation transition is conjectured to be preceded by another threshold where certain “point-to-set correlations” emerge [43]. Intuitively, the reconstruction threshold is the point from where for a random vertex $y\in V_{n}$ correlations between the color assigned to $y$ and the colors assigned to all vertices at a large enough distance $\ell$ from $y$ persist. Formally, with $\boldsymbol{\sigma}$ chosen from $\mu_{\mathbb{G},q,\beta}$ let $\nabla_{\ell,q,\beta}(\mathbb{G},y)$ be the $\sigma$ -algebra on $\Omega^{V_{n}}$ generated by the random variables $\boldsymbol{\sigma}(z)$ , where $z$ ranges over all vertices at distance at least $\ell$ from $y$ . Then

[TABLE]

measures the extent of correlations between $y$ and a random boundary condition in the limit $\ell,n\to\infty$ (the outer limit exists due to monotonicity). Indeed, with the expectation $\mathbb{E}\left[{\,\cdot\,}\right]$ in (1.8) referring to the choice of $\mathbb{G}$, the outer $\left\langle{\,\cdot\,}\right\rangle_{\mathbb{G},q,\beta}$ chooses a random coloring of the vertices at distance at least $\ell$ from $y$ and the inner $\langle\,\cdot\,|\nabla_{\ell,q,\beta}(\mathbb{G},y)\rangle_{\mathbb{G},q,\beta}$ averages over the color of $y$ given the boundary condition.

The reconstruction threshold is defined as $d_{\mathrm{rec}}(q,\beta)=\inf\{d>0:\mathrm{corr}_{q,\beta}(d)>0\}.$ A priori, calculating $d_{\mathrm{rec}}(q,\beta)$ appears to be rather challenging because we seem to have to control the joint distribution of the colors at distance $\ell$ from $y$ . However, according to physics predictions $d_{\mathrm{rec}}(q,\beta)$ is identical to the corresponding threshold on a random tree [43], a conceptually much simpler object. Formally, let $\mathbb{T}(d)$ be the Galton-Watson tree with offspring distribution ${\rm Po}(d)$ . Let $r$ be its root and for an integer $\ell\geq 1$ let $\mathbb{T}^{\ell}(d)$ be the finite tree obtained by deleting all vertices at distance greater than $\ell$ from $r$ . Then

[TABLE]

measures the extent of correlations between the color of the root and the colors at the boundary of the tree. Accordingly, the tree reconstruction threshold is defined as $d_{\mathrm{rec}}^{\star}(q,\beta)=\inf\{d>0:\mathrm{corr}_{q,\beta}^{\star}(d)>0\}.$ Combining Theorem 1.4 with a result of Gerschenfeld and Montanari [34], we obtain the following result.

Corollary 1.5.

For every $q\geq 2$ and $\beta>0$ we have $1\leq d_{\mathrm{rec}}(q,\beta)=d_{\mathrm{rec}}^{\star}(q,\beta)\leq d_{\mathrm{cond}}(q,\beta).$

Previously it was known that $d_{\mathrm{rec}}(q,\beta)=d_{\mathrm{rec}}^{\star}(q,\beta)$ for $q$ exceeding some (large but) undetermined constant $q_{0}$ [55]. This assumption was required because the proof depended on model-specific combinatorial considerations. A merit of the present approach is that we replace such combinatorial arguments by abstract probabilistic ones.
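The truncated Galton-Watson tree $\mathbb{T}^{\ell}(d)$ from the definition of the tree reconstruction threshold is easy to sample. The sketch below is an illustration only: the function names are hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is a choice made here for self-containedness and is adequate for small means.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Poisson sample via Knuth's multiplication method (small means only)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_truncated_tree(d: float, depth: int, rng: random.Random):
    """Sample T^depth(d) as a list of child lists; node 0 is the root.

    Vertices at distance greater than `depth` from the root are never generated.
    """
    children = [[]]
    frontier = [0]
    for _level in range(depth):
        next_frontier = []
        for node in frontier:
            for _ in range(sample_poisson(d, rng)):
                children.append([])
                child = len(children) - 1
                children[node].append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return children
```

The vertices in the final `frontier` form the boundary at distance exactly `depth` from the root, which is where a boundary condition for the reconstruction problem would be imposed.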

1.4. The stochastic block model

The disassortative stochastic block model, originally introduced by Holland, Laskey, and Leinhardt [38], is an intensely studied statistical inference problem associated with the Potts model [56]. We first choose a random coloring $\boldsymbol{\sigma}^{*}:V_{n}\to\Omega$ of $n$ vertices with $q\geq 2$ colors. Then, setting

[TABLE]

we generate a random graph $\mathbb{G}^{*}$ by connecting any two vertices $v,w$ of the same color $\boldsymbol{\sigma}^{*}(v)=\boldsymbol{\sigma}^{*}(w)$ with probability $d_{\mathrm{in}}/n$ and any two with distinct colors with probability $d_{\mathrm{out}}/n$ independently. Thus, the average degree of $\mathbb{G}^{*}$ converges to $d$ in probability.
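The generative procedure can be sketched in a few lines of Python. This is an illustration, not the paper's code: the function names are hypothetical, `d_in` and `d_out` are passed in as plain parameters rather than derived from $d,q,\beta$, and the planted coloring is assumed to be uniformly random.

```python
import itertools
import random


def random_coloring(n: int, q: int, rng: random.Random):
    """A uniformly random coloring of vertices 0..n-1 with colors 1..q (assumed uniform)."""
    return [rng.randint(1, q) for _ in range(n)]


def sample_block_model(sigma, d_in: float, d_out: float, rng: random.Random):
    """Sample the edge list of G* given the planted coloring sigma.

    Pairs with equal colors are joined with probability d_in / n,
    pairs with distinct colors with probability d_out / n, independently.
    """
    n = len(sigma)
    edges = []
    for v, w in itertools.combinations(range(n), 2):
        rate = d_in if sigma[v] == sigma[w] else d_out
        if rng.random() < rate / n:
            edges.append((v, w))
    return edges
```

In the disassortative case `d_in < d_out`, so edges between distinct colors are favored, mirroring the antiferromagnetic Potts interaction.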

Two fundamental statistical problems arise [26]. First, given $q,\beta$, for what values of $d$ is it possible to recover a non-trivial approximation of $\boldsymbol{\sigma}^{*}$ given just the random graph $\mathbb{G}^{*}$, i.e., to do better than just a random guess (see [26] for a formal definition)? A second, more modest task is the detection problem, which merely asks whether the random graph $\mathbb{G}^{*}$ chosen from the stochastic block model can be told apart from the natural “null model”, namely the plain Erdős-Rényi random graph $\mathbb{G}$.

Decelle, Krzakala, Moore and Zdeborová [26] predicted that for $d<d_{\mathrm{cond}}(q,\beta)$, i.e., below the Potts condensation threshold (1.5), it is information-theoretically impossible to solve either problem. That is, there is no test or algorithm that can infer with probability tending to $1$ as $n\to\infty$ whether its input was created via the stochastic block model or the Erdős-Rényi model, let alone obtain a non-trivial approximation to $\boldsymbol{\sigma}^{*}$. On the other hand, they predicted that there exist efficient algorithms to solve either problem if $d$ exceeds the Kesten-Stigum bound (1.7). Both of these conjectures were proved in the case $q=2$ by Mossel, Neeman and Sly [58, 59] and Massoulié [45]. After advances by Bordenave, Lelarge and Massoulié [20], the positive algorithmic conjecture was proved in full by Abbe and Sandon [3]. On the negative side, [23, Theorem 1.3] shows that no algorithm can infer a non-trivial approximation to $\boldsymbol{\sigma}^{*}$ if $d<d_{\mathrm{cond}}(q,\beta)$ for any $q\geq 3$, $\beta>0$. Additionally, Banks, Moore, Neeman, and Netrapalli [12] employed a second moment argument based on Achlioptas and Naor [8] to determine an explicit range of $d$ where it is impossible to discern whether the graph was created via the stochastic block model or the Erdős-Rényi model. However, there has remained an extensive gap between their explicit bound and the actual condensation threshold.

Our next result closes this gap and thus settles the conjecture from [26]. Recall that the random graph models $\mathbb{G},\mathbb{G}^{*}$ are mutually contiguous for $d>0$ if for any sequence $(\mathcal{A}_{n})_{n}$ of events we have

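$$\lim_{n\to\infty}\mathbb{P}\left[{\mathbb{G}\in\mathcal{A}_{n}}\right]=0\quad\text{iff}\quad\lim_{n\to\infty}\mathbb{P}\left[{\mathbb{G}^{*}\in\mathcal{A}_{n}}\right]=0.$$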

If so, then clearly no algorithm (efficient or not) can discern with probability $1-o(1)$ whether a given graph stems from the stochastic block model $\mathbb{G}^{*}$ or the “null model” $\mathbb{G}$ .

Theorem 1.6.

For all $q\geq 3$ , $\beta>0$ , $d<d_{\mathrm{cond}}(q,\beta)$ the random graph models $\mathbb{G}$ and $\mathbb{G}^{*}$ are mutually contiguous.

This result is tight since [23, Theorem 2.6] implies that $\mathbb{G},\mathbb{G}^{*}$ fail to be mutually contiguous for $d>d_{\mathrm{cond}}(q,\beta)$ .

Theorem 1.6 deals with the disassortative version of the block model, which corresponds to the Potts antiferromagnet. There is a contiguity conjecture in [26] for the assortative (viz. ferromagnetic) version as well, and Banks, Moore, Neeman, and Netrapalli [12] obtained upper and lower bounds in that case too, but the techniques of the present work do not apply to ferromagnetic models (see Section 2.4).

2. Main results

Factor graph models have emerged as a unifying framework for a multitude of concrete models arising in physics, combinatorics, and other disciplines [47, 63]. The main results of this paper, which we present in this section, therefore deal with a general class of random factor graph models, subject merely to a few easy-to-check assumptions. In Section 2.1 we define this general notion. Then we state the results for general random factor graph models in Section 2.2. Moreover, in Section 2.3 we indicate how the diluted $k$ -spin model, the Potts antiferromagnet and the stochastic block model fit this framework. Section 2.4 contains a discussion of related work.

2.1. Factor graphs

The following definition encompasses most important examples of spin systems on graphs [47].

Definition 2.1.

Let $\Omega$ be a finite set of spins, let $k\geq 2$ be an integer and let $\Psi$ be a set of functions $\psi:\Omega^{k}\to(0,2)$ that we call weight functions. A $\Psi$ -factor graph $G=(V,F,(\partial a)_{a\in F},(\psi_{a})_{a\in F})$ consists of

•

a finite set $V$ of variable nodes,

•

a finite set $F$ of constraint nodes,

•

an ordered $k$ -tuple $\partial a=(\partial_{1}a,\ldots,\partial_{k}a)\in V^{k}$ for each $a\in F$ ,

•

a family $(\psi_{a})_{a\in F}\in\Psi^{F}$ of weight functions.

The Gibbs distribution of $G$ is the probability distribution on $\Omega^{V}$ defined by $\mu_{G}(\sigma)=\psi_{G}(\sigma)/{Z(G)}$ for $\sigma\in\Omega^{V}$ , where

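$$\psi_{G}(\sigma)=\prod_{a\in F}\psi_{a}(\sigma(\partial_{1}a),\ldots,\sigma(\partial_{k}a))\quad\text{and}\quad Z(G)=\sum_{\tau\in\Omega^{V}}\psi_{G}(\tau).$$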

A $\Psi$ -factor graph $G$ induces a bipartite graph with vertex sets $V$ and $F$ where $a\in F$ is adjacent to $\partial_{1}a,\ldots,\partial_{k}a$ . We shall therefore use common graph-theoretic terminology and refer to, e.g., the vertices $\partial_{1}a,\ldots,\partial_{k}a$ as the neighbors of $a$ . Furthermore, the length of shortest paths in the bipartite graph induces a metric on the nodes of $G$ .
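For very small instances the Gibbs distribution of Definition 2.1 can be evaluated by exhaustive enumeration. The following is a minimal sketch, assuming that $\psi_{G}(\sigma)$ is the product of the constraint weights evaluated at the spins of the neighborhoods and that $Z(G)$ is the sum of $\psi_{G}$ over all assignments. The names `gibbs_distribution` and `toy_psi` are hypothetical and serve illustration only.

```python
from itertools import product
from math import prod


def gibbs_distribution(spins, n_vars, constraints):
    """Brute-force Gibbs distribution of a small factor graph.

    spins:       list of spin values (the finite set Omega)
    n_vars:      number of variable nodes, indexed 0..n_vars-1
    constraints: list of (neighborhood, psi) pairs; neighborhood is an ordered
                 k-tuple of variable indices, psi maps a k-tuple of spins to a
                 weight in (0, 2)
    Returns (mu, Z), where mu maps each assignment (a tuple) to its probability.
    """
    weights = {}
    for sigma in product(spins, repeat=n_vars):
        # psi_G(sigma): product of all constraint weights (assumed form)
        weights[sigma] = prod(
            psi(tuple(sigma[x] for x in nbrs)) for nbrs, psi in constraints
        )
    Z = sum(weights.values())  # partition function Z(G)
    mu = {sigma: w / Z for sigma, w in weights.items()}
    return mu, Z


def toy_psi(t):
    """Hypothetical pairwise weight: 0.5 on equal spins, 1.0 otherwise."""
    return 0.5 if t[0] == t[1] else 1.0


# Path x0 - a0 - x1 - a1 - x2 with two spins and k = 2.
mu, Z = gibbs_distribution([0, 1], 3, [((0, 1), toy_psi), ((1, 2), toy_psi)])
```

The enumeration costs $|\Omega|^{|V|}$ evaluations, so it is only meant as a sanity check on toy examples.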

Diluted mean-field models correspond to random factor graphs. To define them formally, we observe that any weight function $\psi:\Omega^{k}\to(0,2)$ can be viewed as a point in $|\Omega|^{k}$ -dimensional Euclidean space. We thus endow the set of all possible weight functions with the $\sigma$ -algebra induced by the Borel algebra. Further, for a weight function $\psi:\Omega^{k}\to(0,2)$ and a permutation $\theta:\{1,\ldots,k\}\to\{1,\ldots,k\}$ we define $\psi^{\theta}:\Omega^{k}\to(0,2)$ , $(\sigma_{1},\ldots,\sigma_{k})\mapsto\psi(\sigma_{\theta(1)},\ldots,\sigma_{\theta(k)})$ . Throughout the paper we assume that $\Psi$ is a measurable set of weight functions such that for all $\psi\in\Psi$ and all permutations $\theta$ we have $\psi^{\theta}\in\Psi$ . Moreover, we fix a probability distribution $P$ on $\Psi$ . We always denote by $\boldsymbol{\psi}$ an element of $\Psi$ chosen from $P$ , and we set

[TABLE]

Furthermore, we always assume that $P$ is such that the following three inequalities hold:

[TABLE]

The first two inequalities bound the ‘tails’ of $\boldsymbol{\psi}(\tau)$ for $\tau\in\Omega^{k}$ . The third one provides that $\boldsymbol{\psi}$ is non-constant.

With these conventions in mind suppose that $n,m>0$ are integers. Then we define a random $\Psi$ -factor graph $\boldsymbol{G}(n,m,P)$ as follows. The set of variable nodes is $V_{n}=\{x_{1},\ldots,x_{n}\}$ , the set of constraint nodes is $F_{m}=\{a_{1},\ldots,a_{m}\}$ and the neighborhoods $\partial a_{i}\in V_{n}^{k}$ are chosen uniformly and independently for $i=1,\ldots,m$ . Furthermore, the weight functions $\psi_{a_{i}}\in\Psi$ are chosen from the distribution $P$ mutually independently and independently of the neighborhoods $(\partial a_{i})_{i=1,\ldots,m}$ . Where $P$ is apparent we just write $\boldsymbol{G}(n,m)$ rather than $\boldsymbol{G}(n,m,P)$ .

Since we aim to study models on sparse random graphs such as the Potts model on the Erdős-Rényi graph we are concerned with the case that $m=O(n)$ as $n\to\infty$ . To express this elegantly and in order to be able to take the thermodynamic limit $n\to\infty$ easily, we fix a real $d>0$ that does not depend on $n$ , let $\boldsymbol{m}=\boldsymbol{m}_{d}(n)$ have distribution ${\rm Po}(dn/k)$ and write $\boldsymbol{G}=\boldsymbol{G}(n,\boldsymbol{m},P)$ for brevity. Then the expected degree of a variable node is equal to $d$ .
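The construction of $\boldsymbol{G}(n,\boldsymbol{m},P)$ can be simulated directly. The sketch below assumes that the distribution $P$ is supplied as a callable `sample_psi(rng)`, which is a hypothetical interface; the Poisson variable is generated from exponential inter-arrival times so that only the standard library is needed.

```python
import random


def sample_poisson(mean, rng):
    """Sample Po(mean) by counting rate-1 exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_factor_graph(n, d, k, sample_psi, rng):
    """Sample neighborhoods and weight functions of G(n, m, P), m ~ Po(dn/k)."""
    m = sample_poisson(d * n / k, rng)
    # Each neighborhood is a uniform ordered k-tuple of variable indices;
    # repetitions within or between tuples are allowed at this stage.
    neighborhoods = [tuple(rng.randrange(n) for _ in range(k)) for _ in range(m)]
    # Weight functions are drawn independently of the neighborhoods.
    weights = [sample_psi(rng) for _ in range(m)]
    return neighborhoods, weights


def average_degree(n, neighborhoods):
    """Average number of constraint slots per variable node."""
    return sum(len(nbrs) for nbrs in neighborhoods) / n
```

Since each of the $\boldsymbol{m}$ constraint nodes contributes $k$ slots, the average degree equals $k\boldsymbol{m}/n$, whose mean is $d$.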

While in $\boldsymbol{G}$ the neighborhoods $\partial a_{i}\in V_{n}^{k}$ are chosen uniformly, in order to accommodate certain applications such as the Potts model on the Erdős-Rényi graph we need to impose two conditions. First, that for any constraint node $a_{i}$ the $k$ neighboring variable nodes $\partial_{1}a_{i},\ldots,\partial_{k}a_{i}$ are distinct. Second, that $\{\partial_{1}a_{i},\ldots,\partial_{k}a_{i}\}\neq\{\partial_{1}a_{j},\ldots,\partial_{k}a_{j}\}$ for all $i\neq j$ . Let us denote the event that these two conditions hold by $\mathfrak{S}$ . Combinatorially $\mathfrak{S}$ is the event that the hypergraph whose vertices are the variable nodes and whose edges are the neighborhoods of the constraint nodes is simple and $k$ -uniform. We are going to state all results both for the unconstrained $\boldsymbol{G}$ and conditional on $\mathfrak{S}$ .
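The two conditions defining $\mathfrak{S}$ are straightforward to test on a given list of neighborhoods. The function name `is_simple` below is hypothetical; the sketch merely restates the two conditions in code.

```python
def is_simple(neighborhoods):
    """Return True iff the neighborhoods satisfy both conditions of the event S.

    neighborhoods: list of ordered k-tuples of variable indices.
    """
    seen = set()
    for nbrs in neighborhoods:
        # First condition: the k neighbors of a constraint node are distinct.
        if len(set(nbrs)) != len(nbrs):
            return False
        # Second condition: no two constraint nodes share the same neighbor set.
        key = frozenset(nbrs)
        if key in seen:
            return False
        seen.add(key)
    return True
```

One simple, if not the most efficient, way to sample conditional on $\mathfrak{S}$ would be to redraw the neighborhoods until `is_simple` returns `True`.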

Apart from the condition (2.1), which we assume tacitly, the main results require (some of) the following four assumptions. Crucially, they only refer to the distribution $P$ on the set $\Psi$ of weight functions.

**SYM:**

For all $i\in\{1,\ldots,k\}$ , $\omega\in\Omega$ and $\psi\in\Psi$ we have

[TABLE]

and for every permutation $\theta$ and every measurable $\mathcal{A}\subset\Psi$ we have $P(\mathcal{A})=P(\{\psi^{\theta}:\psi\in\mathcal{A}\})$ .

**BAL:**

The function

[TABLE]

is concave and attains its maximum at the uniform distribution on $\Omega$ .

**MIN:**

Let ${\mathcal{R}}(\Omega)$ be the set of all probability distributions $\rho=(\rho(s,t))_{s,t\in\Omega}$ on $\Omega\times\Omega$ such that $\sum_{s\in\Omega}\rho(s,t)=\sum_{s\in\Omega}\rho(t,s)=q^{-1}$ for all $t\in\Omega$ . The function

[TABLE]

has the uniform distribution on $\Omega\times\Omega$ as its unique global minimizer.

**POS:**

For all $\pi,\pi^{\prime}\in\mathcal{P}_{*}^{2}(\Omega)$ the following is true. With ${\boldsymbol{\rho}}_{1},{\boldsymbol{\rho}}_{2},\ldots$ chosen from $\pi$ , ${\boldsymbol{\rho}}_{1}^{\prime},{\boldsymbol{\rho}}_{2}^{\prime},\ldots$ chosen from $\pi^{\prime}$ and $\boldsymbol{\psi}\in\Psi$ chosen from $P$ , all mutually independent, we have

[TABLE]

Conditions very similar to SYM, BAL and POS appeared in [23] as well. SYM is a symmetry condition. In the language of the cavity method [47], the condition ensures that the unique Belief Propagation fixed point on any acyclic $\Psi$ -factor graph is such that all messages are identical to the uniform distribution on $\Omega$ (but we will not need this fact explicitly). (Footnote 1: The condition (2.2) emerged out of a discussion with Guilhem Semerjian.) Condition BAL is going to guarantee that for small enough values of $d$ the Gibbs measure $\mu_{\boldsymbol{G}}$ is typically concentrated on “balanced” $\sigma\in\Omega^{V_{n}}$ , i.e., $|\sigma^{-1}(\omega)|\sim n/q$ for all $\omega\in\Omega$ . Further, MIN is a technical condition that we need in order to study the overlap of two independent Gibbs samples. Finally, POS is required so that we can apply certain results from [23]. As we shall see in Section 2.3, the conditions are easily verified in the models from Section 1 and several others.

2.2. Results

We proceed to state the results on the condensation phase transition, the limiting distribution of the free energy, the overlap, the reconstruction and the detection thresholds for random factor graph models.

2.2.1. The condensation phase transition

The following theorem pins down the condensation phase transition in random factor graph models precisely in terms of a stochastic optimization problem that encodes the “1RSB cavity equations with Parisi parameter $1$ ” from the cavity method [47].

Theorem 2.2.

Assume that $P$ satisfies SYM, BAL and POS and let $d>0$ . With $\boldsymbol{\gamma}$ a ${\rm Po}(d)$ -random variable, ${\boldsymbol{\rho}}_{1}^{\pi},{\boldsymbol{\rho}}_{2}^{\pi},\ldots$ chosen from $\pi\in\mathcal{P}_{*}^{2}(\Omega)$ and $\boldsymbol{\psi}_{1},\boldsymbol{\psi}_{2},\ldots\in\Psi$ chosen from $P$ , all mutually independent, let

[TABLE]

Then $1/(k-1)\leq d_{\mathrm{cond}}<\infty$ and

[TABLE]

Theorem 2.2 generalizes [23, Theorem 2.7], which requires that the set $\Psi$ of weight functions be finite.

Admittedly the formula for $d_{\mathrm{cond}}$ provided by Theorem 2.2 is neither very simple nor very explicit, but we are not aware of any reason why it ought to be. Yet there is a natural generalization of the Kesten-Stigum bound for the Potts model from (1.7) that provides an easy-to-compute upper bound on $d_{\mathrm{cond}}$ in terms of the spectrum of a certain linear operator. The operator is constructed as follows. For $\psi\in\Psi$ let $\Phi_{\psi}\in\mathbb{R}^{\Omega\times\Omega}$ be the matrix with entries

[TABLE]

and let $\Xi=\Xi_{P}$ be the linear operator on the $q^{2}$ -dimensional space $\mathbb{R}^{\Omega}\otimes\mathbb{R}^{\Omega}$ defined by

[TABLE]

Further, with $\boldsymbol{1}$ denoting the vector with all entries equal to one, let

[TABLE]

Finally, we introduce

[TABLE]

with the convention that $d_{\mathrm{KS}}=\infty$ if $\max_{x\in{\mathcal{E}}:\|x\|=1}\left\langle{{\Xi x},{x}}\right\rangle=0$ .

Theorem 2.3.

If $P$ satisfies SYM and BAL, then $d_{\mathrm{cond}}\leq d_{\mathrm{KS}}$ .

We shall see in Section 3 that $\Xi$ is related to the “broadcasting matrix” of a suitable Galton-Watson tree, which justifies referring to $d_{\mathrm{KS}}$ as a generalized version of the classical Kesten-Stigum bound from [41]. While the Kesten-Stigum bound is not generally tight, it plays a major conceptual role, as will emerge in due course.

2.2.2. The free energy

Theorem 2.2 easily implies that $n^{-1}\ln Z(\boldsymbol{G})$ converges to $\ln q+\frac{d}{k}\ln\xi$ in probability if $d<d_{\mathrm{cond}}$ . Yet due to the scaling factor of $1/n$ this is but a rough first order approximation. The next theorem, arguably the principal achievement of this paper, yields the exact limiting distribution of the unscaled free energy $\ln Z(\boldsymbol{G})$ in the entire replica symmetric phase. Recalling (2.5), we introduce the $\Omega\times\Omega$ -matrix

[TABLE]

Also recall that $\boldsymbol{m}\stackrel{\mathrm{d}}{=}{\rm Po}(dn/k)$ denotes the number of constraint nodes of $\boldsymbol{G}$ and let $\mathrm{Eig}(\Phi)$ be the spectrum of $\Phi$ .

Theorem 2.4.

Assume that $P$ satisfies SYM, BAL, POS and MIN and that $0<d<d_{\mathrm{cond}}$ . Let $(K_{l})_{l\geq 1}$ be a family of Poisson variables with means $\mathbb{E}[K_{l}]=\frac{1}{2l}(d(k-1))^{l}$ and let $(\boldsymbol{\psi}_{l,i,j})_{l,i,j\geq 1}$ be a sequence of samples from $P$ , all mutually independent. Then the random variable

[TABLE]

satisfies $\mathbb{E}|\mathcal{K}|<\infty$ and

[TABLE]

in distribution. Further, given $\mathfrak{S}$ the random variable on the left hand side of (2.11) converges in distribution to

[TABLE]

which also satisfies $\mathbb{E}|\mathcal{K}^{\prime}|<\infty$ .

Since key parameters of the random factor graph such as the size of the largest connected component of $\boldsymbol{G}$ exhibit fluctuations of order $\sqrt{n}$ even once we condition on $\boldsymbol{m}$ , one might a priori expect that the same is true of the free energy $\ln Z(\boldsymbol{G})$ . However, (2.11) shows that given $\boldsymbol{m}$ the free energy has bounded fluctuations.

2.2.3. The overlap

For $\sigma,\tau\in\Omega^{V_{n}}$ we define the overlap $\rho_{\sigma,\tau}=(\rho_{\sigma,\tau}(\omega,\omega^{\prime}))_{\omega,\omega^{\prime}\in\Omega}\in\mathcal{P}(\Omega\times\Omega)$ by letting

[TABLE]
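As an illustration, the overlap of two assignments can be computed as an empirical joint distribution. The sketch assumes the customary definition $\rho_{\sigma,\tau}(\omega,\omega^{\prime})=n^{-1}|\{x\in V_{n}:\sigma(x)=\omega,\tau(x)=\omega^{\prime}\}|$ ; the helper `distance_from_uniform` is a hypothetical name for the Euclidean distance to the uniform distribution on $\Omega\times\Omega$.

```python
from collections import Counter


def overlap(sigma, tau, spins):
    """Empirical joint distribution of (sigma(x), tau(x)) over variable nodes x."""
    n = len(sigma)
    counts = Counter(zip(sigma, tau))
    return {(s, t): counts[(s, t)] / n for s in spins for t in spins}


def distance_from_uniform(rho, spins):
    """Euclidean distance between rho and the uniform distribution on pairs."""
    q = len(spins)
    return sum((p - 1.0 / q**2) ** 2 for p in rho.values()) ** 0.5


# Two assignments on four variable nodes whose overlap is exactly uniform.
rho = overlap([0, 0, 1, 1], [0, 1, 0, 1], spins=[0, 1])
```

By contrast, the overlap of an assignment with itself is concentrated on the diagonal and therefore far from uniform.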

Let $\bar{\rho}$ be the uniform distribution on $\Omega\times\Omega$ . The following theorem confirms one of the core tenets of the cavity method, namely the absence of extensive long-range correlations for $d<d_{\mathrm{cond}}$ . We write $\boldsymbol{\sigma},\boldsymbol{\tau}$ for two independent samples chosen from the Gibbs measure $\mu_{\boldsymbol{G}}$ , $\left\langle{\,\cdot\,}\right\rangle_{\boldsymbol{G}}$ for the expectation with respect to $\mu_{\boldsymbol{G}}$ and $\mathbb{E}\left[{\,\cdot\,}\right]$ for the expectation with respect to the choice of $\boldsymbol{G}$ .

Theorem 2.5.

If $P$ satisfies SYM, BAL, POS and MIN, then

[TABLE]

If we let $\mu_{\boldsymbol{G},y}(\,\cdot\,)=\left\langle{\boldsymbol{1}\{\boldsymbol{\sigma}(y)=\,\cdot\,\}}\right\rangle_{\boldsymbol{G}}$ be the Gibbs marginal of $y\in V_{n}$ and $\mu_{\boldsymbol{G},y_{1},y_{2}}(\,\cdot\,,\,\cdot\,)=\left\langle{\boldsymbol{1}\{\boldsymbol{\sigma}(y_{1})=\,\cdot\,,\boldsymbol{\sigma}(y_{2})=\,\cdot\,\}}\right\rangle_{\boldsymbol{G}}$ the joint distribution of the spins at $y_{1},y_{2}\in V_{n}$ , then Theorem 2.5 implies together with standard arguments that

[TABLE]

In other words, for $d<d_{\mathrm{cond}}$ with probability tending to $1$ as $n\to\infty$ , the spins assigned to two randomly chosen variable nodes $y_{1},y_{2}$ are asymptotically independent.

Conversely, Theorem 2.5 shows that for any $\varepsilon>0$ there exists $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+\varepsilon$ such that

[TABLE]

Hence, if we know that the Gibbs marginals $\mu_{\boldsymbol{G},y}$ are uniform (e.g., due to the symmetry among colors in the Potts model or the inversion symmetry in the $k$ -spin model for even $k$ ), then (2.12) becomes

[TABLE]

Since two randomly chosen variable nodes $y_{1},y_{2}$ of $\boldsymbol{G}$ have distance $\Omega(\ln n)$ with probability $1-o(1)$ , (2.13) states that long range correlations persist for $d$ beyond but arbitrarily close to $d_{\mathrm{cond}}$ .

2.2.4. The teacher-student model

Finally, there is a natural statistical inference version of the random factor graph model, the teacher-student model [67], a generalization of the stochastic block model from Section 1.4. Suppose that $\sigma:V_{n}\to\Omega$ is an assignment of spins to variable nodes. Then we introduce a random factor graph $\boldsymbol{G}^{*}(n,m,P,\sigma)$ with variable nodes $V_{n}$ and constraint nodes $F_{m}$ such that, independently for each $j\in[m]$ , the neighborhood $\partial a_{j}$ and the weight function $\psi_{a_{j}}$ are chosen from the following joint distribution: for any $y_{1},\ldots,y_{k}\in V_{n}$ and for any measurable $\mathcal{A}\subset\Psi$ ,

[TABLE]

Thus, the probability of the outcome $\partial a_{j}=(y_{1},\ldots,y_{k})$ , $\psi_{a_{j}}=\psi$ is proportional to the ‘prior’ probability $P(\psi)$ of selecting $\psi$ times the ‘posterior’ weight $\psi(\sigma(y_{1}),\ldots,\sigma(y_{k}))$ .

Further, given $d>0$ consider the following experiment where the initial assignment is chosen randomly as well.

**TCH1:**

an assignment $\boldsymbol{\sigma}^{*}:V_{n}\to\Omega$ , the ground truth, is chosen uniformly at random.

**TCH2:**

independently of $\boldsymbol{\sigma}^{*}$ , draw $\boldsymbol{m}=\boldsymbol{m}_{d}(n)$ from the Poisson distribution with mean $dn/k$ .

**TCH3:**

generate $\boldsymbol{G}^{*}=\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\boldsymbol{\sigma}^{*})$ .
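The experiment TCH1 to TCH3 can be simulated by rejection sampling. The sketch below assumes that each planted constraint is drawn with probability proportional to the prior probability of $\psi$ times the weight $\psi(\sigma(y_{1}),\ldots,\sigma(y_{k}))$ ; because all weights lie in $(0,2)$ , a uniform proposal can be accepted with probability $\psi(\cdot)/2$ . The interface `sample_psi(rng)` and the Potts-type weight `potts_weight` (value $\mathrm{e}^{-\beta}$ on equal spins, $1$ otherwise) are assumptions made for illustration.

```python
import math
import random


def sample_poisson(mean, rng):
    """Sample Po(mean) by counting rate-1 exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def potts_weight(beta):
    """Assumed pairwise weight: exp(-beta) on equal spins, 1 otherwise."""
    def psi(pair):
        return math.exp(-beta) if pair[0] == pair[1] else 1.0
    return psi


def sample_planted_constraint(sigma, k, sample_psi, rng):
    """Draw (neighborhood, psi) with probability proportional to prior times weight."""
    n = len(sigma)
    while True:
        nbrs = tuple(rng.randrange(n) for _ in range(k))  # uniform proposal
        psi = sample_psi(rng)                             # prior draw from P
        # Weights lie in (0, 2), so weight / 2 is a valid acceptance probability.
        if rng.random() < psi(tuple(sigma[y] for y in nbrs)) / 2.0:
            return nbrs, psi


def teacher_student(n, d, k, spins, sample_psi, rng):
    """Run TCH1 to TCH3 and return the ground truth with the planted constraints."""
    sigma_star = [rng.choice(spins) for _ in range(n)]  # TCH1: uniform ground truth
    m = sample_poisson(d * n / k, rng)                  # TCH2: m ~ Po(dn/k)
    constraints = [                                     # TCH3: planted constraints
        sample_planted_constraint(sigma_star, k, sample_psi, rng) for _ in range(m)
    ]
    return sigma_star, constraints
```

With the Potts-type weight and large $\beta$ , almost all accepted constraints join variable nodes with distinct spins under the ground truth, which is the imprint that the student may try to exploit.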

The intuition behind this model is that a “teacher”, in possession of the ground truth $\boldsymbol{\sigma}^{*}$ , finds herself unable to communicate $\boldsymbol{\sigma}^{*}$ to a student directly. Instead the teacher utilizes $\boldsymbol{\sigma}^{*}$ to set up a random factor graph $\boldsymbol{G}^{*}$ that the student gets to observe. Given $\boldsymbol{G}^{*}$ the student aims to recover $\boldsymbol{\sigma}^{*}$ as best as possible. As in the case of the stochastic block model, two natural questions arise: given $\boldsymbol{G}^{*}$ , is it information-theoretically possible to accomplish a better approximation to $\boldsymbol{\sigma}^{*}$ than a mere independent random guess? More modestly, there is the detection problem: given a factor graph $G$ is it possible to discern with probability $1-o(1)$ as $n\to\infty$ whether $G$ was chosen from the model $\boldsymbol{G}^{*}$ or from the “null model” $\boldsymbol{G}$ ? As the imprint that the ground truth imbues on $\boldsymbol{G}^{*}$ increases with $d$ , we should expect the existence of a threshold from where either problem turns solvable. Regarding the detection problem, we recall that the random graph models $\boldsymbol{G},\boldsymbol{G}^{*}$ are mutually contiguous if for any sequence $(\mathcal{A}_{n})_{n}$ of events we have $\lim_{n\to\infty}\mathbb{P}\left[{\boldsymbol{G}\in\mathcal{A}_{n}}\right]=0$ iff $\lim_{n\to\infty}\mathbb{P}\left[{\boldsymbol{G}^{*}\in\mathcal{A}_{n}}\right]=0$ . The following theorem establishes a generalization of the conjectures put forward in [26] for the stochastic block model to the case of random factor graph models.

Theorem 2.6.

If $P$ satisfies SYM, BAL, POS and MIN, then $\boldsymbol{G},\boldsymbol{G}^{*}$ are mutually contiguous for all $d<d_{\mathrm{cond}}$ , while $\boldsymbol{G},\boldsymbol{G}^{*}$ fail to be mutually contiguous for $d>d_{\mathrm{cond}}$ . The same holds given $\boldsymbol{G},\boldsymbol{G}^{*}\in\mathfrak{S}$ .

Previously it was known that for $d<d_{\mathrm{cond}}$ it is impossible to recover an assignment that has a strictly greater overlap with $\boldsymbol{\sigma}^{*}$ than a random guess [23, Theorem 2.6]. Theorem 2.6 shows that, in fact, $d_{\mathrm{cond}}$ marks the threshold for the feasibility of the humble detection problem.

While Theorem 2.6 is bad news from a statistical inference point of view, the upshot is that throughout the replica symmetric phase typical properties of Gibbs samples of $\boldsymbol{G}$ can be investigated accurately by way of the teacher-student model $(\boldsymbol{G}^{*},\boldsymbol{\sigma}^{*})$ , a technique known as “quiet planting” [4, 42]. This idea has been used critically in rigorous work on specific examples of random factor graph models, e.g., [54]. Formally, quiet planting applies if the factor graph/assignment pair $(\boldsymbol{G}^{*},\boldsymbol{\sigma}^{*})$ comprising the ground truth $\boldsymbol{\sigma}^{*}$ and the outcome $\boldsymbol{G}^{*}$ of TCH1–TCH3 and the pair $(\boldsymbol{G},\boldsymbol{\sigma})$ consisting of the random factor graph $\boldsymbol{G}$ and a Gibbs sample $\boldsymbol{\sigma}$ of $\boldsymbol{G}$ are mutually contiguous. Previously this was known to be true for a few specific models (e.g., [16, 22]), albeit not generally in the entire replica symmetric phase. The following corollary to Theorem 2.6 shows that “quiet planting” is a universal phenomenon.

Corollary 2.7.

Assume that $P$ satisfies SYM, BAL, POS and MIN. For all $d<d_{\mathrm{cond}}$ the pairs $(\boldsymbol{G},\boldsymbol{\sigma})$ and $(\boldsymbol{G}^{*},\boldsymbol{\sigma}^{*})$ are mutually contiguous. The same is true given $\boldsymbol{G},\boldsymbol{G}^{*}\in\mathfrak{S}$ .

2.2.5. Reconstruction

According to the physics deliberations the condensation phase transition is generally preceded by another threshold where certain point-to-set correlations emerge, the reconstruction threshold [43]. Reconstruction plays a major role in the cavity formalism because it provides the conceptual underpinning for the notion that the Gibbs measure decomposes into a multitude of “clusters” [47, 51]. Formally, suppose that $G$ is a factor graph with variable nodes $V$ , $y\in V$ and that $\ell\geq 0$ . Let $\nabla_{\ell}(G,y)$ be the $\sigma$ -algebra on $\Omega^{V}$ generated by the random variables $\boldsymbol{\sigma}(z)$ such that $z$ is a variable node whose distance from $y$ in $G$ is at least $2\ell$ . Further, define

[TABLE]

Of course, the expectation $\mathbb{E}\left[{\,\cdot\,}\right]$ refers to the choice of $\boldsymbol{G}$ , the outer expectation $\left\langle{\,\cdot\,}\right\rangle_{\boldsymbol{G}}$ averages over the “boundary condition”, i.e., the spins of the variable nodes at distance at least $2\ell$ from $y$ , and the inner $\langle\,\cdot\,|\nabla_{\ell}(\boldsymbol{G},y)\rangle_{\boldsymbol{G}}$ is the conditional expectation given the boundary condition. If $\mathrm{corr}(d)=0$ , then the influence of a “typical” boundary condition on the spin of $y$ decays with the radius $\ell$ . Thus, the reconstruction threshold $d_{\mathrm{rec}}=\inf\{d>0:\mathrm{corr}(d)>0\}$ is the smallest degree where the influence of the boundary persists.

A priori determining $d_{\mathrm{rec}}$ appears to be challenging because the joint distribution of the spins at distance $2\ell$ from $y$ is determined not merely by the “local” effects within the radius- $2\ell$ neighborhood of $y$ but also by the graph beyond. But according to physics predictions (e.g., [43]), actually $d_{\mathrm{rec}}$ is equal to the corresponding threshold on a suitable Galton-Watson tree. Conceptually this amounts to an enormous simplification because the branches of the tree are mutually dependent only through their being connected to the root, a situation amenable to precise treatment via the Belief Propagation message passing scheme [47].

Formally, we introduce a multi-type Galton-Watson tree $\boldsymbol{T}(d,P)$ that mimics the local geometry of $\boldsymbol{G}$ . The types are either variable nodes or constraint nodes, each of the latter endowed with a weight function $\psi\in\Psi$ . The root of the Galton-Watson tree is a variable node $r$ . The offspring of a variable node is a ${\rm Po}(d)$ number of constraint nodes whose weight functions are chosen from $P$ independently. Moreover, the offspring of a constraint node is $k-1$ variable nodes. For an integer $\ell\geq 0$ we let $\boldsymbol{T}^{\ell}(d,P)$ denote the (finite) tree obtained from $\boldsymbol{T}(d,P)$ by deleting all variable or constraint nodes at distance greater than $2\ell$ from $r$ . In analogy to (2.15) we set

[TABLE]

The tree reconstruction threshold is defined as $d_{\mathrm{rec}}^{\star}=\inf\{d>0:\mathrm{corr}^{\star}(d)>0\}$ .
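The truncated tree $\boldsymbol{T}^{\ell}(d,P)$ is easy to generate recursively from the description above. In the following sketch the nested-dictionary representation and the interface `sample_psi(rng)` for drawing from $P$ are hypothetical choices made for illustration.

```python
import random


def sample_poisson(mean, rng):
    """Sample Po(mean) by counting rate-1 exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_gw_tree(d, k, ell, sample_psi, rng):
    """Sample the Galton-Watson tree truncated at distance 2*ell from the root.

    A variable node is a dict {"constraints": [...]}; each constraint node is
    a dict {"psi": weight function, "children": list of k-1 variable nodes}.
    """
    def variable(level):
        node = {"constraints": []}
        if level == ell:
            # This node is at distance 2*ell from the root; deeper nodes are deleted.
            return node
        for _ in range(sample_poisson(d, rng)):  # Po(d) constraint offspring
            node["constraints"].append({
                "psi": sample_psi(rng),          # weight drawn independently from P
                "children": [variable(level + 1) for _ in range(k - 1)],
            })
        return node

    return variable(0)


def tree_depth(node):
    """Number of variable levels below the given variable node."""
    below = [tree_depth(c) for a in node["constraints"] for c in a["children"]]
    return 1 + max(below) if below else 0
```

A variable node at level $j$ of the recursion sits at distance $2j$ from the root, so the recursion stops at level $\ell$ .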

Theorem 2.8.

Suppose that $P$ satisfies SYM, BAL, POS and MIN. Then $0<d_{\mathrm{rec}}=d_{\mathrm{rec}}^{\star}\leq d_{\mathrm{cond}}$ and $\mathrm{corr}(d)>0$ for all $d\in(d_{\mathrm{rec}},d_{\mathrm{cond}})$ . Moreover,

[TABLE]

We prove Theorem 2.8 by way of the teacher-student model and the “quiet planting” result Corollary 2.7. This argument provides a perspective on the reconstruction problem that has an impact on the statistical inference questions as well. Specifically, we observe that the reconstruction problem on the random tree $\boldsymbol{T}(d,P)$ is equivalent to a natural “Bayesian” reconstruction problem in the teacher-student model. Formally, let $\nabla^{*}_{\ell}(\boldsymbol{G}^{*},\boldsymbol{\sigma}^{*},y)$ be the $\sigma$ -algebra generated by the graph $\boldsymbol{G}^{*}$ and the random variables $\boldsymbol{\sigma}^{*}(z)$ with $z$ at distance at least $2\ell$ from $y$ . Then

[TABLE]

measures the correlation between $\boldsymbol{\sigma}^{*}(y)$ , the spin at $y$ under the ground truth, and the spins that $\boldsymbol{\sigma}^{*}$ assigns to the variables at distance at least $2\ell$ . The proof of Theorem 2.8 is based on showing that $\mathrm{corr}^{*}(d)=\mathrm{corr}^{\star}(d)$ for all $d$ .

Theorem 2.9.

If $P$ satisfies SYM, BAL, POS and MIN, then for all $d>0$ we have

[TABLE]

Finally, we highlight an immediate but interesting consequence of Theorems 2.3 and 2.8 that generalizes the classical Kesten-Stigum upper bound for reconstruction on trees [41].

Corollary 2.10.

If $P$ satisfies SYM, BAL, POS and MIN, then $\mathrm{corr}^{\star}(d)>0$ for all $d>d_{\mathrm{KS}}$ .

The reconstruction problem on a certain class of random factor graph models (that includes, e.g., the Potts antiferromagnet) was previously studied by Gerschenfeld and Montanari [34]. They observed that overlap concentration about $\bar{\rho}$ as provided by Theorem 2.5 for $d<d_{\mathrm{cond}}$ guarantees that the reconstruction thresholds $d_{\mathrm{rec}}$ and $d_{\mathrm{rec}}^{\star}$ coincide. Subsequently, with the condensation threshold well out of reach at the time, Montanari, Restrepo and Tetali [55] attempted to verify the required overlap concentration at least for all $d$ up to the tree reconstruction threshold. However, their combinatorial (essentially second moment) argument did not cover the entire range of parameters, e.g., all $q$ and/or all $\beta$ in the Potts model. By comparison to [34, 55], Theorem 2.9 provides a different, perhaps more conceptual angle: tree reconstruction is equivalent to reconstruction in the teacher-student model for all $d$ , and up to $d_{\mathrm{cond}}$ the equivalence extends to the random factor graph model $\boldsymbol{G}$ thanks to contiguity.

2.3. Examples

Here we show how the models from Section 1 can be cast as random factor graph models that satisfy the assumptions SYM, BAL, POS and MIN.

2.3.1. The Potts antiferromagnet

For an integer $q\geq 2$ and a real $\beta>0$ we let $\Omega=\{1,\ldots,q\}$ and

[TABLE]

Let $\Psi$ be the singleton $\{\psi_{q,\beta}\}$ . Then the Potts model on a given graph $G=(V,E)$ can be cast as a $\Psi$ -factor graph: we just set up the factor graph $G^{\prime}=(V,E,(\partial e)_{e\in E},(\psi_{e})_{e\in E})$ whose variable nodes are the vertices of the original graph $G$ and whose constraint nodes are the edges of $G$ . For an edge $e=\{x,y\}\in E$ we let $\partial e=(x,y)$ , where, say, the order of the neighbors is chosen randomly, and $\psi_{e}=\psi_{q,\beta}$ , of course. Then $\mu_{G^{\prime}}$ coincides with $\mu_{G,q,\beta}$ from (1.3).
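The translation of a graph into a factor graph can be made concrete as follows. This is a sketch under the assumption that $\psi_{q,\beta}$ takes the value $\mathrm{e}^{-\beta}$ on equal spins and $1$ otherwise, so that the total weight of an assignment is $\exp(-\beta\cdot\#\{\text{monochromatic edges}\})$ ; all function names are hypothetical.

```python
import random
from math import exp, isclose


def potts_weight(beta):
    """Assumed form of psi_{q,beta}: exp(-beta) on equal spins, 1 otherwise."""
    def psi(pair):
        return exp(-beta) if pair[0] == pair[1] else 1.0
    return psi


def potts_factor_graph(edges, beta, rng):
    """Turn each edge {x, y} into a constraint node with a randomly ordered neighborhood."""
    psi = potts_weight(beta)
    constraints = []
    for x, y in edges:
        nbrs = (x, y) if rng.random() < 0.5 else (y, x)
        constraints.append((nbrs, psi))
    return constraints


def total_weight(sigma, constraints):
    """Product of the constraint weights under the assignment sigma."""
    w = 1.0
    for (x, y), psi in constraints:
        w *= psi((sigma[x], sigma[y]))
    return w


def monochromatic_edges(sigma, edges):
    """Number of edges whose endpoints carry the same spin."""
    return sum(1 for x, y in edges if sigma[x] == sigma[y])


triangle = [(0, 1), (1, 2), (0, 2)]
constraints = potts_factor_graph(triangle, beta=0.8, rng=random.Random(0))
sigma = (0, 0, 1)
assert isclose(total_weight(sigma, constraints),
               exp(-0.8 * monochromatic_edges(sigma, triangle)))
```

Because the weight function is symmetric in its two arguments, the random ordering of the neighbors does not affect the total weight.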

To mimic the Potts model on the Erdős-Rényi graph $\mathbb{G}=\mathbb{G}(n,d/n)$ we let $P_{\mathrm{Potts}}=\delta_{\psi_{q,\beta}}$ be the atom on $\psi_{q,\beta}$ . Then the sole difference between the factor graph representation $\mathbb{G}^{\prime}$ of the Erdős-Rényi graph $\mathbb{G}$ and $\boldsymbol{G}=\boldsymbol{G}(n,\boldsymbol{m},P)$ is that the latter may have factor nodes $a$ such that $\partial_{1}a=\partial_{2}a$ (“self-loops”) or pairs of distinct factor nodes $a,b$ such that $\{\partial_{1}a,\partial_{2}a\}=\{\partial_{1}b,\partial_{2}b\}$ (“double-edges”). However, conditioning on the event $\mathfrak{S}$ rules out self-loops and double-edges. Indeed, we have the following.

Fact 2.11 ([23, Lemma 4.1]).

The random factor graphs $\mathbb{G}^{\prime}$ and $\boldsymbol{G}$ given $\mathfrak{S}$ are mutually contiguous.

Lemma 2.12.

The assumptions SYM, BAL, POS and MIN hold for $P_{\mathrm{Potts}}$ for all $q\geq 2$ and all $\beta>0$ .

Proof.

That SYM, BAL and POS hold is known already [23, Lemma 4.3]. With respect to MIN, we observe that for any distribution $\rho$ on $\Omega\times\Omega$ with uniform marginals,

[TABLE]

The last expression is strictly convex as a function of $\rho$ with the minimum attained at the uniform distribution. ∎

Thus the results stated in Section 1.3 follow from the results for general random factor graph models. Indeed, to obtain Theorem 1.3 we observe that the matrices from (2.5), (2.6) and (2.9) satisfy

[TABLE]

where $\boldsymbol{1}$ is the all-ones matrix and $\mathrm{id}$ is the identity matrix. Clearly, the eigenvalues of $\Phi$ are $1$ and $(\mathrm{e}^{-\beta}-1)/(q-1+\mathrm{e}^{-\beta})$ , the latter with multiplicity $q-1$ . Hence,

[TABLE]

Thus, Theorem 1.3 follows from Theorem 2.4 and Theorem 1.4 from Theorem 2.5. Finally, (2.19) shows that $\max_{x\in{\mathcal{E}}:\|x\|=1}\left\langle{{\Xi x},{x}}\right\rangle=(1-\mathrm{e}^{-\beta})^{2}/(q-1+\mathrm{e}^{-\beta})^{2}$ and thus (2.8) matches the “classical” Kesten-Stigum bound (1.7).
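The spectral claims for the Potts model can be double-checked numerically. The sketch assumes that $\Phi=(\boldsymbol{1}-(1-\mathrm{e}^{-\beta})\mathrm{id})/(q-1+\mathrm{e}^{-\beta})$ , which is consistent with the eigenvalues stated above, and that for $k=2$ the Kesten-Stigum bound is the reciprocal of the squared second eigenvalue. The function names are hypothetical.

```python
from math import exp, isclose


def potts_phi(q, beta):
    """Assumed Potts matrix (all-ones minus c * identity) / (q - c), c = 1 - exp(-beta)."""
    c = 1.0 - exp(-beta)
    denom = q - c  # equals q - 1 + exp(-beta)
    return [[(1.0 - (c if s == t else 0.0)) / denom for t in range(q)]
            for s in range(q)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def potts_second_eigenvalue(q, beta):
    """Eigenvalue of multiplicity q-1 as stated in the text."""
    return (exp(-beta) - 1.0) / (q - 1.0 + exp(-beta))


def potts_ks_bound(q, beta):
    """Assumed form of the Kesten-Stigum bound for k = 2: 1 / lambda^2."""
    return 1.0 / potts_second_eigenvalue(q, beta) ** 2


q, beta = 3, 1.0
phi = potts_phi(q, beta)
lam = potts_second_eigenvalue(q, beta)
ones = [1.0] * q
diff = [1.0, -1.0] + [0.0] * (q - 2)  # a vector orthogonal to the all-ones vector
assert all(isclose(x, 1.0) for x in matvec(phi, ones))
assert all(isclose(x, lam * y, abs_tol=1e-12) for x, y in zip(matvec(phi, diff), diff))
```

Under these assumptions the bound reads $((q-1+\mathrm{e}^{-\beta})/(1-\mathrm{e}^{-\beta}))^{2}$ , which tends to $(q-1)^{2}$ as $\beta\to\infty$ .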

2.3.2. The stochastic block model

The teacher-student model $\boldsymbol{G}^{*}$ corresponding to $P_{\mathrm{Potts}}$ is very similar to the stochastic block model. As in the case of the Potts model on the Erdős-Rényi graph, the only discrepancy is due to the possible occurrence of self-loops and double-edges.

Lemma 2.13 ([23, Lemma 4.4]).

For any $q\geq 2$ , $\beta>0$ , $d>0$ the stochastic block model $\mathbb{G}^{*}$ and the teacher-student model $\boldsymbol{G}^{*}$ given $\mathfrak{S}$ are mutually contiguous.

Theorem 1.6 follows from Theorem 2.6 and Lemma 2.13.

2.3.3. The $k$ -spin model

Let $\Omega=\{\pm 1\}$ . For $J\in\mathbb{R},\beta>0$ we could define the weight function $\tilde{\psi}_{J,\beta}(\sigma_{1},\ldots,\sigma_{k})=\exp(\beta J\sigma_{1}\cdots\sigma_{k})$ to match the definition (1.1) of the $k$ -spin model. However, these functions do not necessarily take values in $(0,2)$ . To remedy this problem we introduce $\psi_{J,\beta}(\sigma_{1},\ldots,\sigma_{k})=1+\tanh(J\beta)\sigma_{1}\cdots\sigma_{k}$ . Then (cf. [60])

[TABLE]
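The rescaling rests on the elementary identity $\exp(\beta Js)=\cosh(\beta J)(1+\tanh(\beta J)s)$ for $s\in\{\pm 1\}$ , applied with $s=\sigma_{1}\cdots\sigma_{k}$ . The following sketch checks the identity and the range of $\psi_{J,\beta}$ numerically; the function names are hypothetical.

```python
from itertools import product
from math import cosh, exp, isclose, prod, tanh


def psi_tilde(J, beta, sigma):
    """Unnormalized k-spin weight exp(beta * J * sigma_1 * ... * sigma_k)."""
    return exp(beta * J * prod(sigma))


def psi(J, beta, sigma):
    """Rescaled weight 1 + tanh(J * beta) * sigma_1 * ... * sigma_k."""
    return 1.0 + tanh(J * beta) * prod(sigma)


def check_rescaling(J, beta, k):
    """Verify psi_tilde = cosh(J * beta) * psi and 0 < psi < 2 on all of {-1, +1}^k."""
    for sigma in product((-1, 1), repeat=k):
        if not isclose(psi_tilde(J, beta, sigma), cosh(J * beta) * psi(J, beta, sigma)):
            return False
        if not 0.0 < psi(J, beta, sigma) < 2.0:
            return False
    return True
```

Since the factor $\cosh(\beta J)$ does not depend on the spins, it cancels when the weights are normalized, so both families of weight functions induce the same Gibbs measure on a given factor graph.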

Thus, let $\Psi=\{\psi_{J,\beta}:J\in\mathbb{R}\}$ , let $\boldsymbol{\psi}=\psi_{\boldsymbol{J},\beta}$ , where $\boldsymbol{J}$ is a standard Gaussian and let $P_{\boldsymbol{J},\beta}$ be the law of $\boldsymbol{\psi}$ . As in the case of the Potts model we have the following.

Fact 2.14.

For all $k\geq 2,d>0,\beta>0$ the random measure $\mu_{\mathbb{H},\boldsymbol{J},\beta}$ from (1.1) and the Gibbs measure $\mu_{\boldsymbol{G}(n,\boldsymbol{m},P_{\boldsymbol{J},\beta})}$ of the random factor graph given $\mathfrak{S}$ are mutually contiguous. Furthermore,

[TABLE]

Instead of just verifying the conditions SYM, BAL, POS and MIN for the $k$ -spin model with standard Gaussian couplings $\boldsymbol{J}$ , we will establish the following more general statement. Recall that a random variable $\boldsymbol{J}$ is symmetric if $\boldsymbol{J}$ and $-\boldsymbol{J}$ have the same distribution.

Lemma 2.15.

For any $k\geq 2$ , $\beta>0$ and for any symmetric random variable $\boldsymbol{J}$ such that $P_{\boldsymbol{J},\beta}$ satisfies (2.1) the three conditions SYM, BAL and POS hold. If $k$ is even, then MIN holds as well.

Proof.

It is immediate that $\xi=1$ and that $P_{\boldsymbol{J},\beta}$ satisfies SYM. For BAL observe that $\mu\mapsto\sum_{\tau\in\Omega^{k}}\mathbb{E}[\boldsymbol{\psi}(\tau)]\prod_{i=1}^{k}\mu(\tau_{i})$ is constant because $\boldsymbol{J}$ is symmetric. To verify POS we generalize the argument from [23, Section 4.4] by observing that for any integer $l\geq 1$ , with the notation from POS,

[TABLE]

Hence, expanding $\Lambda(\,\cdot\,)$ and using (2.1) and Fubini’s theorem to swap the sum and the expectation, we find

[TABLE]

Applying similar manipulations to the other two terms from POS and introducing $X_{l}=\mathbb{E}[({\boldsymbol{\rho}}_{1}(1)-{\boldsymbol{\rho}}_{1}(-1))^{l}]$ , $Y_{l}=\mathbb{E}[({\boldsymbol{\rho}}_{1}^{\prime}(1)-{\boldsymbol{\rho}}_{1}^{\prime}(-1))^{l}]$ , we see that POS comes down to showing that

[TABLE]

Since $\boldsymbol{J}$ is symmetric we get $\mathbb{E}[\tanh(\boldsymbol{J}\beta)^{l}]=0$ for odd $l$ , while $\mathbb{E}[\tanh(\boldsymbol{J}\beta)^{l}]\geq 0$ and $X_{l},Y_{l}\geq 0$ for even $l$ . Hence, (2.21) follows from the elementary fact that $x^{k}-kxy^{k-1}+(k-1)y^{k}\geq 0$ for all $x,y\geq 0$ .

Moving on to MIN, we assume that $k$ is even. Suppose that $\rho\in{\mathcal{R}}(\Omega)$ is a distribution on $\Omega\times\Omega$ with uniform marginals and let $\alpha=\rho(1,1)+\rho(-1,-1)$ . Then $\rho(1,1)=\rho(-1,-1)=\alpha/2$ , $\rho(1,-1)=\rho(-1,1)=(1-\alpha)/2$ and because $\boldsymbol{J}$ is symmetric,

[TABLE]

Because $k$ is even, the last expression is convex with the minimum attained at $\alpha=1/2$ , viz. $\rho=\bar{\rho}$ . ∎

Lemma 2.15 shows not only that the $k$ -spin model from Section 1.2 with a standard Gaussian $\boldsymbol{J}$ satisfies SYM, BAL, POS and MIN (the latter for even $k$ ), but that the same is true if $\boldsymbol{J}$ is uniformly distributed on $\{\pm 1\}$ . This model is known as the $k$ -XORSAT model in computer science. It is intimately related to low-density generator matrix codes [2].

Proof of Theorem 1.1.

Comparing (1.1) and (2.20), we see that

[TABLE]

Therefore, Theorem 1.1 follows from Theorem 2.2 and Lemma 2.15. ∎

Proof of Theorem 1.2.

Equations (1.1) and (2.20) ensure that the Gibbs measures $\mu_{\mathbb{H},\boldsymbol{J},\beta}$ and $\mu_{\boldsymbol{G}}$ given $\mathfrak{S}$ are identically distributed. Hence, Theorem 1.2 follows from Theorem 2.5 and Lemma 2.15. ∎

2.4. Discussion and related work

The results in this section provide a map of the replica symmetric phase, its boundary and the evolution of the Gibbs measure within it, thereby vindicating for a universal class of models the predictions of the cavity method [43]. The results extend, complement or generalize prior work on the condensation phase transition from [23], which only dealt with the case that the support $\Psi$ of $P$ is finite, and on the reconstruction problem [34, 55]. Additionally, in the example of the Potts antiferromagnet and the stochastic block model prior work based on combinatorial methods only gave approximate results [12, 22], whereas the present results are tight for all values of $q,\beta$ . Indeed, a merit of the present approach is that we perform fairly abstract arguments that do not require model-specific deliberations.

Beyond the examples treated explicitly in Section 2.3 there are several other important and well-studied models that also satisfy the assumptions of our main results. For instance, Bapst, Coja-Oghlan and Raßmann [16] obtained approximate results on the replica symmetry breaking phase transition in the random hypergraph $2$ -coloring problem. This model is easily seen to satisfy SYM, BAL, POS and MIN and thus the main results of the present paper clarify the structure of the entire replica symmetric phase. More generally, the hypergraph version of the Potts model satisfies our assumptions as well. So does the random $k$ -NAESAT model, a variant of Boolean satisfiability that resembles the hypergraph $2$ -coloring model.

Apart from proving an upper bound on the condensation threshold, the Kesten-Stigum bound plays an important role with respect to statistical inference aspects of random factor graph models. Specifically, by extension of the predictions from [26] for the stochastic block model, it seems natural to expect that there should be efficient algorithms for both the detection problem and for recovering a non-trivial approximation to the ground truth in the teacher-student model for $d>d_{\mathrm{KS}}$ . On the other hand, an intriguing question is whether for $d_{\mathrm{cond}}<d<d_{\mathrm{KS}}$ these two problems may be soluble in exponential time but not efficiently, i.e., in polynomial time [12, 26]. Indeed, while Theorem 2.2 shows that $d_{\mathrm{cond}}$ is always finite, there are models where $d_{\mathrm{KS}}=\infty$ , e.g., the $k$ -XORSAT model. Thus, for such models there might be an enormous computational gap. This question is intimately related to the $k$ -SAT refutation problem, an important question in computer science [30, 31].

There are a few models that fail to satisfy our assumptions. For instance, in the random $k$ -SAT model [9] and the hardcore model on the Erdős-Rényi random graph [11] condition SYM is violated. Indeed, in these two cases the Gibbs marginals are non-uniform in the replica symmetric phase. In effect, we do not expect that the free energy is as tightly concentrated as Theorem 2.4 shows it is in the case of “symmetric” models. Thus, it is not just that the present proof methods do not apply, but “asymmetric” models appear to be materially different. Moreover, ferromagnetic models generally violate SYM, BAL and POS.

A further class of models that we do not treat in this paper is models where the weight functions $\psi$ take values in $\{0,1\}$ , thus imposing hard constraints. An example of this is the “zero-temperature” version of the Potts antiferromagnet, better known as the random graph coloring problem [9]. Certain specific models with hard constraints have received considerable attention in combinatorics. For example, [17, 15, 62] established the precise condensation threshold, a contiguity result and the exact limiting distribution of the number of $q$ -colorings of the Erdős-Rényi random graph via combinatorial methods under the assumption that $q$ exceeds a large enough constant. (Subsequently the condensation threshold in the random graph coloring problem was determined for all $q\geq 3$ [23].) Similar results, albeit not quite up to the precise condensation threshold, are known for the hypergraph $2$ -coloring and the $k$ -NAESAT problems [6, 7, 61], a version of the random $k$ -SAT problem with regular literal degrees [24] and the independent set problem in random regular graphs [18]. Additionally, in zero temperature models the ‘satisfiability threshold’ from where $Z(\boldsymbol{G})$ is typically equal to $0$ plays a major role [5, 10, 27, 28, 36, 57].

3. Proof strategy

Throughout this section we keep the notation from Section 2.

The apex of the present work is Theorem 2.4 about the limiting distribution of the free energy; all the other results either lead up to it or derive from it relatively easily. The classical approach to proving such a result would be the second moment method, pioneered in this context by Achlioptas and Moore [6], in combination with the small subgraph conditioning technique of Robinson and Wormald [39, 64]. This strategy was applied to, e.g., the stochastic block model [12] and the $k$ -spin model [37]. But only in the stochastic block model with two colors and the diluted $2$ -spin model was it possible to obtain complete results [37, 58]. Indeed, as noticed by Guerra and Toninelli [37], a combinatorial second moment computation generally appears to be too crude a device to cover the entire replica symmetric phase.

Therefore, here we pursue a different strategy. We craft a proof around the teacher-student model $\boldsymbol{G}^{*}$ . More specifically, the main achievement of the recent paper [23] was to verify the cavity formula for the leading order $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z(\boldsymbol{G}^{*})]$ of the free energy in the teacher-student model (in the case that the set $\Psi$ is finite). We will replace the second moment calculation by that free energy formula, generalized to infinite $\Psi$ , and combine it with a suitably generalized small subgraph conditioning technique. The challenge is to integrate these two components seamlessly. We accomplish this by realizing that, remarkably, both arguments are inherently and rather elegantly tied together via the spectrum of the linear operator $\Xi$ from (2.6). But to develop this novel approach we first need to recall the classical second moment argument and understand why it founders.

3.1. Two moments do not suffice

For any second moment calculation it is crucial to fix the number of constraint nodes because its fluctuations would otherwise boost the variance. Hence, we will work with a deterministic integer sequence $m=m(n)\geq 0$ . More precisely, we will fix $d>0$ and consider integer sequences $m=m(n)\geq 0$ such that $|m(n)-dn/k|\leq n^{3/5}$ for all $n$ . Let $\mathcal{M}(d)$ be the set of all such sequences.
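To make the window that defines $\mathcal{M}(d)$ concrete, the following minimal sketch tests the condition $|m-dn/k|\leq n^{3/5}$ for a single $n$ and computes the probability that a ${\rm Po}(dn/k)$ variable falls inside the window. The function names are hypothetical and the computation is only an illustration, not part of the argument; it reflects the fact that the standard deviation $\sqrt{dn/k}$ of the Poisson count is of smaller order than $n^{3/5}$ .

```python
import math


def in_window(m: int, n: int, d: float, k: int) -> bool:
    """Check |m - d*n/k| <= n^(3/5), the condition defining M(d) at a single n."""
    return abs(m - d * n / k) <= n ** 0.6


def poisson_mass_in_window(n: int, d: float, k: int) -> float:
    """Probability that a Po(d*n/k) variable satisfies the window condition."""
    lam = d * n / k
    lo = max(0, math.ceil(lam - n ** 0.6))
    hi = math.floor(lam + n ** 0.6)
    # Evaluate the Poisson pmf in log-space to avoid overflow for large means.
    return sum(
        math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))
        for j in range(lo, hi + 1)
    )


if __name__ == "__main__":
    # Illustrative parameters only: n = 10^6 variable nodes, d = 2, k = 2.
    print(poisson_mass_in_window(10**6, 2.0, 2))
```

For $n=10^{6}$ the window has half-width about $3981$ while the standard deviation is $1000$ , so nearly all of the Poisson mass lies inside.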

The second moment method rests on showing that $\mathbb{E}[Z(\boldsymbol{G}(n,m))^{2}]$ is of the same order of magnitude as the square $\mathbb{E}[Z(\boldsymbol{G}(n,m))]^{2}$ of the first moment. If so, then standard concentration results can be used to show that $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z(\boldsymbol{G}(n,m))]=\lim_{n\to\infty}\frac{1}{n}\ln\mathbb{E}[Z(\boldsymbol{G}(n,m))]$ . The second limit is easy to compute because the expectation sits inside the logarithm, and thus we obtain the leading order of the free energy.

In fact, if we can calculate the second moment $\mathbb{E}[Z(\boldsymbol{G}(n,m))^{2}]$ sufficiently accurately, then it may be possible to determine the limiting distribution of $\ln Z(\boldsymbol{G}(n,m))$ precisely. For suppose that there is a “simple” random variable $Q(\boldsymbol{G}(n,m))$ such that

[TABLE]

Then the basic formula $\mathrm{Var}[Z(\boldsymbol{G}(n,m))]=\mathrm{Var}[\mathbb{E}[Z(\boldsymbol{G}(n,m))|Q(\boldsymbol{G}(n,m))]]+\mathbb{E}[\mathrm{Var}[Z(\boldsymbol{G}(n,m))|Q(\boldsymbol{G}(n,m))]]$ implies

[TABLE]

and typically it is not difficult to deduce from (3.2) that $\ln Z(\boldsymbol{G}(n,m))-\ln\mathbb{E}[Z(\boldsymbol{G}(n,m))|Q(\boldsymbol{G}(n,m))]$ converges to $0$ in probability. Hence, if $Q(\boldsymbol{G}(n,m))$ is “reasonable enough” so that the law of $\ln\mathbb{E}[Z(\boldsymbol{G}(n,m))|Q(\boldsymbol{G}(n,m))]$ is easy to express, then we have got the limiting distribution of $\ln Z(\boldsymbol{G}(n,m))$ . The basic insight behind the small subgraph conditioning technique is that (3.1) sometimes holds with a variable $Q$ that is determined by the statistics of bounded-length cycles in $\boldsymbol{G}(n,m)$ [39, 64].
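The variance decomposition used above can be verified on a toy example. The sketch below takes a finite joint distribution of a pair $(Q,Z)$ and returns the three quantities $\mathrm{Var}[Z]$ , $\mathrm{Var}[\mathbb{E}[Z|Q]]$ and $\mathbb{E}[\mathrm{Var}[Z|Q]]$ in exact rational arithmetic. The function name and the example distribution are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def variance_decomposition(joint):
    """joint maps (q, z) -> probability; returns (Var[Z], Var[E[Z|Q]], E[Var[Z|Q]])."""
    p_q = defaultdict(Fraction)    # P(Q = q)
    s1 = defaultdict(Fraction)     # sum of p * z over outcomes with Q = q
    s2 = defaultdict(Fraction)     # sum of p * z^2 over outcomes with Q = q
    for (q, z), p in joint.items():
        p_q[q] += p
        s1[q] += p * z
        s2[q] += p * z * z
    mean = sum(s1.values())
    var_z = sum(s2.values()) - mean ** 2
    cond_mean = {q: s1[q] / p_q[q] for q in p_q}
    cond_var = {q: s2[q] / p_q[q] - cond_mean[q] ** 2 for q in p_q}
    var_of_mean = sum(p_q[q] * (cond_mean[q] - mean) ** 2 for q in p_q)
    mean_of_var = sum(p_q[q] * cond_var[q] for q in p_q)
    return var_z, var_of_mean, mean_of_var


if __name__ == "__main__":
    quarter = Fraction(1, 4)
    joint = {(0, 1): quarter, (0, 3): quarter, (1, 4): quarter, (1, 8): quarter}
    total, explained, residual = variance_decomposition(joint)
    print(total, explained, residual)
```

In the example the part of the variance explained by $Q$ is $4$ and the residual part is $5/2$ ; the small subgraph conditioning argument aims for a $Q$ whose residual part is negligible relative to $\mathbb{E}[Z]^{2}$ .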

Anyhow, the crux of the entire argument is to calculate $\mathbb{E}[Z(\boldsymbol{G}(n,m))^{2}]$ . Of course, by the linearity of expectation and the independence of the constraint nodes, the second moment can be written in terms of the overlap $\rho_{\sigma,\tau}$ as

[TABLE]

Given a probability distribution $\rho=(\rho(s,t))_{s,t\in\Omega}$ on $\Omega^{2}$ such that $n\rho(s,t)$ is integral for all $s,t\in\Omega$ , the number of assignments $\sigma,\tau\in\Omega^{V_{n}}$ with $\rho_{\sigma,\tau}=\rho$ equals ${\binom{n}{\rho n}}$ . Therefore, Stirling’s formula yields the approximation

[TABLE]

where $\mathcal{H}(\rho)$ denotes the entropy of $\rho$ . In other words, computing the second moment comes down to identifying the overlap $\rho$ that renders the dominant contribution to (3.3). By comparison, under assumptions SYM and BAL it is not difficult to see (cf. Lemma 4.6 below) that the first moment satisfies

[TABLE]
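The counting step behind the entropy term in (3.4) can be checked numerically. The sketch below compares the exact logarithm of the multinomial coefficient $\binom{n}{\rho n}$ with $n\mathcal{H}(\rho)$ ; the function names are hypothetical and the example overlap is the uniform one for $q=2$ .

```python
import math
from typing import Sequence


def log_pair_count(counts: Sequence[int]) -> float:
    """ln( n! / prod_c c! ) where counts lists the integers n*rho(s,t)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def entropy(rho: Sequence[float]) -> float:
    """Entropy H(rho) = -sum rho ln rho, with the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in rho if p > 0)


if __name__ == "__main__":
    counts = [1000, 1000, 1000, 1000]          # n = 4000, uniform overlap for q = 2
    n = sum(counts)
    rho = [c / n for c in counts]
    print(log_pair_count(counts) / n, entropy(rho))
```

The two printed numbers differ only by a correction of order $(\ln n)/n$ , which is the sub-exponential factor hidden by Stirling’s formula.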

But there are two major issues with the second moment argument. First, actually solving the innocent-looking optimization problem (3.4) turns out to be daunting even in special cases. For example, in the Potts antiferromagnet the task remains wide open, despite very serious attempts [8, 22]. The source of the trouble is that the entropy is concave while the second summand in (3.4) is convex (cf. MIN), causing a proliferation of local maxima. Second, and even worse, comparing (3.4) and (3.5) we can verify easily that the desired second moment bound $\mathbb{E}[Z(\boldsymbol{G}(n,m,P))^{2}]=O(\mathbb{E}[Z(\boldsymbol{G}(n,m,P))]^{2})$ can hold only if the maximizer $\rho_{\star}$ of (3.4) satisfies $\left\|{\rho_{\star}-\bar{\rho}}\right\|_{\mathrm{TV}}=o(1)$ . However, this is not generally true for average degrees $d$ below but near the condensation threshold. For instance, in the Potts antiferromagnet the second moment exceeds the square of the first moment by an exponential factor $\exp(\Omega(n))$ for $d$ below the condensation threshold [22].

The problem was noticed and partly remedied in prior work by applying the second moment method to a suitably truncated random variable (e.g. [17, 22]). This method revealed, e.g., the condensation threshold in a few special cases such as the random graph $q$ -coloring problem [17], albeit only for $q$ exceeding some (astronomical) constant $q_{0}$ , and in the random regular $k$ -SAT model for large $k$ [14]. Yet apart from introducing such extraneous conditions, ad-hoc arguments of this kind tend to require a meticulous combinatorial study of the specific model.

3.2. The condensation phase transition and the overlap

The merit of the present approach is that we avoid combinatorial deliberations altogether. Rather than bothering with the second moment bound (3.4) we will employ an asymptotic formula for the free energy of the teacher-student model $\boldsymbol{G}^{*}$ . To be precise, it will be convenient to work with a slightly tweaked version $\hat{\boldsymbol{G}}$ of this model: following [23, Section 3], we let $\hat{\boldsymbol{G}}(n,m,P)$ be the random factor graph chosen from the distribution

[TABLE]

Recalling that $\boldsymbol{m}=\boldsymbol{m}_{d}(n)$ is a random variable with distribution ${\rm Po}(dn/k)$ , we also introduce $\hat{\boldsymbol{G}}=\hat{\boldsymbol{G}}(n,\boldsymbol{m},P)$ . As before we ease the notation by dropping $P$ where possible.

Loosely speaking, $\hat{\boldsymbol{G}}(n,m)$ is a reweighted version of $\boldsymbol{G}(n,m)$ where the probability that $G$ comes up is proportional to $Z(G)$ . Intuitively, the construction of the teacher-student model $\boldsymbol{G}^{*}$ induces a similar reweighting, as the probability that $\boldsymbol{G}^{*}=G$ depends on the number of assignments $\boldsymbol{\sigma}^{*}$ that could plausibly be used to generate $G$ via (2.14). In fact, as we shall see in Section 4 it is not difficult to verify the following.

Lemma 3.1.

If $P$ satisfies conditions SYM and BAL, then $\boldsymbol{G}^{*}(n,m,\boldsymbol{\sigma}^{*})$ and $\hat{\boldsymbol{G}}(n,m)$ are mutually contiguous for all $d>0$ , $m\in\mathcal{M}(d)$ .

The following theorem verifies the cavity formula for the free energy of $\hat{\boldsymbol{G}}$ and $\boldsymbol{G}^{*}$ .

Theorem 3.2.

Assume that $P$ satisfies SYM, BAL and POS and let $d>0$ . Then with $\mathcal{B}(d,P,\pi)$ from (2.3) we have

[TABLE]

Theorem 3.2 was established in [23] for the case that the set $\Psi$ of weight functions is finite. In Section 10 we extend that result via a limiting argument to prove Theorem 3.2 for infinite $\Psi$ . Furthermore, in Section 6 we deduce the following result from Theorem 3.2.

Proposition 3.3.

Assume that BAL, SYM, POS and MIN hold and that $d<d_{\mathrm{cond}}$ . There exists a sequence $\zeta=\zeta(n)$ , $\zeta(n)=o(1)$ but $n^{1/6}\zeta(n)\to\infty$ as $n\to\infty$ , such that for all $m\in\mathcal{M}(d)$ we have

[TABLE]

Proposition 3.3 resolves our second moment troubles. Indeed, the proposition enables a completely generic way of setting up a truncated second moment argument: with $\zeta$ from Proposition 3.3 we define

[TABLE]

Hence, $\mathcal{Z}(G)=Z(G)$ if “most” pairs $\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}$ drawn from $\mu_{G}$ have overlap close to $\bar{\rho}$ , and $\mathcal{Z}(G)=0$ otherwise. Proposition 3.3 shows immediately that the truncation does not diminish the first moment.
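As a toy illustration of the truncation, the following sketch computes $Z(G)$ for the Potts antiferromagnet on a tiny graph by brute force and sets it to zero unless the average total variation distance between the overlap of two independent Gibbs samples and $\bar{\rho}$ is at most $\zeta$ . The precise definition (3.8) is not reproduced here, so the averaging rule, the choice of norm and all function names are assumptions made purely for illustration.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def potts_weight(sigma: Sequence[int], edges: List[Tuple[int, int]], beta: float) -> float:
    """Weight exp(-beta * number of monochromatic edges) of an assignment."""
    return math.exp(-beta * sum(1 for u, v in edges if sigma[u] == sigma[v]))


def overlap_tv_to_uniform(s1: Sequence[int], s2: Sequence[int], q: int) -> float:
    """Total variation distance between the overlap of (s1, s2) and the uniform law on pairs."""
    n = len(s1)
    counts = {}
    for a, b in zip(s1, s2):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return 0.5 * sum(
        abs(counts.get((a, b), 0) / n - 1.0 / q ** 2)
        for a in range(q) for b in range(q)
    )


def truncated_partition_function(n, q, edges, beta, zeta):
    """Return Z(G) if the Gibbs-average overlap distance is at most zeta, else 0 (assumed rule)."""
    states = list(itertools.product(range(q), repeat=n))
    weights = [potts_weight(s, edges, beta) for s in states]
    z = sum(weights)
    avg = sum(
        (w1 / z) * (w2 / z) * overlap_tv_to_uniform(s1, s2, q)
        for s1, w1 in zip(states, weights)
        for s2, w2 in zip(states, weights)
    )
    return z if avg <= zeta else 0.0


if __name__ == "__main__":
    # Path on three vertices with q = 2 colors; parameters chosen arbitrarily.
    print(truncated_partition_function(3, 2, [(0, 1), (1, 2)], beta=1.0, zeta=0.5))
```

The enumeration costs $q^{2n}$ steps, so the sketch is only meant for very small instances; for small $n$ the overlap is necessarily far from uniform, which is why the threshold in the paper is tied to the sequence $\zeta(n)$ from Proposition 3.3.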

Corollary 3.4.

If BAL, SYM, POS and MIN hold and $d<d_{\mathrm{cond}}$ , then $\mathbb{E}[\mathcal{Z}(\boldsymbol{G}(n,m))]\sim\mathbb{E}[Z(\boldsymbol{G}(n,m))]$ uniformly for all $m\in\mathcal{M}(d)$ .

Proof.

Equation (3.6) and Proposition 3.3 yield

[TABLE]

as claimed. ∎

The second moment calculation for $\mathcal{Z}$ is easy, too. Indeed, the very construction (3.8) of $\mathcal{Z}$ guarantees that the dominant contribution to the second moment of $\mathcal{Z}$ comes from pairs with an overlap close to $\bar{\rho}$ . Hence, computing the second moment comes down to expanding the right hand side of (3.4) around $\bar{\rho}$ via the Laplace method. Yet in order to apply the Laplace method we need to verify that $\bar{\rho}$ is a local maximum of the function

[TABLE]

from (3.4). For the special case of the Potts antiferromagnet the overlap concentration (3.7) was established and the second moment argument for $\mathcal{Z}$ was carried out in [23, Section 4.3]. While the generalization to random factor graph models is anything but straightforward, an even more important difference lies in the application of the Laplace method. More specifically, in the case of the Potts antiferromagnet the fact that $\bar{\rho}$ is a local maximum of (3.9) for all $d<d_{\mathrm{cond}}$ was derived extremely indirectly by resorting to the statistical inference algorithm of Abbe and Sandon for the stochastic block model [3]. But of course there ought to be a general, conceptual explanation. As we shall see momentarily, there is one indeed, namely the generalized Kesten-Stigum bound.

3.3. The Kesten-Stigum bound

To see the connection, we observe that the Hessian of (3.9) at the point $\bar{\rho}$ is equal to $q(\mathrm{id}-d(k-1)\Xi)$ (with $\Xi$ the matrix from (2.6)). Hence, taking into account that the argument $\rho$ is a probability distribution on $\Omega\times\Omega$ , we find that $\bar{\rho}$ is a local maximum of (3.9) if and only if

[TABLE]

In order to get a handle on the spectrum of the operator $\Xi$ from (2.6) we begin with the following observation about the matrices $\Phi_{\psi}$ and $\Phi$ from (2.5) and (2.9).

Lemma 3.5.

Assume that $P$ satisfies SYM. Then the matrix $\Phi_{\psi}$ is stochastic and thus $\Phi_{\psi}\boldsymbol{1}=\boldsymbol{1}$ for every $\psi\in\Psi$ . Moreover, $\Phi$ is symmetric and doubly-stochastic. If, additionally, $P$ satisfies BAL, then $\max_{x\perp\boldsymbol{1}}\left\langle{{\Phi x},{x}}\right\rangle\leq 0.$
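The three properties in Lemma 3.5 can be checked by hand for the Potts antiferromagnet with $k=2$ . The sketch below assumes that $\Phi$ is the weight matrix $\psi(s,t)=1-(1-e^{-\beta})\boldsymbol{1}\{s=t\}$ normalized so that its rows sum to one; definition (2.5) is not restated in this section, so this form and the function names are assumptions.

```python
import math
from typing import List, Sequence


def potts_phi(q: int, beta: float) -> List[List[float]]:
    """Row-normalized Potts weight matrix (assumed form of Phi for k = 2)."""
    c = 1.0 - math.exp(-beta)
    return [[(1.0 - c * (s == t)) / (q - c) for t in range(q)] for s in range(q)]


def quadratic_form(mat: List[List[float]], x: Sequence[float]) -> float:
    """Compute <mat x, x>."""
    n = len(x)
    return sum(x[i] * mat[i][j] * x[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    q, beta = 3, 1.0
    phi = potts_phi(q, beta)
    print([sum(row) for row in phi])                  # row sums (stochastic)
    print(all(phi[i][j] == phi[j][i] for i in range(q) for j in range(q)))
    print(quadratic_form(phi, [1.0, -1.0, 0.0]))      # a vector orthogonal to the all-ones vector
```

In this example $\Phi x=-\frac{1-e^{-\beta}}{q-1+e^{-\beta}}\,x$ for every $x\perp\boldsymbol{1}$ , so the quadratic form is non-positive, in line with the last assertion of the lemma.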

Proceeding to the operator $\Xi$ , we recall the definition of the space ${\mathcal{E}}$ from (2.7) and we introduce

[TABLE]

Lemma 3.6.

Assume that $P$ satisfies SYM and BAL. The operator $\Xi$ is self-adjoint, $\Xi(\boldsymbol{1}\otimes\boldsymbol{1})=\boldsymbol{1}\otimes\boldsymbol{1}$ and for every $x\in\mathbb{R}^{q}$ we have $\Xi(x\otimes\boldsymbol{1})=(\Phi x)\otimes\boldsymbol{1}$ , $\Xi(\boldsymbol{1}\otimes x)=\boldsymbol{1}\otimes(\Phi x)$ and

[TABLE]

Furthermore, $\Xi{\mathcal{E}}\subset{\mathcal{E}}$ and $\Xi{\mathcal{E}}^{\prime}\subset{\mathcal{E}}^{\prime}$ .

Lemma 3.6 shows that $\Xi$ induces a self-adjoint operator on the space ${\mathcal{E}}$ . The following proposition yields a bound on the spectral radius of this operator. Let

[TABLE]

Proposition 3.7.

If $P$ satisfies SYM and BAL, then $d_{\mathrm{cond}}(k-1)\max_{\lambda\in\mathrm{Eig}^{\ast}(\Xi)}|\lambda|\leq 1.$

The proof of Proposition 3.7, which is based on highlighting an inherent connection between the spectrum of $\Xi$ and the Bethe free energy functional $\mathcal{B}$ from (2.3), is the main technical achievement of this paper. The details can be found in Section 5. Let us observe that Theorem 2.3 is immediate from Proposition 3.7.

Proof of Theorem 2.3.

We have $\max_{x\in{\mathcal{E}}:\|x\|=1}\left\langle{{\Xi x},{x}}\right\rangle=\max_{\lambda\in\mathrm{Eig}^{\ast}(\Xi)}|\lambda|$ because Lemma 3.6 shows that $\Xi$ is self-adjoint. Therefore, Theorem 2.3 follows from Proposition 3.7. ∎

Lemma 3.6 and Proposition 3.7 show that (3.10) is satisfied, and thus that $\bar{\rho}$ is a local maximum of (3.9), for all $d<d_{\mathrm{cond}}$ . Indeed, it is immediate from (3.12) that $\left\langle{{(\mathrm{id}-d(k-1)\Xi)x},{x}}\right\rangle>0$ if $x$ is of the form $\boldsymbol{1}\otimes y$ or $y\otimes\boldsymbol{1}$ for some $\boldsymbol{1}\perp y\in\mathbb{R}^{q}$ , and Theorem 2.3 shows that $\left\langle{{(\mathrm{id}-d(k-1)\Xi)x},{x}}\right\rangle>0$ for all $x\in{\mathcal{E}}$ . Hence, Proposition 3.7 provides the link between the free energy calculation for the reweighted model $\hat{\boldsymbol{G}}$ and the second moment of $\mathcal{Z}$ .
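To see what the condition amounts to in a concrete case, consider the Potts antiferromagnet with $k=2$ . The sketch below assumes that $\Phi$ is the row-normalized weight matrix and that $\Xi=\Phi\otimes\Phi$ for this model; the definitions (2.5) and (2.6) are not restated in this section, so both are assumptions, as are all function names. Under these assumptions $\Xi$ acts on $x\otimes y$ with $x,y\perp\boldsymbol{1}$ as multiplication by $\lambda^{2}$ , where $\lambda$ is the eigenvalue of $\Phi$ on $\boldsymbol{1}^{\perp}$ , and the value of $d$ at which $d(k-1)\lambda^{2}=1$ can be read off.

```python
import math
from typing import List, Sequence


def potts_phi(q: int, beta: float) -> List[List[float]]:
    """Row-normalized Potts weight matrix (assumed form of Phi for k = 2)."""
    c = 1.0 - math.exp(-beta)
    return [[(1.0 - c * (s == t)) / (q - c) for t in range(q)] for s in range(q)]


def kron(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i][j] * b[r][s] for j in range(n) for s in range(m)]
            for i in range(n) for r in range(m)]


def rayleigh(mat: List[List[float]], z: Sequence[float]) -> float:
    """Rayleigh quotient <mat z, z> / <z, z>."""
    mz = [sum(row[j] * z[j] for j in range(len(z))) for row in mat]
    return sum(u * v for u, v in zip(mz, z)) / sum(v * v for v in z)


def critical_degree(q: int, beta: float, k: int = 2) -> float:
    """Degree d with d (k-1) <Xi z, z> = 1 for the test vector z = x (x) x, x = e_1 - e_2."""
    phi = potts_phi(q, beta)
    xi = kron(phi, phi)                      # assumed form of Xi for the Potts model
    x = [1.0, -1.0] + [0.0] * (q - 2)        # orthogonal to the all-ones vector
    z = [u * v for u in x for v in x]        # coordinates of x (x) x
    return 1.0 / ((k - 1) * rayleigh(xi, z))


if __name__ == "__main__":
    print(critical_degree(3, 1.0))
```

Under the stated assumptions the output agrees with the closed form $\big((q-1+e^{-\beta})/(1-e^{-\beta})\big)^{2}$ .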

3.4. Second moment redux

We begin with the following asymptotic formula for the first moment, which we derive in Section 7. Observe that by Lemma 3.5 the set $\mathrm{Eig}\left({\Phi}\right)$ of eigenvalues of $\Phi$ contains precisely one non-negative element, namely $1$ . Therefore, the following formula makes sense.

Proposition 3.8.

Suppose that $P$ satisfies SYM and BAL and let $0<d$ . Then uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Proceeding to the second moment, we recall from Lemma 3.6 that $\Xi$ induces an endomorphism on the subspace ${\mathcal{E}}^{\prime}$ from (3.11) and we write

[TABLE]

for the spectrum of $\Xi$ on ${\mathcal{E}}^{\prime}$ . Lemma 3.6 and Proposition 3.7 imply that $d_{\mathrm{cond}}(k-1)\lambda\leq 1$ for all $\lambda\in\mathrm{Eig}^{\prime}(\Xi)$ . Therefore, the following formula for the second moment, whose proof we defer to Section 7, makes sense as well.

Proposition 3.9.

Suppose that $P$ satisfies SYM and BAL and let $0<d<d_{\mathrm{cond}}$ . Then uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Combining Corollary 3.4 with Propositions 3.8 and 3.9 and applying Lemma 3.6, we obtain for $m\in\mathcal{M}(d)$ ,

[TABLE]

In particular, the ratio of the second moment and the square of the first is bounded as $n\to\infty$ .

3.5. Virtuous cycles

In order to determine the limiting distribution of $\ln Z(\boldsymbol{G}(n,m))$ we are going to “explain” the remaining variance of $\mathcal{Z}(\boldsymbol{G}(n,m))$ in terms of the statistics of the bounded-length cycles of $\boldsymbol{G}(n,m)$ . However, by comparison to prior applications of the small subgraph conditioning technique, here it does not suffice to merely record how many cycles of a given length occur. We also need to take into account the specific weight functions along the cycle. Yet this approach is complicated substantially by the fact that there may be infinitely many different weight functions. To deal with this issue we are going to discretize the set of weight functions and perform a somewhat delicate limiting argument.

We need a few definitions. A signature of order $\ell$ is a family

[TABLE]

such that $E_{1},\ldots,E_{\ell}\subset\Psi$ are events, $s_{1},t_{1},\ldots,s_{\ell},t_{\ell}\in\{1,\ldots,k\}$ and $s_{i}\neq t_{i}$ for all $i\in\{1,\ldots,\ell\}$ and $s_{1}<t_{1}$ if $\ell=1$ . Let $\mathcal{Y}_{\ell}$ be the set of all signatures of order $\ell$ , let $\mathcal{Y}_{\leq\ell}=\bigcup_{l\leq\ell}\mathcal{Y}_{l}$ and let $\mathcal{Y}=\bigcup_{\ell\geq 1}\mathcal{Y}_{\ell}$ be the set of all signatures. If $G$ is a factor graph with variable nodes $V_{n}$ and constraint nodes $F_{m}$ , then we call a family $(x_{i_{1}},a_{h_{1}},\ldots,x_{i_{\ell}},a_{h_{\ell}})$ a cycle of signature $Y$ in $G$ if the following conditions are satisfied.

**CYC1:** $i_{1},\ldots,i_{\ell}\in\{1,\ldots,n\}$ are pairwise distinct and $i_{1}=\min\{i_{1},\ldots,i_{\ell}\}$ ,

**CYC2:** $h_{1},\ldots,h_{\ell}\in\{1,\ldots,m\}$ are pairwise distinct and $h_{1}<h_{\ell}$ if $\ell>1$ ,

**CYC3:** $\psi_{a_{h_{j}}}\in E_{j}$ for all $j\in\{1,\ldots,\ell\}$ ,

**CYC4:** $\partial_{s_{j}}a_{h_{j}}=x_{i_{j}}$ for all $j\in\{1,\ldots,\ell\}$ , $\partial_{t_{j}}a_{h_{j}}=x_{i_{j+1}}$ for all $j<\ell$ and $\partial_{t_{\ell}}a_{h_{\ell}}=x_{i_{1}}$ .

Conditions CYC1–CYC2 provide that the variable nodes that the cycle passes through are pairwise distinct. Moreover, to avoid over-counting CYC1 specifies that the cycle starts at the variable node with the smallest index and CYC2 that from there the cycle is oriented towards the constraint node with the smaller index if $\ell>1$ , respectively that $s_{1}<t_{1}$ if $\ell=1$ . Further, CYC3 states that the weight functions along the cycle belong to $E_{1},\ldots,E_{\ell}$ . Finally, CYC4 ensures that the cycle enters the $j$ th constraint node in position $s_{j}$ and leaves in position $t_{j}$ .
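The four conditions translate directly into a membership test. The sketch below is one possible encoding, not the paper's: a factor graph is represented by a dictionary mapping each constraint index $h$ to the ordered tuple of indices of its adjacent variable nodes, weight functions are replaced by hashable labels, and each event $E_{j}$ is a finite set of labels. All names are hypothetical.

```python
from typing import AbstractSet, Dict, Hashable, List, Sequence, Tuple

# A signature of order ell: one (E_j, s_j, t_j) triple per step, positions are 1-based.
Signature = List[Tuple[AbstractSet[Hashable], int, int]]


def is_signature(sig: Signature, k: int) -> bool:
    """Check the requirements on a signature: s_j != t_j, and s_1 < t_1 if ell = 1."""
    if not sig:
        return False
    for _, s, t in sig:
        if not (1 <= s <= k and 1 <= t <= k and s != t):
            return False
    return len(sig) > 1 or sig[0][1] < sig[0][2]


def is_cycle_of_signature(
    neighbors: Dict[int, Sequence[int]],  # h -> indices of the variables adjacent to a_h, in order
    weights: Dict[int, Hashable],         # h -> label standing in for psi_{a_h}
    var_idx: Sequence[int],               # (i_1, ..., i_ell)
    con_idx: Sequence[int],               # (h_1, ..., h_ell)
    sig: Signature,
) -> bool:
    ell = len(sig)
    if len(var_idx) != ell or len(con_idx) != ell:
        return False
    # CYC1: distinct variable nodes, starting at the smallest index.
    if len(set(var_idx)) != ell or var_idx[0] != min(var_idx):
        return False
    # CYC2: distinct constraint nodes, oriented so that h_1 < h_ell when ell > 1.
    if len(set(con_idx)) != ell or (ell > 1 and not con_idx[0] < con_idx[-1]):
        return False
    for j in range(ell):
        event, s, t = sig[j]
        h = con_idx[j]
        # CYC3: the weight function of the j-th constraint node lies in E_j.
        if weights[h] not in event:
            return False
        # CYC4: enter a_{h_j} in position s_j from x_{i_j}, leave in position t_j
        # towards x_{i_{j+1}} (wrapping around to x_{i_1} at the end).
        nxt = var_idx[(j + 1) % ell]
        if neighbors[h][s - 1] != var_idx[j] or neighbors[h][t - 1] != nxt:
            return False
    return True


if __name__ == "__main__":
    neighbors = {1: (1, 2, 5), 2: (2, 1, 4)}      # k = 3, two constraint nodes
    weights = {1: "a", 2: "b"}
    sig = [(frozenset({"a"}), 1, 2), (frozenset({"b"}), 1, 2)]
    print(is_cycle_of_signature(neighbors, weights, (1, 2), (1, 2), sig))
```

In the example the same geometric cycle traversed in the opposite orientation, $(x_{1},a_{2},x_{2},a_{1})$ , is rejected by CYC2, which is how each cycle ends up being counted once.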

Let $C_{Y}(G)$ denote the number of cycles of signature $Y$ . Moreover, for an event $\mathcal{A}\subset\Psi$ with $\mathbb{P}(\mathcal{A})>0$ and $h,h^{\prime}\in\{1,\ldots,k\}$ define the $q\times q$ matrix $\Phi_{\mathcal{A},h,h^{\prime}}$ by letting

[TABLE]

In addition, for a signature $Y=(E_{1},s_{1},t_{1},\ldots,E_{\ell},s_{\ell},t_{\ell})$ define

[TABLE]

Further, two signatures $Y=(E_{1},s_{1},t_{1},\ldots,E_{\ell},s_{\ell},t_{\ell})$ , $Y^{\prime}=(E_{1}^{\prime},s_{1}^{\prime},t_{1}^{\prime},\ldots,E_{\ell^{\prime}}^{\prime},s_{\ell^{\prime}}^{\prime},t_{\ell^{\prime}}^{\prime})$ are disjoint if either $\ell\neq\ell^{\prime}$ , or $(s_{i},t_{i})\neq(s_{i}^{\prime},t_{i}^{\prime})$ for some $i$ , or $E_{i}\cap E_{i}^{\prime}=\emptyset$ for some $i$ . Finally, a cycle of order $\ell$ is a family $(x_{i_{1}},a_{h_{1}},\ldots,x_{i_{\ell}},a_{h_{\ell}})$ that is a cycle of signature $(\Psi,s_{1},t_{1},\ldots,\Psi,s_{\ell},t_{\ell})$ for some sequence $s_{1},t_{1},\ldots,s_{\ell},t_{\ell}$ , and we let $C_{\ell}$ signify the number of such cycles. The following is a basic fact from the theory of random graphs.

Fact 3.10 ([19]).

Let $\ell_{1},\ldots,\ell_{l}\geq 1$ be pairwise distinct integers and let $y_{1},\ldots,y_{l}\geq 0$ be integers. Then for every $d>0$ uniformly for all $m\in\mathcal{M}(d)$ we have

[TABLE]

and the expected number of pairs of cycles of order at most $\ell_{1}+\cdots+\ell_{l}$ that share a common vertex is $O(1/n)$ .

In Section 8 we establish the following enhancement that takes the weight functions along the cycles into account.

Proposition 3.11.

Suppose that $P$ satisfies SYM and BAL. Let $Y_{1},Y_{2},\ldots,Y_{l}\in\mathcal{Y}$ be pairwise disjoint signatures and let $y_{1},\ldots,y_{l}$ be non-negative integers. Let $d>0$ . Then uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Moreover,

[TABLE]

Thus, for disjoint $Y_{1},\ldots,Y_{l}$ the cycle counts $C_{Y_{t}}$ are asymptotically independent Poisson.

Equipped with Propositions 3.8, 3.9 and 3.11, in the case that the set $\Psi$ of weight functions is finite we could determine the limiting distribution of $\ln Z(\boldsymbol{G})$ and thus prove Theorem 2.4 by just applying Janson’s version of the small subgraph conditioning theorem [39]. However, to accommodate an infinite set of weight functions like in the $k$ -spin model a discretization of $\Psi$ and a limiting argument are required. Specifically, recall that

[TABLE]

and for an integer $r\geq 1$ let $\mathfrak{C}_{r}$ be the partition of $\Psi$ induced by slicing the cube $[0,2]^{\Omega^{k}}$ into pairwise disjoint sub-cubes of side length $1/r$ . Further, let $\mathcal{Y}_{\ell,r}$ denote the set of all signatures $(E_{1},s_{1},t_{1},\ldots,E_{\ell},s_{\ell},t_{\ell})$ such that $E_{1},\ldots,E_{\ell}\in\mathfrak{C}_{r}$ and such that $\mathbb{P}(E_{i})>0$ for all $i\leq\ell$ , and define $\mathcal{Y}_{\leq\ell,r}=\bigcup_{l=1}^{\ell}\mathcal{Y}_{l,r}$ . Furthermore, if $\psi\in\Psi$ belongs to a sub-cube $C\in\mathfrak{C}_{r}$ , then we let

[TABLE]
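The partition $\mathfrak{C}_{r}$ can be pictured as a grid lookup. In the sketch below a weight function is represented by the tuple of its values on $\Omega^{k}$ , and the function returns the index of the sub-cube of side length $1/r$ that contains it. The half-open cell convention, with the upper boundary assigned to the last cell, and the function name are assumptions, since the text does not specify how boundary points are distributed.

```python
import math
from typing import Sequence, Tuple


def subcube_index(psi_values: Sequence[float], r: int) -> Tuple[int, ...]:
    """Index of the side-1/r sub-cube of [0,2]^(Omega^k) that contains psi."""
    cells_per_axis = 2 * r
    return tuple(
        min(int(math.floor(v * r)), cells_per_axis - 1)  # clamp the value 2 into the last cell
        for v in psi_values
    )


if __name__ == "__main__":
    # Two weight functions on a two-point domain, given by their value tuples.
    print(subcube_index([0.26, 1.9], 4), subcube_index([0.30, 1.8], 4))
```

Two weight functions receive the same index exactly when they fall into the same cell, and increasing $r$ refines the grid.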

The following proposition, whose proof can be found in Section 9, establishes that the random variable $\mathcal{K}$ from Theorem 2.4 is well-defined and that it can be approximated arbitrarily well via the discretizations $\mathfrak{C}_{r}$ .

Proposition 3.12.

Assume that $P$ satisfies SYM and BAL and let $0<d<d_{\mathrm{cond}}$ . Let $(K_{l})_{l\geq 1}$ be a family of independent Poisson variables with $\mathbb{E}[K_{l}]=(d(k-1))^{l}/(2l)$ and let $(\boldsymbol{\psi}_{l,i,j})_{l,i,j}$ be a family of independent samples from $P$ . Furthermore, define

[TABLE]

and $\mathcal{K}=\sum_{\ell=1}^{\infty}\mathcal{K}_{\ell}$ . Then all $\mathcal{K}_{\ell,r}$ are uniformly bounded in the $L^{1}$ -norm, $\mathcal{K}_{\ell,r}$ is $L^{1}$ -convergent to $\mathcal{K}_{\ell}$ as $r\to\infty$ and $\mathcal{K}_{\ell}$ is $L^{1}$ -convergent to $\mathcal{K}$ as $\ell\to\infty$ . Furthermore,

[TABLE]
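For orientation, the Poisson means $\mathbb{E}[K_{l}]=(d(k-1))^{l}/(2l)$ from Proposition 3.12 are easy to tabulate. The helper names below are hypothetical and the parameter values are arbitrary.

```python
def cycle_count_mean(d: float, k: int, l: int) -> float:
    """E[K_l] = (d(k-1))^l / (2l), as stated in Proposition 3.12."""
    return (d * (k - 1)) ** l / (2 * l)


def expected_short_cycles(d: float, k: int, max_len: int) -> float:
    """Sum of the means E[K_l] over 1 <= l <= max_len."""
    return sum(cycle_count_mean(d, k, l) for l in range(1, max_len + 1))


if __name__ == "__main__":
    for l in range(1, 6):
        print(l, cycle_count_mean(3.0, 2, l))
```

For fixed $l$ the mean does not depend on $n$ , while for $d(k-1)>1$ it grows geometrically in $l$ .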

3.6. Small subgraph conditioning

We have all the ingredients in place to prove Theorem 2.4. Thus, fix $0<d<d_{\mathrm{cond}}$ and let $m\in\mathcal{M}(d)$ . Let $\mathfrak{F}_{\ell,r}=\mathfrak{F}_{\ell,r}{(n,m)}$ be the $\sigma$ -algebra generated by the cycle counts $(C_{Y})_{Y\in\mathcal{Y}_{\leq\ell,r}}$ . Following the small subgraph conditioning paradigm, we intend to show that for sufficiently large $\ell,r$ , with probability tending to $1$ as $n\to\infty$ , $Z(\boldsymbol{G}(n,m))$ is “close” to $\mathbb{E}[Z(\boldsymbol{G}(n,m))|\mathfrak{F}_{\ell,r}]$ . Since Proposition 3.9 shows that $\mathbb{E}[Z(\boldsymbol{G}(n,m))-\mathcal{Z}(\boldsymbol{G}(n,m))]$ is small and that the second moment of $\mathcal{Z}(\boldsymbol{G}(n,m))$ is under control, we are going to argue via the truncated random variable.

More specifically, to show that $\mathcal{Z}(\boldsymbol{G}(n,m))$ is “close” to $\mathbb{E}[\mathcal{Z}(\boldsymbol{G}(n,m))|\mathfrak{F}_{\ell,r}]$ with probability $1-o(1)$ for sufficiently large $\ell,r$ , we are going to prove that $\mathbb{E}[\mathrm{Var}(\mathbb{E}[\mathcal{Z}(\boldsymbol{G}(n,m))|\mathfrak{F}_{\ell,r}])]$ is small. Clearly,

[TABLE]

Hence, to prove that $\mathbb{E}[\mathrm{Var}(\mathbb{E}[\mathcal{Z}(\boldsymbol{G}(n,m))|\mathfrak{F}_{\ell,r}])]$ is small it suffices to show that

[TABLE]

is nearly as big as $\mathrm{Var}[\mathcal{Z}(\boldsymbol{G}(n,m))]$ . Given what we know at this point this is not particularly difficult. Nonetheless, let us put the details off for just a little while to Section 3.7, where we prove the following.

Lemma 3.13.

Suppose that $P$ satisfies SYM and BAL and let $0<d<d_{\mathrm{cond}}$ . For any $\eta>0$ there exists $\ell_{0}(\eta)$ such that for every $\ell>\ell_{0}(\eta)$ there exists $r_{0}(\eta,\ell)$ such that for all $r>r_{0}(\eta,\ell)$ , uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Proof of Theorem 2.4.

Because $\mathcal{Z}(\boldsymbol{G}(n,m))\leq Z(\boldsymbol{G}(n,m))$ and $\mathbb{E}[\mathcal{Z}(\boldsymbol{G}(n,m))]\sim\mathbb{E}[Z(\boldsymbol{G}(n,m))]$ by Corollary 3.4, we have $\mathbb{E}|\mathcal{Z}(\boldsymbol{G}(n,m))-Z(\boldsymbol{G}(n,m))|=o(\mathbb{E}[Z(\boldsymbol{G}(n,m))]).$ Therefore, Lemma 3.13 implies that

[TABLE]

Thus, we are left to determine the law of $\mathbb{E}[Z(\boldsymbol{G}(n,m))|\mathfrak{F}_{\ell,r}]$ . On this count, Proposition 3.11 shows that for any non-negative integer vector $(c_{Y})_{Y\in\mathcal{Y}_{\leq\ell,r}}$ ,

[TABLE]

Hence, letting $K_{\ell,r}^{\prime}(\boldsymbol{G}(n,m))=\sum_{Y\in\mathcal{Y}_{\leq\ell,r}}C_{Y}(\boldsymbol{G}(n,m))\ln(\operatorname{tr}\Phi_{Y})-(\hat{\kappa}_{Y}-\kappa_{Y})$ we conclude that, in distribution,

[TABLE]

Further, by (3.18)

[TABLE]

Thus, combining Propositions 3.11 and 3.12, we conclude that $K_{\ell,r}^{\prime}(\boldsymbol{G}(n,m))$ converges to $\mathcal{K}_{\ell,r}$ in distribution as $n\to\infty$ for every $\ell,r$ . Hence, due to (3.24) so does $K_{\ell,r}(\boldsymbol{G}(n,m))$ . Consequently, Proposition 3.12 and (3.23) show that for any bounded continuous function $g:\mathbb{R}\to\mathbb{R}$ ,

[TABLE]

Combining these two statements and observing that the first and the last term are independent of $\ell,r$ , we obtain

[TABLE]

i.e., $\ln Z(\boldsymbol{G}(n,m))-\ln\mathbb{E}[Z(\boldsymbol{G}(n,m))]$ converges to $\mathcal{K}$ in distribution. Plugging in the formula for the first moment from (3.14) yields (2.11). Finally, because Proposition 3.11 shows that

[TABLE]

the formula for the conditional free energy given $\mathfrak{S}$ follows from (2.11) and Lemma 3.13. ∎

Organization

The paper is organized as follows. After proving Lemma 3.13 in Section 3.7, in Section 4 we collect some preliminaries, introduce notation, supply the proofs of Lemmas 3.5 and 3.6 and show how Theorem 2.5, Theorem 2.6 and Corollary 2.7 follow from Theorem 2.4. Section 5 contains the proof of Proposition 3.7; it comes first among the main proofs because we consider it the main technical achievement of this work, because it is self-contained and because we deem the argument rather interesting. Further, Section 6 contains the proof of Proposition 3.3, which is by way of a (substantial) generalization of an argument from [23] for the Potts antiferromagnet. Subsequently Section 7 contains the proofs of Proposition 3.8 and Proposition 3.9 about the moments of the truncated variable $\mathcal{Z}$ . Moreover, Section 8 deals with the proof of Proposition 3.11. The somewhat delicate proof of Proposition 3.12 can be found in Section 9. Section 10 contains the rather technical proofs of Theorem 2.2 and Theorem 3.2. Finally, the proof of Theorem 2.8 about the reconstruction problem can be found in Section 11.

3.7. Proof of Lemma 3.13

The proof is by generalization of the argument from [24, Section 2] for the random regular $k$ -SAT model to the current setting of random factor graph models. We begin with the following lower bound on the second moment of the conditional expectation. Let $\delta_{Y}=\operatorname{tr}(\Phi_{Y})-1=(\hat{\kappa}_{Y}-\kappa_{Y})/\kappa_{Y}$ .

Lemma 3.14.

Suppose that $P$ satisfies SYM and BAL and let $0<d<d_{\mathrm{cond}}$ , $\ell,r>0$ . Then uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Proof.

Fix a number $\alpha>0$ , choose $B=B(\alpha,\ell,r)$ sufficiently large and let $\Gamma=\Gamma(\ell,r,B)$ be the set of all families $(c_{Y})_{Y\in\mathcal{Y}_{\leq\ell,r}}$ of non-negative integers such that $\sum_{Y\in\mathcal{Y}_{\leq\ell,r}}c_{Y}\leq B$ . Moreover, let ${\mathcal{C}}={\mathcal{C}}(\ell,r,B)$ be the event that $(C_{Y}(\boldsymbol{G}(n,m)))_{Y\in\mathcal{Y}_{\leq\ell,r}}\in\Gamma$ . Then (3.6) and Proposition 3.11 yield

[TABLE]

Let $S=\sum_{Y\in\mathcal{Y}_{\leq\ell,r}}(1+\delta_{Y})^{2}\kappa_{Y}$ . Since the matrices $\Phi_{\psi}$ are stochastic, (3.18) shows that there is a number $T(\ell)$ such that $S\leq T(\ell)$ . Therefore, choosing $B=B(\alpha,\ell,d)$ sufficiently large, we can ensure that $\exp(S)\leq\exp(\alpha)\sum_{L\leq B}S^{L}/L!$ . Hence,

[TABLE]

Combining (3.25) and (3.26), we find

[TABLE]

Finally, we need to show that $Z(\boldsymbol{G}(n,m))$ can be replaced by $\mathcal{Z}(\boldsymbol{G}(n,m))$ on the l.h.s. of (3.27). Since $Z(\boldsymbol{G}(n,m))\geq\mathcal{Z}(\boldsymbol{G}(n,m))$ but $\mathbb{E}[Z(\boldsymbol{G}(n,m))]\sim\mathbb{E}[\mathcal{Z}(\boldsymbol{G}(n,m))]$ , we have

[TABLE]

To bound $\|\boldsymbol{1}\{{\mathcal{C}}\}\mathbb{E}[Z(\boldsymbol{G}(n,m))|\mathfrak{F}_{\ell,r}]\|_{\infty}$ we observe that for all $(c_{Y})_{Y}\in\Gamma$ ,

[TABLE]

Hence, $\|\boldsymbol{1}\{{\mathcal{C}}\}\mathbb{E}[Z(\boldsymbol{G}(n,m))|\mathfrak{F}_{\ell,r}]\|_{\infty}=O(\mathbb{E}[Z(\boldsymbol{G}(n,m))])$ and the assertion follows from (3.27) and (3.28) by taking $\alpha\to 0$ sufficiently slowly as $n\to\infty$ . ∎

Proof of Lemma 3.13.

We use a similar trick as in the proof of [24, Corollary 2.6]. Recall that we aim to show that

[TABLE]

Given $\eta>0$ choose $\alpha=\alpha(\eta)>0$ small enough. Then by (3.21), (3.22), Lemma 3.14 and (3.20), for sufficiently large $\ell,r,n$ we have

[TABLE]

Now define

[TABLE]

Then

[TABLE]

Furthermore, by Chebyshev’s inequality

[TABLE]

Combining (3.30) and (3.32), we obtain

[TABLE]

Finally, (3.29) follows from (3.31), (3.33) and Markov’s inequality. ∎

4. Getting started

4.1. Basics

Throughout the paper we continue to use the notation introduced in Sections 2 and 3. In particular, we write $V_{n}=\{x_{1},\ldots,x_{n}\}$ for a set of $n$ variable nodes and $F_{m}=\{a_{1},\ldots,a_{m}\}$ for a set of $m$ constraint nodes. Further, $\boldsymbol{m}_{d}(n)$ is a random variable with distribution ${\rm Po}(dn/k)$ and we just write $\boldsymbol{m}_{d}$ or $\boldsymbol{m}$ if $n$ and/or $d$ are apparent. Moreover, for an integer $l\geq 1$ we let $[l]=\{1,\ldots,l\}$ .

For a finite set $\mathcal{X}$ we denote the set of probability distributions on $\mathcal{X}$ by $\mathcal{P}(\mathcal{X})$ . We identify $\mathcal{P}(\mathcal{X})$ with the standard simplex in $\mathbb{R}^{\mathcal{X}}$ and endow $\mathcal{P}(\mathcal{X})$ accordingly with the Borel $\sigma$ -algebra. By $\mathcal{P}^{2}(\mathcal{X})$ we denote the set of probability measures on $\mathcal{P}(\mathcal{X})$ and by $\mathcal{P}^{2}_{*}(\mathcal{X})$ the set of all $\pi\in\mathcal{P}^{2}(\mathcal{X})$ whose mean $\int_{\mathcal{P}(\mathcal{X})}\mu{\mathrm{d}}\pi(\mu)$ is the uniform distribution on $\mathcal{X}$ . In addition, for a point $x$ in a measurable space we write $\delta_{x}$ for the Dirac measure on $x$ . The entropy of a probability distribution $\mu$ on a finite set $\mathcal{X}$ is always denoted by $\mathcal{H}(\mu)$ . Thus, recalling that $\Lambda(z)=z\ln z$ for $z>0$ and setting $\Lambda(0)=0$ , we have $\mathcal{H}(\mu)=-\sum_{x\in\mathcal{X}}\Lambda(\mu(x)).$
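To make the convention $\Lambda(0)=0$ concrete, the following small sketch evaluates $\mathcal{H}(\mu)$ for a distribution stored as a dictionary. The function names and the dictionary representation are our own illustrative choices.

```python
import math

def Lambda(z):
    """Lambda(z) = z ln z for z > 0, with the convention Lambda(0) = 0."""
    return z * math.log(z) if z > 0 else 0.0

def entropy(mu):
    """H(mu) = -sum_x Lambda(mu(x)) for a distribution mu given as a dict."""
    return -sum(Lambda(p) for p in mu.values())
```

With this convention atoms of mass zero contribute nothing, so a Dirac measure has entropy zero and the uniform distribution on a two-point set has entropy $\ln 2$.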

Further, if $\mu\in\mathcal{P}(\Omega^{V_{n}})$ is a probability measure on the discrete cube $\Omega^{V_{n}}$ , then $\boldsymbol{\sigma}_{\mu},\boldsymbol{\tau}_{\mu},\boldsymbol{\sigma}_{1,\mu},\boldsymbol{\sigma}_{2,\mu},\ldots\in\Omega^{V_{n}}$ denote mutually independent samples from $\mu$ . If $\mu=\mu_{G}$ is the Gibbs measure induced by a factor graph $G$ , we write $\boldsymbol{\sigma}_{G}$ etc. instead of $\boldsymbol{\sigma}_{\mu_{G}}$ . Where $\mu$ or $G$ are apparent from the context we omit the index and just write $\boldsymbol{\sigma},\boldsymbol{\tau}$ , etc. If $X:(\Omega^{V_{n}})^{l}\to\mathbb{R}$ is a random variable, then we use the notation

[TABLE]

Thus, $\left\langle{X}\right\rangle_{\mu}$ is the mean of $X$ over independent samples from $\mu$ . If $\mu=\mu_{G}$ for a factor graph $G$ , then we simplify the notation by writing $\left\langle{\,\cdot\,}\right\rangle_{G}$ rather than $\left\langle{\,\cdot\,}\right\rangle_{\mu_{G}}$ . We use this notation to distinguish averages over $\mu_{G}$ from other sources of randomness (e.g., the choice of the random factor graph), for which we reserve the symbols $\mathbb{E}\left[{\,\cdot\,}\right]$ and $\mathrm{Var}\left[{\,\cdot\,}\right]$ .

Finally, we need a few facts about probability distributions on sets of the form $\Omega^{l}$ . For $\sigma_{1},\ldots,\sigma_{l}:V\to\Omega$ let $\rho_{\sigma_{1},\ldots,\sigma_{l}}\in\mathcal{P}(\Omega^{l})$ denote the $l$ -wise overlap, defined by

[TABLE]

We use this notation also in the case $l=1$ and observe that $\rho_{\sigma_{1}}$ is nothing but the empirical distribution of the spins under $\sigma_{1}$ . Further, we let $\bar{\rho}_{l}$ signify the uniform distribution on $\Omega^{l}$ ; we usually omit the index $l$ to ease the notation. For two spin assignments $\sigma,\tau:V\to\Omega$ we let $\sigma\triangle\tau=\{v\in V:\sigma(v)\neq\tau(v)\}.$
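The following sketch illustrates these objects on small examples. It assumes the standard reading of the overlap, namely that $\rho_{\sigma_{1},\ldots,\sigma_{l}}(\omega_{1},\ldots,\omega_{l})$ is the fraction of $v\in V$ with $\sigma_{i}(v)=\omega_{i}$ for all $i$; assignments are represented as tuples indexed by $0,\ldots,n-1$, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

def overlap(*sigmas):
    """l-wise overlap: empirical distribution of (sigma_1(v), ..., sigma_l(v))."""
    n = len(sigmas[0])
    counts = Counter(zip(*sigmas))
    return {key: c / n for key, c in counts.items()}

def tv_to_uniform(rho, Omega, l):
    """Total variation distance between rho and the uniform distribution on Omega^l."""
    u = 1.0 / len(Omega) ** l
    return 0.5 * sum(abs(rho.get(key, 0.0) - u) for key in product(Omega, repeat=l))

def sym_diff(sigma, tau):
    """Set of positions v with sigma(v) != tau(v)."""
    return {v for v in range(len(sigma)) if sigma[v] != tau[v]}
```

For example, with $\Omega=\{0,1\}$ the assignments `(0, 0, 1, 1)` and `(0, 1, 0, 1)` have a perfectly uniform pairwise overlap although they differ on two of the four positions.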

Lemma 4.1 ([13]).

For any finite set $\Omega$ , any $\varepsilon>0$ and any $l\geq 3$ there exist $\delta=\delta(\Omega,\varepsilon,l)$ and $n_{0}=n_{0}(\Omega,\varepsilon,l)$ such that for all $n>n_{0}$ and all $\mu\in\mathcal{P}(\Omega^{V_{n}})$ the following is true: if $\left\langle{\left\|{\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}}\right\|_{\mathrm{TV}}}\right\rangle<\delta$ , then $\left\langle{\left\|{\rho_{\boldsymbol{\sigma}_{1},\ldots,\boldsymbol{\sigma}_{l}}-\bar{\rho}_{l}}\right\|_{\mathrm{TV}}}\right\rangle<\varepsilon$ .

Call $\sigma\in\Omega^{V_{n}}$ nearly balanced if $\left\|{\rho_{\sigma}-\bar{\rho}}\right\|_{\mathrm{TV}}\leq n^{-2/5}$ .

Lemma 4.2 ([23, Lemma 4.7]).

For any $\varepsilon>0$ there is $\delta>0$ such that for all sufficiently large $n$ the following is true. If $\mu\in\mathcal{P}(\Omega^{n})$ satisfies $\left\langle{\left\|{\rho_{\boldsymbol{\sigma},\boldsymbol{\tau}}-\bar{\rho}}\right\|_{\mathrm{TV}}}\right\rangle_{\mu}<\delta$ , then for all nearly balanced $\tau$ we have $\left\langle{\left\|{\rho_{\boldsymbol{\sigma},\tau}-\bar{\rho}}\right\|_{\mathrm{TV}}}\right\rangle_{\mu}<\varepsilon$ .

Finally, we need the following elementary observation.

Fact 4.3.

For any finite set $\Omega$ and any $\varepsilon>0$ there is $\delta>0$ such that the following holds. If $\rho=(\rho(s,t))_{s,t\in\Omega}\in\mathcal{P}(\Omega^{2})$ satisfies

[TABLE]

then there exists $\rho^{\prime}\in\mathcal{P}(\Omega^{2})$ such that $\left\|{\rho-\rho^{\prime}}\right\|_{\mathrm{TV}}<\varepsilon$ and $\sum_{t\in\Omega}\rho^{\prime}(s,t)=\sum_{t\in\Omega}\rho^{\prime}(t,s)=1/q\mbox{ for all }s\in\Omega.$

4.2. The Nishimori identity

There exists an important distributional relationship between the teacher-student model $\boldsymbol{G}^{*}(n,m,P,\sigma)$ and the reweighted random graph model $\hat{\boldsymbol{G}}(n,m,P)$ from (3.6) (cf. [67] for a discussion from the physics viewpoint). To state this connection, we need to define an appropriately reweighted distribution on the set $\Omega^{V_{n}}$ of spin assignments. Specifically, we let $\hat{\boldsymbol{\sigma}}_{n,m,P}\in\Omega^{V_{n}}$ be a random assignment chosen from the distribution

[TABLE]

As before we skip the index $P$ where possible. We refer to the following statement as the Nishimori identity.

Lemma 4.4 ([23, Proposition 3.10]).

For every distribution $P$ on weight functions $\Omega^{k}\to(0,2)$ , for all integers $n,m$ , for every $\sigma\in\Omega^{V_{n}}$ and for every event $\mathcal{A}$ we have

[TABLE]

A useful consequence of this result is that $\mathbb{E}[\mathcal{X}(\boldsymbol{G}^{*}(n,\boldsymbol{m},\hat{\boldsymbol{\sigma}}_{n,m}),\hat{\boldsymbol{\sigma}}_{n,m})]=\mathbb{E}\left\langle{\mathcal{X}(\hat{\boldsymbol{G}},\boldsymbol{\sigma})}\right\rangle_{\hat{\boldsymbol{G}}}$ for every $L^{1}$ -function $\mathcal{X}$ .

4.3. Eigenvalues

The vector or matrix with all entries equal to one (in any dimension) is signified by $\boldsymbol{1}$ . The transpose of a matrix $A$ we denote by $A^{\ast}$ . Additionally, $\mathrm{id}$ denotes the identity matrix (in any dimension). Further, the standard basis vectors on $\mathbb{R}^{\Omega}$ are denoted by $e_{\omega}$ , $\omega\in\Omega$ . For the entries of a matrix $A\in\mathbb{R}^{\Omega\times\Omega}$ we use the notation $A(\sigma,\tau)$ ; thus, $A(\sigma,\tau)=\left\langle{{Ae_{\tau}},{e_{\sigma}}}\right\rangle$ for all $\sigma,\tau\in\Omega$ . The spectrum of a linear operator $X:E\to E^{\prime}$ is denoted by $\mathrm{Eig}(X)$ .

The following simple observation will be used several times. Recall $\Phi$ from (2.9).

Lemma 4.5.

Assume that $P$ satisfies SYM. Then the function

[TABLE]

satisfies $D\phi(\bar{\rho})=k\xi\boldsymbol{1}$ , $D^{2}\phi(\bar{\rho})=qk(k-1)\xi\Phi$ and $\phi$ is bounded away from $0$ .

Proof.

Since $\frac{\partial\phi}{\partial\rho(\omega)}=\sum_{j=1}^{k}\sum_{\tau\in\Omega^{k}}\boldsymbol{1}\{\tau_{j}=\omega\}\mathbb{E}[\boldsymbol{\psi}(\tau)]\prod_{i\neq j}\rho(\tau_{i})$ for every $\omega\in\Omega$ , SYM immediately yields $D\phi(\bar{\rho})=k\xi\boldsymbol{1}$ . Proceeding to the second derivatives, we find

[TABLE]

Consequently, SYM yields $D^{2}\phi(\bar{\rho})=qk(k-1)\xi\Phi$ . Finally, the fact that $\inf_{\rho\in\mathcal{P}(\Omega)}\phi(\rho)>0$ follows from (2.1). ∎
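As a numerical sanity check of the first identity, the sketch below evaluates $\phi(\rho)=\sum_{\tau\in\Omega^{k}}\mathbb{E}[\boldsymbol{\psi}(\tau)]\prod_{i}\rho(\tau_{i})$, the form implied by the partial derivatives computed above, for a single Potts antiferromagnet weight function with $k=2$, $q=3$ and inverse temperature $\beta=1$. These parameter values, the explicit weight function and the use of central finite differences are our illustrative assumptions; we also use that $\xi=\phi(\bar{\rho})$.

```python
import math
from itertools import product

Q, K, BETA = 3, 2, 1.0   # illustrative parameters, not taken from the paper

def psi(tau):
    """Assumed Potts antiferromagnet weight: exp(-BETA) on monochromatic pairs, else 1."""
    return math.exp(-BETA) if tau[0] == tau[1] else 1.0

def phi(rho):
    """phi(rho) = sum over tau in Omega^K of psi(tau) * prod_i rho(tau_i)."""
    return sum(psi(tau) * math.prod(rho[t] for t in tau)
               for tau in product(range(Q), repeat=K))

def gradient(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x); up[i] += h
        dn = list(x); dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

uniform = [1.0 / Q] * Q
xi = phi(uniform)               # xi = phi(rho_bar)
grad = gradient(phi, uniform)   # Lemma 4.5 predicts K * xi in every coordinate
```

In this example every entry of `grad` agrees with `K * xi` up to rounding, in line with $D\phi(\bar{\rho})=k\xi\boldsymbol{1}$.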

As an immediate application we prove Lemmas 3.5 and 3.6.

Proof of Lemma 3.5.

Condition SYM readily implies that $\Phi_{\psi}$ is stochastic for every $\psi\in\Psi$ . Hence, $\Phi_{\psi}\boldsymbol{1}=\boldsymbol{1}$ for all $\psi\in\Psi$ and consequently $\Phi\boldsymbol{1}=\boldsymbol{1}$ . To see that $\Phi$ is symmetric let $\theta$ be the permutation on $\{1,\ldots,k\}$ such that $\theta(1)=2$ , $\theta(2)=1$ and $\theta(i)=i$ for all $i>2$ . Since SYM implies that $\boldsymbol{\psi}$ and $\boldsymbol{\psi}^{\theta}$ are identically distributed, we obtain

[TABLE]

To verify the last assertion, consider the function $\phi$ from (4.4). Condition BAL ensures that $\phi$ is concave on the set $\mathcal{P}(\Omega)$ of probability measures on $\Omega$ . Since by Lemma 4.5 the Hessian satisfies $D^{2}\phi(\bar{\rho})=qk(k-1)\xi\Phi$ , we see that $\Phi$ induces a negative semidefinite endomorphism of the subspace $\{x\in\mathbb{R}^{q}:x\perp\boldsymbol{1}\}$ . Hence, $\max_{x\perp\boldsymbol{1}}\left\langle{{\Phi x},{x}}\right\rangle\leq 0$ . ∎
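The three assertions can be checked numerically in a simple case. The sketch below again takes a single Potts antiferromagnet weight function with $k=2$, $q=3$, $\beta=1$ (our illustrative choice) and obtains $\Phi$ from the Hessian identity of Lemma 4.5, which for $k=2$ reads $\Phi(s,t)=\psi(s,t)/(q\xi)$.

```python
import math
import random

Q, BETA = 3, 1.0                 # illustrative parameters
c = 1.0 - math.exp(-BETA)

# Assumed Potts weight psi(s, t) = 1 - c * 1{s == t}, and xi = q^{-2} * sum of psi.
psi = [[1.0 - c * (s == t) for t in range(Q)] for s in range(Q)]
xi = sum(map(sum, psi)) / Q ** 2

# For k = 2 the Hessian of phi is 2 * psi, so D^2 phi = q k (k-1) xi Phi gives:
Phi = [[psi[s][t] / (Q * xi) for t in range(Q)] for s in range(Q)]

def quad_form(A, x):
    """<A x, x> for a square matrix A."""
    n = len(x)
    return sum(x[s] * A[s][t] * x[t] for s in range(n) for t in range(n))

def random_centered(rng):
    """Random vector orthogonal to the all-ones vector."""
    x = [rng.gauss(0, 1) for _ in range(Q)]
    mean = sum(x) / Q
    return [v - mean for v in x]
```

Here `Phi` has unit row sums and is symmetric, and `quad_form(Phi, x)` is non-positive for every vector produced by `random_centered`, as the lemma asserts.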

Proof of Lemma 3.6.

To see that $\Xi$ is self-adjoint let $(e_{\omega})_{\omega\in\Omega}$ be the canonical basis of $\mathbb{R}^{\Omega}$ and let $\theta$ be the permutation on $\{1,\ldots,k\}$ such that $\theta(1)=2$ , $\theta(2)=1$ and $\theta(i)=i$ for all $i>2$ . Then for all $s,t,\sigma,\tau\in\Omega$ we have

[TABLE]

Since $(e_{s}\otimes e_{t})_{s,t\in\Omega}$ is a basis of $\mathbb{R}^{\Omega}\otimes\mathbb{R}^{\Omega}$ , (4.5) shows that $\Xi$ is self-adjoint.

Furthermore, since $\Phi_{\psi}\boldsymbol{1}=\boldsymbol{1}$ for all $\psi\in\Psi$ by Lemma 3.5, we see that $\Xi(x\otimes\boldsymbol{1})=\mathbb{E}[\Phi_{\boldsymbol{\psi}}x\otimes\Phi_{\boldsymbol{\psi}}\boldsymbol{1}]=(\Phi x)\otimes\boldsymbol{1}.$ Similarly, $\Xi(\boldsymbol{1}\otimes x)=\boldsymbol{1}\otimes(\Phi x)$ and thus (3.12) follows from Lemma 3.5. In particular, since $\Phi\boldsymbol{1}=\boldsymbol{1}$ by Lemma 3.5 we obtain $\Xi(\boldsymbol{1}\otimes\boldsymbol{1})=\boldsymbol{1}\otimes\boldsymbol{1}$ . Because $\Xi$ is self-adjoint, this implies that $\Xi{\mathcal{E}}^{\prime}\subset{\mathcal{E}}^{\prime}$ . Finally, assume that $z\in{\mathcal{E}}$ . Then for all $y\in\mathbb{R}^{q}$ we have $\left\langle{{\Xi z},{y\otimes\boldsymbol{1}}}\right\rangle=\left\langle{{z},{\Xi(y\otimes\boldsymbol{1})}}\right\rangle=\left\langle{{z},{(\Phi y)\otimes\boldsymbol{1}}}\right\rangle=0,$ and analogously $\left\langle{{\Xi z},{\boldsymbol{1}\otimes y}}\right\rangle=0$ . Hence, $\Xi{\mathcal{E}}\subset{\mathcal{E}}$ . ∎
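The identity $\Xi(x\otimes\boldsymbol{1})=(\Phi x)\otimes\boldsymbol{1}$ uses only that each $\Phi_{\psi}$ fixes $\boldsymbol{1}$. The sketch below confirms this with $\Xi=\mathbb{E}[\Phi_{\boldsymbol{\psi}}\otimes\Phi_{\boldsymbol{\psi}}]$ and $\Phi=\mathbb{E}[\Phi_{\boldsymbol{\psi}}]$ for a distribution on two hypothetical stochastic $3\times 3$ matrices; the matrices `PHI_A`, `PHI_B` and the probabilities are made up for illustration and do not come from any particular model.

```python
def kron_vec(x, y):
    """Coordinates of x (x) y in the basis e_s (x) e_t, ordered lexicographically."""
    return [a * b for a in x for b in y]

def kron_mat(A, B):
    """Kronecker product A (x) B, using the same ordering as kron_vec."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def average(mats, probs):
    """Entrywise expectation of a finitely supported random matrix."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(p * M[i][j] for M, p in zip(mats, probs)) for j in range(cols)]
            for i in range(rows)]

# Hypothetical stochastic matrices standing in for Phi_psi.
PHI_A = [[0.2, 0.4, 0.4], [0.4, 0.2, 0.4], [0.4, 0.4, 0.2]]
PHI_B = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
PROBS = [0.5, 0.5]

Phi = average([PHI_A, PHI_B], PROBS)
Xi = average([kron_mat(PHI_A, PHI_A), kron_mat(PHI_B, PHI_B)], PROBS)

ones = [1.0, 1.0, 1.0]
x = [1.0, -2.0, 0.5]
lhs = matvec(Xi, kron_vec(x, ones))        # Xi (x (x) 1)
rhs = kron_vec(matvec(Phi, x), ones)       # (Phi x) (x) 1
```

In this example `lhs` and `rhs` coincide up to rounding, and the same holds with the roles of the two tensor factors exchanged.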

4.4. Contiguity

Throughout the paper we apply contiguity between several probability spaces. Some of these contiguity results derive from the following first moment calculation, which also delivers the proof of (3.5).

Lemma 4.6.

Suppose that $P$ satisfies SYM and BAL. For any $D>0$ there exists $c>0$ such that for all $m\leq Dn/k$ ,

[TABLE]

Moreover, for any $\sigma\in\Omega^{V_{n}}$ we have, uniformly for all $m\leq Dn/k$ ,

[TABLE]

Proof.

By the linearity of expectation and because the constraint nodes of $\boldsymbol{G}(n,m)$ are chosen independently,

[TABLE]

Since SYM and BAL provide that $\phi(\rho_{\sigma})\leq\xi$ for every $\sigma$ , the upper bound $\mathbb{E}[Z(\boldsymbol{G}(n,m))]\leq q^{n}\xi^{m}$ is immediate. With respect to the lower bound, recall that the number of $\sigma:V_{n}\to\Omega$ such that $\left\|{\rho_{\sigma}-\bar{\rho}}\right\|_{\mathrm{TV}}\leq n^{-1/2}$ is of order $\Omega(q^{n})$ . Hence, applying Lemma 4.5, we see that for such $\sigma$ ,

[TABLE]

Thus, $\mathbb{E}[Z(\boldsymbol{G}(n,m))]\geq\Omega(q^{n})(\phi(\bar{\rho})+O(1/n))^{m}=\Omega(q^{n}\xi^{m})$ , uniformly for all $m\leq Dn/k$ . Finally, (4.6) follows because $\mathbb{E}|\ln Z(\boldsymbol{G}(n,m))|\leq m\mathbb{E}[\max_{\tau\in\Omega^{k}}|\ln\boldsymbol{\psi}(\tau)|]=O(n)$ due to (2.1) and the independence of the constraint nodes, and similarly $\mathbb{E}|\ln Z(\boldsymbol{G}^{*}(n,m,P,\sigma))|\leq 2m\mathbb{E}\left[{\max_{\tau\in\Omega^{k}}|\ln\boldsymbol{\psi}(\tau)|}\right]/\phi(\rho_{\sigma})=O(n)$ by Lemma 4.5 and (2.1). ∎
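For very small instances the first moment can be evaluated by brute force. The sketch below assumes that $\mathbb{E}[Z(\boldsymbol{G}(n,m))]=\sum_{\sigma}\phi(\rho_{\sigma})^{m}$, which is how we read the first display of the proof, and uses a single Potts antiferromagnet weight function with $k=2$, $q=3$, $\beta=1$ as an illustrative example; the parameter values and function names are ours.

```python
import math
from collections import Counter
from itertools import product

Q, K, BETA = 3, 2, 1.0   # illustrative parameters

def psi(tau):
    """Assumed Potts antiferromagnet weight function."""
    return math.exp(-BETA) if tau[0] == tau[1] else 1.0

def phi(rho):
    """phi(rho) = sum_tau psi(tau) * prod_i rho(tau_i)."""
    return sum(psi(tau) * math.prod(rho[t] for t in tau)
               for tau in product(range(Q), repeat=K))

def first_moment(n, m):
    """Sum of phi(rho_sigma)^m over all q^n assignments sigma."""
    total = 0.0
    for sigma in product(range(Q), repeat=n):
        counts = Counter(sigma)
        rho = [counts[w] / n for w in range(Q)]   # empirical distribution rho_sigma
        total += phi(rho) ** m
    return total

xi = phi([1.0 / Q] * Q)
```

For $n=6$ and $m=4$ the value `first_moment(6, 4)` lies between the contribution $90\,\xi^{4}$ of the $90$ perfectly balanced assignments and the upper bound $q^{n}\xi^{m}$.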

Corollary 4.7.

Assume that $P$ satisfies SYM and BAL and let $D>0$ . Then uniformly for all $m\leq Dn/k$ ,

[TABLE]

and the distribution of $\hat{\boldsymbol{\sigma}}_{n,m}$ and that of $\boldsymbol{\sigma}^{*}$ are mutually contiguous. Additionally, for any $\varepsilon>0$ there exists $c=c(\varepsilon,D)>0$ such that

[TABLE]

Proof.

The bound (4.8) and the mutual contiguity of $\hat{\boldsymbol{\sigma}}_{n,m}$ and the uniformly random $\boldsymbol{\sigma}^{*}$ follow from [23, Corollary 3.27]. With respect to (4.9), BAL, SYM and Lemma 4.6 ensure that there is $c^{\prime}=c^{\prime}(D)>0$ such that for every $c>0$ ,

[TABLE]

By Stirling's formula we can choose $c=c(\varepsilon)>0$ large enough so that the last expression is smaller than $\varepsilon$ . ∎

Corollary 4.8.

Assume that $P$ satisfies SYM and BAL, let $d>0$ and let $(\mathcal{S}_{n})_{n}$ be a sequence of events. Then the following two statements are true.

[TABLE]

Proof.

Fix $m\in\mathcal{M}(d)$ . By Lemma 4.4, BAL and Lemma 4.6,

[TABLE]

which implies (4.10). To prove (4.11) pick $L=L(\varepsilon)>0$ large enough so that $\mathbb{P}\left[{\|{\rho_{\boldsymbol{\sigma}^{*}}-\bar{\rho}}\|_{\mathrm{TV}}>Ln^{-1/2}}\right]<\varepsilon/2$ . Then Lemma 4.5 shows that there exists $\eta=\eta(L)>0$ such that $\mathbb{E}[\psi_{\boldsymbol{G}(n,m)}(\sigma)]=\phi(\rho_{\sigma})^{m}\geq\eta\xi^{m}$ for all $\sigma\in\Omega^{V_{n}}$ such that $\|{\rho_{\sigma}-\bar{\rho}}\|_{\mathrm{TV}}\leq Ln^{-1/2}$ . Hence, by Lemmas 4.4 and 4.6,

[TABLE]

Thus, setting $\delta=\varepsilon\eta/3$ , we obtain (4.11). ∎

Proof of Lemma 3.1.

By construction, the mutual contiguity of $\boldsymbol{G}^{*}(n,m,\boldsymbol{\sigma}^{*})$ and $\boldsymbol{G}^{*}(n,m,\hat{\boldsymbol{\sigma}}_{n,m})$ is immediate from the mutual contiguity of $\boldsymbol{\sigma}^{*}$ and $\hat{\boldsymbol{\sigma}}_{n,m}$ furnished by Corollary 4.7. Moreover, $\hat{\boldsymbol{G}}(n,m)$ and $\boldsymbol{G}^{*}(n,m,\hat{\boldsymbol{\sigma}}_{n,m})$ are identically distributed by the Nishimori identity. ∎

Finally, we derive Theorem 2.6, Corollary 2.7 and Theorem 2.5 from Theorem 2.4.

Proof of Theorem 2.6.

Suppose that $d<d_{\mathrm{cond}}$ and that $(\mathcal{S}_{n})_{n}$ is a sequence of events. We will prove the following two statements, from which the mutual contiguity of $\boldsymbol{G}$ and $\hat{\boldsymbol{G}}$ is immediate.

[TABLE]

Since $\hat{\boldsymbol{G}}$ and $\boldsymbol{G}^{*}$ are mutually contiguous by Lemma 3.1, mutual contiguity of $\boldsymbol{G}$ and $\boldsymbol{G}^{*}$ follows from (4.13) and (4.14). Moreover, the conditional mutual contiguity given $\mathfrak{S}$ follows by applying the unconditional result to $\mathcal{S}_{n}\cap\mathfrak{S}$ , because Lemma 3.1 and Proposition 3.11 show that the probability of $\mathfrak{S}$ is bounded away from $0$ in either model.

We proceed to prove (4.13). Because the random variable $\mathcal{K}$ from Theorem 2.4 satisfies $\mathbb{E}\left|{\mathcal{K}}\right|<\infty$ , there exists $\delta>0$ such that $\mathbb{E}[\mathbb{P}\left[{Z(\boldsymbol{G})<\delta\mathbb{E}[Z(\boldsymbol{G})|\boldsymbol{m}]|\boldsymbol{m}}\right]]<\varepsilon/2.$ Hence,

[TABLE]

Thus, setting $\alpha=\delta\varepsilon/2$ , we obtain (4.13).

Let us move on to the proof of (4.14). Proposition 3.9 shows that for every $d<d_{\mathrm{cond}}$ there is $c(d)>0$ such that uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Hence, by Markov’s inequality for any $\varepsilon>0$ there is $L>0$ such that $\mathbb{P}[\mathcal{Z}(\hat{\boldsymbol{G}}(n,m))>L\cdot\mathbb{E}[Z(\boldsymbol{G}(n,m))]]<\varepsilon/2.$ Moreover, $\mathbb{P}[\mathcal{Z}(\hat{\boldsymbol{G}}(n,m))=Z(\hat{\boldsymbol{G}}(n,m))]=1-o(1)$ by Proposition 3.3. As a consequence,

[TABLE]

Thus, choosing $\alpha<\varepsilon/(3L)$ , say, we obtain (4.14). ∎

Proof of Corollary 2.7.

The corollary is immediate from Theorem 2.6, Lemma 4.4 and Corollary 4.7. ∎

Proof of Theorem 2.5.

Theorem 2.6 and Proposition 3.3 imply that $\lim_{n\to\infty}\mathbb{E}\left\langle{\|\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}\|_{\mathrm{TV}}}\right\rangle_{\boldsymbol{G}}=0$ for all $d<d_{\mathrm{cond}}$ . To prove that this fails to hold for $d$ beyond but arbitrarily close to $d_{\mathrm{cond}}$ , we calculate the derivative $\frac{\partial}{\partial d}\mathbb{E}[\ln Z(\boldsymbol{G})]$ (for the random graph coloring problem a similar argument was used in [21]). It is well known that

[TABLE]

Expanding the logarithm using Fubini and (2.1), we find

[TABLE]

Further, with $\rho_{\boldsymbol{\sigma}_{1},\ldots,\boldsymbol{\sigma}_{l}}$ denoting the overlap of $l$ independent samples from $\mu_{\boldsymbol{G}}$ as in (4.1), we can cast (4.16) as

[TABLE]

Hence, if $\lim_{n\to\infty}\mathbb{E}\left\langle{\|\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}\|_{\mathrm{TV}}}\right\rangle_{\boldsymbol{G}}=0$ , then due to (2.1), dominated convergence and Lemma 4.1

[TABLE]

Now, suppose that $D>0$ is such that $\mathbb{E}\left\langle{\|\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}\|_{\mathrm{TV}}}\right\rangle_{\boldsymbol{G}}=o(1)$ for all $d<D$ . Then (2.1), dominated convergence and (4.17) yield

[TABLE]

Thus, Theorem 2.2 shows that $D\leq d_{\mathrm{cond}}$ . Consequently, for any $D>d_{\mathrm{cond}}$ there exists an average degree $d<D$ such that $\limsup_{n\to\infty}\mathbb{E}\left\langle{\|\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}\|_{\mathrm{TV}}}\right\rangle_{\boldsymbol{G}}>0,$ as claimed. The very same argument applies given $\mathfrak{S}$ . ∎

As a preparation for Section 11 we put the following on record.

Corollary 4.9.

Assume that $P$ satisfies SYM and BAL and that $d<d_{\mathrm{cond}}$ . Then for any sequence $(\mathcal{S}_{n})_{n}$ of events the following two statements hold.

[TABLE]

Proof.

To prove (4.18) pick a small enough $\eta=\eta(\varepsilon)>0$ and a smaller $\delta=\delta(\eta)>0$ . Then Corollary 4.8 shows that $\limsup_{n\to\infty}\mathbb{P}\left[{(\boldsymbol{G}^{*},\boldsymbol{\sigma}^{*})\in\mathcal{S}_{n}}\right]<\delta$ implies $\limsup_{n\to\infty}\mathbb{P}\left[{(\hat{\boldsymbol{G}},\boldsymbol{\sigma}_{\hat{\boldsymbol{G}}})\in\mathcal{S}_{n}}\right]<\eta$ . Hence,

[TABLE]

and thus (4.13) implies $\limsup_{n\to\infty}\mathbb{P}\left[{\left\langle{\boldsymbol{1}\{(\boldsymbol{G},\boldsymbol{\sigma})\in\mathcal{S}_{n}\}}\right\rangle_{\boldsymbol{G}}\geq\varepsilon}\right]<\varepsilon$ , which proves (4.18).

Similarly, to obtain (4.19) choose $\eta=\eta(\varepsilon)>0$ and $\delta=\delta(\eta)>0$ sufficiently small. If $\limsup\mathbb{P}\left[{(\boldsymbol{G},\boldsymbol{\sigma})\in\mathcal{S}_{n}}\right]<\delta$ , then (4.14) yields $\limsup_{n\to\infty}\mathbb{P}\left[{(\hat{\boldsymbol{G}},\boldsymbol{\sigma}_{\hat{\boldsymbol{G}}})\in\mathcal{S}_{n}}\right]<\eta.$ Hence, (4.10) implies $\limsup_{n\to\infty}\mathbb{P}\left[{(\boldsymbol{G}^{*},\boldsymbol{\sigma}^{*})\in\mathcal{S}_{n}}\right]<\varepsilon$ . ∎

5. The Kesten-Stigum bound

Throughout this section we assume that $P$ satisfies SYM and BAL.

5.1. Outline

In this section we prove Proposition 3.7. The key insight is that the dominant eigenvector of $\Xi$ restricted to the space ${\mathcal{E}}$ gives rise to a natural family of probability distributions $\pi_{\varepsilon}\in\mathcal{P}^{2}_{*}(\Omega)$ , $\varepsilon>0$ . Up to an error term that decays as $\varepsilon\to 0$ , the Bethe free energy $\mathcal{B}(d,P,\pi_{\varepsilon})$ of this distribution is given by a quadratic function of the corresponding eigenvalue. Ultimately, the desired bound on $\max\{\left|{\lambda}\right|:\lambda\in\mathrm{Eig}^{\ast}[\Xi]\}$ follows because the definition (2.4) of $d_{\mathrm{cond}}$ ensures that $\mathcal{B}(d,P,\pi_{\varepsilon})\leq\ln q+\frac{d}{k}\ln\xi$ for all $d<d_{\mathrm{cond}}$ , $\varepsilon>0$ . To implement this programme we need to show that the dominant eigenvector of $\Xi$ has a particular form. More precisely, in Section 5.2 we prove

Lemma 5.1.

Let $\hat{\lambda}=\max\mathrm{Eig}^{\ast}[\Xi]$ . Then $\hat{\lambda}\geq-\min\mathrm{Eig}^{\ast}[\Xi]$ and there exists an orthonormal basis $u_{1},\ldots,u_{q-1}\in\mathbb{R}^{\Omega}$ of the space $\{x\in\mathbb{R}^{\Omega}:x\perp\boldsymbol{1}\}$ and $\bar{\lambda}_{1},\ldots,\bar{\lambda}_{q-1}\geq 0$ such that

[TABLE]

is a unit vector and $\Xi\Sigma=\hat{\lambda}\Sigma$ .

Throughout this section we denote the eigenvector promised by Lemma 5.1 by $\Sigma$ and the corresponding eigenvalue by $\hat{\lambda}$ . The particular structure of $\Sigma$ ensures that

[TABLE]

Further, because the coefficients $\bar{\lambda}_{i}$ in (5.1) are non-negative and $u_{1},\ldots,u_{q-1}\perp\boldsymbol{1}$ , we obtain

[TABLE]

Recalling that $(e_{\omega})_{\omega\in\Omega}$ is the canonical basis of $\mathbb{R}^{\Omega}$ , for each $\omega\in\Omega$ we define $\pi_{\varepsilon,\omega}\in\mathbb{R}^{\Omega}$ by letting

[TABLE]

Finally, let $\pi_{\varepsilon}=\frac{1}{q}\sum_{\omega\in\Omega}\delta_{\pi_{\varepsilon,\omega}}$ (with $\delta_{z}$ the Dirac measure on $z\in\mathbb{R}^{\Omega}$ ).

Lemma 5.2.

There exists $\varepsilon_{0}>0$ such that for all $0<\varepsilon<\varepsilon_{0}$ we have $\pi_{\varepsilon,\omega}\in\mathcal{P}(\Omega)$ for all $\omega\in\Omega$ and $\pi_{\varepsilon}\in\mathcal{P}^{2}_{*}(\Omega)$ .

Proof.

Clearly, $\pi_{\varepsilon,\omega}(\sigma)\geq 0$ for all $\sigma,\omega\in\Omega$ for small enough $\varepsilon>0$ . Moreover, since $\eta\in{\mathcal{E}}$ by (5.3),

[TABLE]

Hence, $\pi_{\varepsilon,\omega}\in\mathcal{P}(\Omega)$ and $\pi_{\varepsilon}\in\mathcal{P}^{2}(\Omega)$ . Similarly, once more because $\eta\in{\mathcal{E}}$ , for each $\sigma\in\Omega$ we have

[TABLE]

whence $\pi_{\varepsilon}\in\mathcal{P}^{2}_{*}(\Omega)$ . ∎

Our next goal is to calculate $\mathcal{B}(d,P,\pi_{\varepsilon})$ . More precisely, we aim to expand $\mathcal{B}(d,P,\pi_{\varepsilon})$ to the fourth order in the limit $\varepsilon\to 0$ . The key tool for this expansion is the following elementary lemma, whose proof can be found in Section 5.3.

Lemma 5.3.

Suppose $\ell\geq 1$ and that $F:\mathcal{P}(\Omega)^{\ell}\to(0,\infty)$ , $(\rho_{1},\ldots,\rho_{\ell})\mapsto F(\rho_{1},\ldots,\rho_{\ell})$ has four continuous derivatives. Moreover, setting $\bar{a}=(\bar{\rho},\dots,\bar{\rho})\in\mathcal{P}(\Omega)^{\ell}$ , assume that $F$ satisfies the following conditions.

**T1: **

for all $a=(a_{1},\ldots,a_{\ell})\in\mathcal{P}(\Omega)^{\ell}$ , all $r\in[\ell]$ and all $c_{1},c_{2}\in\Omega$ we have

[TABLE]

**T2: **

there is $C_{0}\in\mathbb{R}$ such that the gradient of $F$ at $\bar{a}$ satisfies $DF(\bar{a})=C_{0}\boldsymbol{1}$ .

Further, suppose that $\pi\in\mathcal{P}^{2}_{*}(\Omega)$ , let ${\boldsymbol{\rho}},{\boldsymbol{\rho}}_{1},{\boldsymbol{\rho}}_{2},\ldots$ be mutually independent samples from $\pi$ and define

[TABLE]

Then

[TABLE]

Equipped with Lemma 5.3 we will derive the following asymptotic formula in Section 5.4.

Lemma 5.4.

We have $\mathcal{B}(d,P,\pi_{\varepsilon})=\mathcal{B}(d,P,\pi_{0})+\frac{d(k-1)}{12}\left((k-1)d\hat{\lambda}^{2}-\hat{\lambda}\right)\varepsilon^{4}+O(\varepsilon^{5})$ as $\varepsilon\to 0$ .

Finally, Proposition 3.7 is immediate from Lemma 5.4.

Proof of Proposition 3.7.

Due to SYM it is straightforward to verify that $\mathcal{B}(d,P,\pi_{0})=\ln q+\frac{d}{k}\ln\xi.$ Hence, if $0<d<d_{\mathrm{cond}}$ , then $\mathcal{B}(d,P,\pi_{\varepsilon})\leq\mathcal{B}(d,P,\pi_{0})$ for all small enough $\varepsilon>0$ because $\pi_{\varepsilon}\in\mathcal{P}^{2}_{*}(\Omega)$ by Lemma 5.2. Therefore, Lemma 5.4 implies that $(k-1)d\hat{\lambda}^{2}-\hat{\lambda}\leq 0$ . Since $\hat{\lambda}\geq 0$ by Lemma 5.1, this means that $(k-1)d\hat{\lambda}\leq 1$ . As this bound holds for all $d<d_{\mathrm{cond}}$ , letting $d\uparrow d_{\mathrm{cond}}$ we conclude that $(k-1)d_{\mathrm{cond}}\hat{\lambda}\leq 1$ , and thus the assertion follows from Lemma 5.1. ∎
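The role of the quartic coefficient from Lemma 5.4 can be made explicit with a few lines of code. In the sketch below the function names are ours and the numerical inputs in the usage note are arbitrary; the point is only that, for $\hat{\lambda}>0$, the coefficient is positive exactly when $(k-1)d\hat{\lambda}>1$.

```python
def quartic_coefficient(d, k, lam):
    """Coefficient of eps^4 in Lemma 5.4: d(k-1)/12 * ((k-1) d lam^2 - lam)."""
    return d * (k - 1) / 12.0 * ((k - 1) * d * lam ** 2 - lam)

def exceeds_threshold(d, k, lam):
    """True if (k-1) * d * lam > 1."""
    return (k - 1) * d * lam > 1
```

For example, with $k=3$ and $\hat{\lambda}=0.3$ the coefficient is negative at $d=1$ and positive at $d=2$, so at $d=2$ a small perturbation $\pi_{\varepsilon}$ would raise the Bethe free energy above $\mathcal{B}(d,P,\pi_{0})$.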

Remark 5.5.

A local expansion of the Bethe functional around the atom $\pi=\delta_{\bar{\rho}}$ on the uniform distribution was performed independently by Guilhem Semerjian (manuscript in preparation), albeit with a different objective and without the realization that the eigenvectors of $\Xi$ can be used to construct an explicit family of perturbations, cf. (5.4).

5.2. Proof of Lemma 5.1

The canonical basis $(e_{\omega})_{\omega\in\Omega}$ gives rise to the basis $(e_{\sigma}\otimes e_{\tau})_{\sigma,\tau\in\Omega}$ of the $q^{2}$ -dimensional space $\mathbb{R}^{\Omega}\otimes\mathbb{R}^{\Omega}$ . Hence, we can identify $\mathbb{R}^{\Omega}\otimes\mathbb{R}^{\Omega}$ with the space $\mathbb{R}^{\Omega\times\Omega}$ of $q\times q$ -matrices via the linear map

[TABLE]

Since $\ker\iota=\{0\}$ , $\iota$ is an isomorphism. Moreover, if we equip the space $\mathbb{R}^{\Omega\times\Omega}$ with the Frobenius inner product $\left\langle{{\,\cdot\,},{\,\cdot\,}}\right\rangle$ , then $\left\langle{{x},{y}}\right\rangle=\left\langle{{\iota(x)},{\iota(y)}}\right\rangle$ for all $x,y\in\mathbb{R}^{\Omega}\otimes\mathbb{R}^{\Omega}$ .

By Lemma 3.6 the linear operator $\Xi$ is self-adjoint and $\Xi{\mathcal{E}}\subset{\mathcal{E}}$ . Therefore, ${\mathcal{E}}$ admits an orthogonal decomposition into eigenspaces of $\Xi$ . Suppose that $\lambda=\max\{|L|:L\in\mathrm{Eig}^{\ast}[\Xi]\}$ and let ${\mathcal{E}}_{\lambda}\subset{\mathcal{E}}$ be the corresponding eigenspace. Moreover, consider the linear map defined by $\vartheta:{\mathcal{E}}\to{\mathcal{E}}$ , $e_{\sigma}\otimes e_{\tau}\mapsto e_{\tau}\otimes e_{\sigma}$ for $\sigma,\tau\in\Omega$ . Due to the particular form (2.6) of $\Xi$ we have $\Xi\vartheta y=\vartheta\Xi y$ for all $y\in{\mathcal{E}}$ . Consequently, $\vartheta{\mathcal{E}}_{\lambda}\subset{\mathcal{E}}_{\lambda}$ . Therefore, for any $z\in{\mathcal{E}}_{\lambda}$ we have $\frac{1}{2}(z+\vartheta(z))\in{\mathcal{E}}_{\lambda}$ . Because $\vartheta^{2}=\mathrm{id}$ , this means that there exists a unit vector $z\in{\mathcal{E}}_{\lambda}$ such that $\vartheta z=z$ . Further, $\iota(z)$ is a symmetric matrix as $\vartheta z=z$ and $\iota(z)$ satisfies $\iota(z)\boldsymbol{1}=0$ and $\iota(z)x\perp\boldsymbol{1}$ for all $x\in\mathbb{R}^{\Omega}$ because $z\in{\mathcal{E}}$ . Thus, there exists an orthonormal basis $u_{1},\ldots,u_{q-1}$ of the space $\{x\in\mathbb{R}^{\Omega}:x\perp\boldsymbol{1}\}$ and $w_{1},\ldots,w_{q-1}\in\mathbb{R}$ such that

[TABLE]

Since $\iota$ is an isomorphism, (5.6) yields the representation

[TABLE]

Further, if we define $\Sigma=\sum_{i=1}^{q-1}|w_{i}|u_{i}\otimes u_{i}$ , then $\Sigma\in{\mathcal{E}}$ because $u_{i}\perp\boldsymbol{1}$ for all $i$ . Moreover, because $z$ is a unit vector and $u_{1},\ldots,u_{q-1}$ are orthonormal,

[TABLE]

Finally, once more due to the particular form (2.6) of $\Xi$ , (5.6) yields

[TABLE]

Combining (5.8) and (5.9), we thus see that $\Sigma$ is a unit vector with $\left\langle{{\Xi\Sigma},{\Sigma}}\right\rangle=\lambda=\max\{|\left\langle{{\Xi y},{y}}\right\rangle|:y\in{\mathcal{E}},\|y\|=1\}$ , as desired.
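The passage from $z$ to $\Sigma$ can be illustrated in coordinates. The sketch below takes $q=3$, a fixed orthonormal basis $u_{1},u_{2}$ of $\{x:x\perp\boldsymbol{1}\}$ and hypothetical coefficients $w=(0.6,-0.8)$, all chosen by us for illustration, and represents a tensor by its coefficient matrix in the basis $(e_{\sigma}\otimes e_{\tau})_{\sigma,\tau}$, so that the inner product becomes the Frobenius inner product.

```python
import math

Q = 3
# Orthonormal basis of the orthogonal complement of the all-ones vector.
u = [
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
]
w = [0.6, -0.8]   # hypothetical coefficients with w_1^2 + w_2^2 = 1

def combine(weights):
    """Coefficient matrix of sum_i weights[i] * u_i (x) u_i."""
    return [[sum(wi * ui[s] * ui[t] for wi, ui in zip(weights, u))
             for t in range(Q)] for s in range(Q)]

def frobenius(A, B):
    """Frobenius inner product of two Q x Q matrices."""
    return sum(A[s][t] * B[s][t] for s in range(Q) for t in range(Q))

z = combine(w)                             # symmetric, zero row sums, unit norm
Sigma = combine([abs(wi) for wi in w])     # same basis, coefficients |w_i|
```

Both `z` and `Sigma` have unit Frobenius norm, both are symmetric, and both annihilate $\boldsymbol{1}$; replacing $w_{i}$ by $|w_{i}|$ changes neither the norm nor these structural properties.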

5.3. Proof of Lemma 5.3

We recall the following well-known generalization of the chain rule.

Fact 5.6 (Faà di Bruno’s formula).

Suppose that $F:(\mathbb{R}^{\Omega})^{j}\to(0,\infty)$ has $j\geq 1$ continuous derivatives. Let $\Pi(j)$ be the set of all partitions of $[j]$ , denote by $|\Upsilon|$ the cardinality of a partition $\Upsilon\in\Pi(j)$ and similarly let $|B|$ denote the cardinality of a set $B\in\Upsilon$ in the partition $\Upsilon$ . Then

[TABLE]

For $r\in[\ell]^{j}$ and $c\in\Omega^{j}$ let

[TABLE]

Because ${\boldsymbol{\rho}}_{1},\ldots,{\boldsymbol{\rho}}_{\ell}$ are mutually independent with mean $\bar{\rho}$ , we have $\mathcal{J}_{r,c}=0$ unless for each $i\in[j]$ there is $h\in[j]\setminus\{i\}$ such that $r_{i}=r_{h}$ . Hence, setting $R_{j}=\{r\in[\ell]^{j}\;:\;\forall i\in[j]\exists h\in[j]\setminus\{i\}:r_{i}=r_{h}\}$ , we see that

[TABLE]
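For small parameters the index sets $R_{j}$ can simply be enumerated. The following sketch, in which the function name `R` mirrors the notation but is otherwise our own choice, lists all $r\in[\ell]^{j}$ such that every coordinate shares its value with at least one other coordinate.

```python
from itertools import product

def R(j, ell):
    """All r in [ell]^j such that each r_i equals r_h for some h != i."""
    return [r for r in product(range(1, ell + 1), repeat=j)
            if all(any(r[i] == r[h] for h in range(j) if h != i)
                   for i in range(j))]
```

For $\ell=3$ the set $R_{1}$ is empty, $R_{2}$ and $R_{3}$ consist of the constant tuples only, and every tuple in $R_{4}$ takes at most two distinct values.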

In particular, (5.11) implies

[TABLE]

Proceeding to $j=2$ , we apply Fact 5.6 to obtain

[TABLE]

Since $R_{2}=\{(r,r):r\in[\ell]\}$ , T1 and (5.13) entail that

[TABLE]

Moving on to $\mathcal{J}_{3}$ , we observe that $R_{3}=\{(r,r,r):r\in[\ell]\}$ . Moreover, Fact 5.6 yields

[TABLE]

Hence, T2 yields

[TABLE]

Finally, we come to $\mathcal{J}_{4}$ . Fact 5.6 yields

[TABLE]

Since $R_{4}\subset\{(r_{1},r_{2},r_{3},r_{4})\in[\ell]^{4}:|\{r_{1},r_{2},r_{3},r_{4}\}|\leq 2\}$ , T1 implies that

[TABLE]

Moreover, similarly as before T2 implies

[TABLE]

Analogously, once more by T2

[TABLE]

Thus, combining (5.16)–(5.19), we obtain

[TABLE]

Since $\mathbb{E}[J({\boldsymbol{\rho}}_{1},\ldots,{\boldsymbol{\rho}}_{\ell})]=\sum_{j=1}^{4}\frac{1}{j!}\mathcal{J}_{j}$ and $\Lambda^{\prime\prime}(x)=1/x$ , the assertion follows from (5.12), (5.14), (5.15) and (5.20).

5.4. Proof of Lemma 5.4

Recall that $\hat{\lambda}=\max_{\lambda\in\mathrm{Eig}^{\ast}[\Xi]}|\lambda|$ , that $\Sigma\in{\mathcal{E}}$ is an eigenvector of $\Xi$ with eigenvalue $\hat{\lambda}$ , and that $\eta$ is the vector defined by (5.3). We tacitly assume that $\varepsilon$ is small enough so that $\pi_{\varepsilon}\in\mathcal{P}^{2}_{*}(\Omega)$ (cf. Lemma 5.2) and we denote by ${\boldsymbol{\rho}},{\boldsymbol{\rho}}_{1},{\boldsymbol{\rho}}_{2},\ldots$ independent samples from $\pi_{\varepsilon}$ . Hence, for any function $X:(\mathbb{R}^{\Omega})^{\ell}\to\mathbb{R}$ the expectation $\mathbb{E}[X({\boldsymbol{\rho}}_{1},\ldots,{\boldsymbol{\rho}}_{\ell})]$ can be viewed as a function of $\varepsilon$ . Further, since $\pi_{\varepsilon}$ is the uniform distribution on the distributions $\pi_{\varepsilon,\omega}$ from (5.4), which are atoms, the function $\varepsilon\mapsto\mathbb{E}[X({\boldsymbol{\rho}}_{1},\ldots,{\boldsymbol{\rho}}_{\ell})]$ has the same continuity as $X$ .

Ultimately we are going to expand the function $\varepsilon\mapsto\mathcal{B}(d,P,\pi_{\varepsilon})$ to the fourth order. But first we need a few preparations. First we observe that $\Sigma$ encodes the covariance matrix of the random vector $({\boldsymbol{\rho}}(\omega))_{\omega\in\Omega}$ .

Claim 5.7.

We have $\mathbb{E}[\boldsymbol{\rho}(c_{1})-q^{-1}]=0$ and $\mathbb{E}[(\boldsymbol{\rho}(c_{1})-q^{-1})(\boldsymbol{\rho}(c_{2})-q^{-1})]=q^{-1}\varepsilon^{2}\left\langle{{\Sigma},{e_{c_{1}}\otimes e_{c_{2}}}}\right\rangle$ for all $c_{1},c_{2}\in\Omega$ .

Proof.

The first assertion follows from Lemma 5.2, which shows that $\pi_{\varepsilon}\in\mathcal{P}^{2}_{*}(\Omega)$ . Moreover, because the vectors $u_{1},\ldots,u_{q-1}\in{\mathcal{E}}$ from (5.3) are orthonormal, (5.1) and (5.4) yield

[TABLE]

as claimed. ∎

Additionally, we need the following algebraic relation.

Claim 5.8.

For any $\psi\in\Psi$ we have $\langle\left(\Phi_{\psi}\otimes\Phi_{\psi}\right)\Sigma,\Sigma\rangle=\sum_{c\in\Omega^{4}}\Phi_{\psi}(c_{1},c_{3})\Phi_{\psi}(c_{2},c_{4})\left\langle{{\Sigma},{e_{c_{1}}\otimes e_{c_{2}}}}\right\rangle\left\langle{{\Sigma},{e_{c_{3}}\otimes e_{c_{4}}}}\right\rangle$ .

Proof.

Since $\Sigma=\sum_{i=1}^{q-1}\bar{\lambda}_{i}u_{i}\otimes u_{i}$ we have

[TABLE]

as claimed. ∎

We proceed to expand $\varepsilon\mapsto\mathcal{B}(d,P,\pi_{\varepsilon})$ . For $\psi,\psi_{1},\ldots,\psi_{\gamma}\in\Psi$ let

[TABLE]

Then with $\boldsymbol{\psi},\boldsymbol{\psi}_{1},\boldsymbol{\psi}_{2},\ldots$ chosen independently from $P$ ,

[TABLE]

and we shall derive the approximations to both summands separately, using Lemma 5.3 in either case.

Claim 5.9.

We have

[TABLE]

Proof.

Fixing $\gamma$ and $\psi_{1},\ldots,\psi_{\gamma}$ for the moment, we consider the function

[TABLE]

Then with $J_{\psi_{1},\ldots,\psi_{\gamma}}$ denoting the fourth Taylor polynomial of $\Lambda\circ F_{\psi_{1},\ldots,\psi_{\gamma}}$ as in equation (5.5), we can write $\Lambda\circ F_{\psi_{1},\ldots,\psi_{\gamma}}=J_{\psi_{1},\ldots,\psi_{\gamma}}+R_{\psi_{1},\ldots,\psi_{\gamma}}$ . We are going to show that, with $\boldsymbol{\psi}_{1},\ldots,\boldsymbol{\psi}_{\gamma}$ chosen from $P$ and $({\boldsymbol{\rho}}_{i,j})_{i,j}$ chosen from $\pi_{\varepsilon}$ , all mutually independent,

[TABLE]

whence (5.22) is immediate because the Poisson distribution has sub-exponential tails.

To prove (5.23) we apply Lemma 5.3. Thus, we need the first and second partial derivatives of $F_{\psi_{1},\ldots,\psi_{\gamma}}$ . To work out the first partial derivatives, let $s\in[\gamma]$ , $r\in[k-1]$ and $c_{1}\in\Omega$ . Then

[TABLE]

In particular, SYM yields $\frac{\partial F_{\psi_{1},\ldots,\psi_{\gamma}}}{\partial\rho_{s,r}(c_{1})}(\bar{\rho},\ldots,\bar{\rho})=q\xi^{\gamma}$, and thus the assumptions T1–T2 of Lemma 5.3 are satisfied. With respect to the second derivatives, there are two cases. First, fix $s\in[\gamma]$, distinct $r_{1},r_{2}\in[k-1]$ and $c_{1},c_{3}\in\Omega$. Let $\theta_{1}:[k]\to[k]$ be the permutation such that $\theta_{1}(r_{1})=1,\;\theta_{1}(r_{2})=2$ and $\theta_{1}(i)=i$ for all $i\neq r_{1},r_{2}$. Using SYM, we obtain

[TABLE]

Second, fix distinct $s,s^{\prime}\in[\gamma]$ and any $r_{1},r_{2}\in[k-1]$ , $c_{1},c_{3}\in\Omega$ . Let $\theta_{2},\theta_{3}$ be the permutations such that $\theta_{2}(k)=2,\;\theta_{2}(r_{1})=1$ and $\theta_{2}(i)=i$ for all $i\neq r_{1},k$ and $\theta_{3}(k)=1,\theta_{3}(r_{2})=2$ and $\theta_{3}(i)=i$ for all $i\neq r_{2},k$ . Then SYM yields

[TABLE]

Hence, Lemma 5.3 gives

[TABLE]

Further, Claim 5.8 yields

[TABLE]

Therefore, since SYM provides that the distribution $P$ is invariant under permutations,

[TABLE]

which completes the proof of (5.23).

Moving on to (5.24), we write the remainder $R_{\psi_{1},\ldots,\psi_{\gamma}}$ for $\rho_{i,j}$ in the support of $\pi_{\varepsilon}$ as

[TABLE]

where $\tilde{\rho}=(\tilde{\rho}_{i,j})_{i,j}$ is a point on the line segment between the points $(\bar{\rho},\ldots,\bar{\rho})$ and $(\rho_{i,j})_{i,j}$ . In particular, $\prod_{i=1}^{5}(\tilde{\rho}_{h_{i}}(c_{i})-q^{-1})=O(\varepsilon^{5}).$ Hence, Fact 5.6 shows that

[TABLE]

where $\tilde{\rho}$ ranges over the convex hull of the support of $\pi_{\varepsilon}$ . Because all weight functions take values in the interval $(0,2)$ , we find $\prod_{B\in\Upsilon}(\partial^{|B|}F_{\psi_{1},\ldots,\psi_{\gamma}}(\tilde{\rho})/{\prod_{i\in B}\partial x_{i}})=\exp(O(\gamma))$ . In addition,

[TABLE]

Thus, (2.1) shows that $R_{\psi_{1},\ldots,\psi_{\gamma}}(\rho_{i,j})=O(\varepsilon^{5})\exp(O(\gamma))$ , which is (5.24). ∎

Claim 5.10.

We have $\mathbb{E}\left[{B_{2}(\boldsymbol{\psi})}\right]=\Lambda(\xi)+\frac{\varepsilon^{4}\xi k(k-1)}{12}\langle\Xi\Sigma,\Sigma\rangle+O(\varepsilon^{5}).$

Proof.

To investigate $B_{2}(\psi)$ we apply Lemma 5.3 to $F_{\psi}(\rho_{1},\dots,\rho_{k})=\sum_{\tau\in\Omega^{k}}\psi(\tau)\prod_{i=1}^{k}\rho_{i}(\tau_{i})$ . The derivatives are

[TABLE]

where $\theta:[k]\to[k]$ is such that $\theta(r_{1})=1,\;\theta(r_{2})=2$ and $\theta(r)=r$ for all $r\neq r_{1},r_{2}$ . Thus, SYM yields

[TABLE]

Once more we write $\Lambda\circ F_{\psi}=J_{\psi}+R_{\psi}$ , where $J_{\psi}$ is the fourth Taylor polynomial as in (5.5). Applying Lemma 5.3, we obtain

[TABLE]

Further, Claim 5.7 yields $(\mathbb{E}[{\boldsymbol{\rho}}(c_{1}){\boldsymbol{\rho}}(c_{2})]-q^{-2})(\mathbb{E}[{\boldsymbol{\rho}}(c_{3}){\boldsymbol{\rho}}(c_{4})]-q^{-2})=\varepsilon^{4}q^{-2}\left\langle{{\Sigma},{e_{c_{1}}\otimes e_{c_{2}}}}\right\rangle\left\langle{{\Sigma},{e_{c_{3}}\otimes e_{c_{4}}}}\right\rangle$ , whence by Claim 5.8,

[TABLE]

Furthermore, by Fact 5.6, for any $\rho_{1},\ldots,\rho_{k}$ in the support of $\pi_{\varepsilon}$ there exists $\tilde{\rho}$ on the line segment between $(\bar{\rho},\ldots,\bar{\rho})$ and $(\rho_{1},\ldots,\rho_{k})$ such that

[TABLE]

Hence, (2.1) guarantees that $\mathbb{E}[R_{\boldsymbol{\psi}}(\rho_{1},\ldots,\rho_{k})]=O(\varepsilon^{5})$ and thus the assertion follows from (5.26). ∎

Proof of Proposition 5.4.

Combining (5.21) with Claims 5.9 and 5.10, we obtain

[TABLE]

Since $\left\langle{{\Xi\Sigma},{\Sigma}}\right\rangle=\hat{\lambda}$ , $\left\langle{{\Xi^{2}\Sigma},{\Sigma}}\right\rangle=\hat{\lambda}^{2}$ and $\mathcal{B}(d,P,\bar{\pi})=\ln q+\frac{d}{k}\ln\xi$ , the assertion follows from (5.27). ∎

6. Overlap concentration in the teacher-student model

Throughout this section we assume that $P$ satisfies conditions BAL, SYM, MIN and POS.

6.1. Outline

In this section we prove Proposition 3.3. We will exhibit a connection between the overlap and the derivative $\frac{\partial}{\partial d}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]$ of the free energy: if $\mathbb{E}\langle\|{\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}-\bar{\rho}}\|_{\mathrm{TV}}\rangle_{\hat{\boldsymbol{G}}}$ is bounded away from $0$ for some $d<d_{\mathrm{cond}}$, then the derivative of the free energy is so large that the formula $n^{-1}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]=\ln q+\frac{d}{k}\ln\xi+o(1)$ cannot possibly hold, in contradiction to Theorem 3.2. We begin with the following “continuity statement”, which is a generalization of [23, Lemma 4.6] for the Potts model: if the overlap deviates from $\bar{\rho}$ for some average degree $d$, then the same holds for at least a small interval of average degrees.

Lemma 6.1.

For any $\varepsilon>0$ , $d>0$ there is $0<\delta=\delta(\varepsilon,d,P)<\varepsilon$ such that the following holds. Assume that $m\in\mathcal{M}(d)$ is a sequence such that

[TABLE]

Then

[TABLE]

The proof of Lemma 6.1 can be found in Section 6.2. Further, in Section 6.3 we derive the following asymptotic formula for the derivative of the free energy.

Lemma 6.2.

Uniformly for all $d\leq d_{\mathrm{cond}}+1$ we have

[TABLE]

with $\boldsymbol{\psi}$ chosen from $P$ independently of $\hat{\boldsymbol{G}}$ and $\boldsymbol{i}_{1},\ldots,\boldsymbol{i}_{k}\in[n]$ chosen uniformly and independently.

Corollary 6.3.

Uniformly for all $d<d_{\mathrm{cond}}+1$ we have

[TABLE]

Moreover, for any $\varepsilon>0$ there is $\delta=\delta(\varepsilon,P)>0$ , independent of $n$ or $d$ , such that uniformly for all $d<d_{\mathrm{cond}}+1$ ,

[TABLE]

For the special case of the Potts model a result like Corollary 6.3 was known [23, Lemma 4.10]. The proof was relatively straightforward because in the special case it is possible to write a fairly explicit formula for the expression $\Lambda\left({\left\langle{\boldsymbol{\psi}(\boldsymbol{\sigma}(x_{\boldsymbol{i}_{1}}),\ldots,\boldsymbol{\sigma}(x_{\boldsymbol{i}_{k}}))}\right\rangle_{\hat{\boldsymbol{G}}}}\right)$ . Remarkably, the following proof shows that we can do without an explicit formula thanks to a mildly tricky application of Jensen’s inequality in combination with condition MIN.

Proof of Corollary 6.3.

Since $\Lambda$ is convex, Jensen’s inequality gives

[TABLE]

Hence, using the Nishimori identity (4.3) and Corollary 4.7, we obtain

[TABLE]

Combining (6.2), (6.5) and (6.6) with Lemma 6.2 gives (6.3).

To prove the second assertion we expand $\Lambda(x)$ to the second order around $\xi$ to obtain

[TABLE]

Since $\Lambda^{\prime\prime}(x)\geq 1/2$ for all $x\in(0,2)$ , (6.7) and (6.6) yield

[TABLE]

Further, with $\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}$ denoting two independent samples from the Gibbs measure of $\hat{\boldsymbol{G}}$ we obtain

[TABLE]

Since $\boldsymbol{i}_{1},\ldots,\boldsymbol{i}_{k}$ are chosen uniformly and independently of each other and of $\hat{\boldsymbol{G}}$ and $\boldsymbol{\psi}$ , we can cast (6.9) in terms of the overlap $\rho_{\boldsymbol{\sigma}_{1},\boldsymbol{\sigma}_{2}}$ as

[TABLE]

Further, Corollary 4.7 and the Nishimori identity (4.3) yield $\mathbb{E}\left\langle{\left\|{\rho_{\boldsymbol{\sigma}_{1}}-\bar{\rho}}\right\|_{\mathrm{TV}}+\left\|{\rho_{\boldsymbol{\sigma}_{2}}-\bar{\rho}}\right\|_{\mathrm{TV}}}\right\rangle_{\hat{\boldsymbol{G}}}=o(1)$ , whence

[TABLE]

Moreover, the function $\rho\in\mathcal{P}(\Omega^{2})\mapsto\sum_{\sigma,\tau\in\Omega^{k}}\mathbb{E}[\boldsymbol{\psi}(\sigma)\boldsymbol{\psi}(\tau)]\prod_{i\in[k]}\rho(\sigma_{i},\tau_{i})$ is uniformly continuous. Therefore, if $\mathbb{E}\left\langle{\left\|{\rho_{\boldsymbol{\sigma},\boldsymbol{\tau}}-\bar{\rho}}\right\|_{\mathrm{TV}}}\right\rangle>\varepsilon$ , then Fact 4.3, (6.11) and conditions MIN and SYM yield $\delta=\delta(\varepsilon)>0$ such that

[TABLE]

Finally, (6.2), (6.8), (6.9), (6.10) and (6.12) yield (6.4). ∎
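The two analytic ingredients of the preceding proof, Jensen's inequality for the convex function $\Lambda$ and the quadratic lower bound coming from $\Lambda^{\prime\prime}\geq 1/2$ on $(0,2)$, can be illustrated numerically. The sketch below assumes $\Lambda(x)=x\ln x$, which is consistent with the stated bound since then $\Lambda^{\prime\prime}(x)=1/x$; the sample points are arbitrary stand-ins for the Gibbs averages appearing in the proof, and all function names are hypothetical.

```python
import math
import random


def lam(x):
    """Assumed form Lambda(x) = x ln x on (0, 2)."""
    return x * math.log(x)


def lam_prime(x):
    """Derivative of x ln x."""
    return math.log(x) + 1.0


def quadratic_gap(x, xi):
    """Lambda(x) minus its tangent at xi minus (x - xi)^2 / 4.

    By Taylor's theorem with Lambda'' >= 1/2 on (0, 2) this should be nonnegative.
    """
    tangent = lam(xi) + lam_prime(xi) * (x - xi)
    return lam(x) - tangent - 0.25 * (x - xi) ** 2


rng = random.Random(0)
samples = [rng.uniform(0.01, 1.99) for _ in range(1000)]
xi = sum(samples) / len(samples)

# Jensen: the mean of Lambda is at least Lambda of the mean.
jensen_gap = sum(lam(x) for x in samples) / len(samples) - lam(xi)

# Averaging the quadratic bound kills the linear term, so the Jensen gap
# is at least a quarter of the empirical variance around xi.
variance = sum((x - xi) ** 2 for x in samples) / len(samples)
min_quadratic_gap = min(quadratic_gap(x, xi) for x in samples)
```

In this toy setting the Jensen gap is bounded below by a constant times the variance, which mirrors how a deviation of the overlap translates into a strict increase of the derivative of the free energy.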

Corollary 6.4.

For all $d>0$ we have $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]\geq\ln q+\frac{d}{k}\ln\xi$ .

Proof.

This follows from (6.3) by integrating. ∎
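The integration step can be spelled out as follows. This is a sketch under two assumptions that are not restated in this section: that (6.3) is the lower bound $\frac{1}{n}\frac{\partial}{\partial d}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]\geq k^{-1}\ln\xi+o(1)$, as the discussion of Proposition 3.3 suggests, and that at $d=0$ the factor graph has no constraint nodes, so that $Z=q^{n}$. The integration variable $t$ is introduced here for illustration only.

```latex
\begin{align*}
\frac{1}{n}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]\Big|_{d}
  &= \ln q+\int_{0}^{d}\frac{1}{n}\frac{\partial}{\partial t}
     \mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]\Big|_{t}\,\mathrm{d}t \\
  &\geq \ln q+\int_{0}^{d}\Big(\frac{\ln\xi}{k}+o(1)\Big)\,\mathrm{d}t
   = \ln q+\frac{d}{k}\ln\xi+o(1).
\end{align*}
```

The uniformity of the $o(1)$ term over the range of $d$ where (6.3) applies is what allows the error to be integrated.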

Finally, to prove Proposition 3.3 we combine Lemma 6.1 and Corollary 6.3 to argue that if $\mathbb{E}\left\langle{\left\|{\rho_{\boldsymbol{\sigma},\boldsymbol{\tau}}-\bar{\rho}}\right\|_{\mathrm{TV}}}\right\rangle_{\hat{\boldsymbol{G}}}$ is bounded away from $0$ for some $d<d_{\mathrm{cond}}$, then in fact for all $d$ in a small interval the derivative $\frac{1}{n}\frac{\partial}{\partial d}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]$ strictly exceeds $k^{-1}\ln\xi$. Consequently, $n^{-1}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]$ is strictly greater than $\ln q+\frac{d}{k}\ln\xi$ for some $d<d_{\mathrm{cond}}$, in contradiction to Theorem 3.2.

Proof of Proposition 3.3.

Assume that there exist $D_{0}<d_{\mathrm{cond}}$ and $\varepsilon>0$ such that

[TABLE]

Then Lemma 6.1 shows that there is $\delta>0$ such that with $D_{1}=D_{0}+3\delta/2<d_{\mathrm{cond}}$ for infinitely many $n$ we have

[TABLE]

But then Corollaries 6.3 and 6.4 imply that for infinitely many $n$ ,

[TABLE]

Consequently,

[TABLE]

Therefore, Theorem 3.2 yields $\sup_{\pi\in\mathcal{P}^{2}_{\ast}(\Omega)}\mathcal{B}(D_{1},P,\pi)>\ln q+\frac{D_{1}}{k}\ln\xi,$ in contradiction to $D_{1}<d_{\mathrm{cond}}$ . ∎

6.2. Proof of Lemma 6.1

The proof, which is a non-trivial generalization of the argument for [23, Lemma 4.6] for the Potts model, is based on a coupling of the random factor graphs $\hat{\boldsymbol{G}}(n,m)$ and $\hat{\boldsymbol{G}}(n,m^{\prime})$ with different numbers $m,m^{\prime}$ of constraint nodes; to set up the coupling we use the Nishimori identity (4.3). Thus, as a first step we need a coupling of $\hat{\boldsymbol{\sigma}}_{n,m}$ and $\hat{\boldsymbol{\sigma}}_{n,m^{\prime}}$ .

Lemma 6.5.

For any $\eta>0$ , $d>0$ there is $\delta>0$ such that

[TABLE]

Proof.

Given $\eta>0$ pick a sufficiently small $\beta=\beta(\eta)>0$ . Let $\phi$ be the function from (4.4). Because the constraint nodes of $\boldsymbol{G}$ are chosen independently, for all $m\geq 0$ , $\sigma\in\Omega^{V_{n}}$ we have

[TABLE]

Furthermore, by Corollary 4.7 there exists $C>0$ such that

[TABLE]

which implies that

[TABLE]

Applying Lemma 4.5 to expand (6.14) to the second order, we obtain $C^{\prime}>0$ such that for all $m$ and all $\sigma$ satisfying $\left\|{\rho_{\sigma}-\bar{\rho}}\right\|_{\mathrm{TV}}\leq C/\sqrt{n}$ ,

[TABLE]

Hence, choosing $\delta=\delta(\beta,C,d)>0$ small enough, we can ensure that for all $m,m^{\prime}$ such that $|m-dn/k|+|m^{\prime}-dn/k|\leq\delta$ and all $\sigma$ satisfying $\left\|{\rho_{\sigma}-\bar{\rho}}\right\|_{\mathrm{TV}}\leq C/\sqrt{n}$ the estimate

[TABLE]

holds. Further, combining (6.16) and (6.17), we obtain that

[TABLE]

provided that $|m-dn/k|+|m^{\prime}-dn/k|\leq\delta$ and $\beta=\beta(\eta)$ was chosen small enough. Moreover, combining (6.17) and (6.18), we conclude that if $|m-dn/k|+|m^{\prime}-dn/k|\leq\delta$ and $\left\|{\rho_{\sigma}-\bar{\rho}}\right\|_{\mathrm{TV}}\leq C/\sqrt{n}$ , then

[TABLE]

Finally, the assertion follows from (6.15) and (6.19). ∎

Proof of Lemma 6.1.

Assume that $m\in\mathcal{M}(d)$ satisfies (6.1). Pick $\eta=\eta(\varepsilon)>0$ small enough, let $\delta=\delta(\eta)>0$ be the number promised by Lemma 6.5 and assume that $n$ is a large enough number such that $|m-dn/k|<\delta n/2$ and

[TABLE]

Further, suppose that $m^{\prime}>m$ is such that $|m^{\prime}-dn/k|<\delta n/2$ . Then by Lemma 6.5 we can couple $\hat{\boldsymbol{\sigma}}_{n,m}$ and $\hat{\boldsymbol{\sigma}}_{n,m^{\prime}}$ such that the event $\mathcal{A}=\{\hat{\boldsymbol{\sigma}}_{n,m}=\hat{\boldsymbol{\sigma}}_{n,m^{\prime}}\}$ satisfies

[TABLE]

We extend this to a coupling of a pair of factor graphs $\boldsymbol{G}^{\prime},\boldsymbol{G}^{\prime\prime}$ such that $\boldsymbol{G}^{\prime}$ is distributed as $\boldsymbol{G}^{\ast}(n,m^{\prime},\hat{\boldsymbol{\sigma}}_{n,m^{\prime}})$ and $\boldsymbol{G}^{\prime\prime}$ is distributed as $\boldsymbol{G}^{\ast}(n,m,\hat{\boldsymbol{\sigma}}_{n,m})$ as follows. First choose $\boldsymbol{G}^{\prime}$ from the distribution $\boldsymbol{G}^{\ast}(n,m^{\prime},\hat{\boldsymbol{\sigma}}_{n,m^{\prime}})$ . Then obtain $\boldsymbol{G}^{\prime\prime\prime}$ from $\boldsymbol{G}^{\prime}$ by deleting a uniformly chosen set of $m^{\prime}-m$ constraint nodes. On the event $\mathcal{A}$ set $\boldsymbol{G}^{\prime\prime}=\boldsymbol{G}^{\prime\prime\prime}$ . If $\mathcal{A}$ does not occur, then choose the constraint nodes of $\boldsymbol{G}^{\prime\prime}$ independently of those of $\boldsymbol{G}^{\prime}$ in such a way that $\boldsymbol{G}^{\prime\prime}$ is distributed as $\boldsymbol{G}^{\ast}(n,m,\hat{\boldsymbol{\sigma}}_{n,m})$ .

Now, (6.20) implies that with probability at least $\varepsilon/2$ the random graph $\boldsymbol{G}^{\prime\prime}$ is such that a random sample $\boldsymbol{\tau}$ from $\mu_{\boldsymbol{G}^{\prime\prime}}$ satisfies $\langle{\|{\rho_{\boldsymbol{\sigma},\boldsymbol{\tau}}-\bar{\rho}}\|_{\mathrm{TV}}}\rangle_{\boldsymbol{G}^{\prime\prime}}\geq\varepsilon$ . By Corollary 4.7 and the Nishimori identity (4.3), with probability $1-o(1)$ this random sample $\boldsymbol{\tau}$ is nearly balanced. Consequently, there exists a map $G\mapsto\tau_{G}$ that provides a nearly balanced $\tau_{G}$ for every factor graph $G$ such that $\mathbb{P}[\langle{\|{\rho_{\boldsymbol{\sigma},\tau_{\boldsymbol{G}^{\prime\prime}}}-\bar{\rho}}\|_{\mathrm{TV}}\rangle}_{\boldsymbol{G}^{\prime\prime}}>\varepsilon]\geq\varepsilon/2$ . Thus, $\mathbb{E}\langle{\|{\rho_{\boldsymbol{\sigma},\tau_{\boldsymbol{G}^{\prime\prime}}}-\bar{\rho}}\|_{\mathrm{TV}}\rangle}_{\boldsymbol{G}^{\prime\prime}}>\varepsilon^{2}/2.$ Hence, assuming that $\eta$ was chosen small enough, we obtain from (6.13) and the Nishimori identity (4.3) that

[TABLE]

Finally, on the event $\mathcal{A}$ the factor graph $\boldsymbol{G}^{\prime\prime}=\boldsymbol{G}^{\prime\prime\prime}$ is obtained from $\boldsymbol{G}^{\prime}$ by deleting a few random constraint nodes. Thus, for a graph $\boldsymbol{G}^{\prime}$ let $\boldsymbol{\tau}_{\boldsymbol{G}^{\prime}}$ be a random assignment with distribution $\tau_{\boldsymbol{G}^{\prime\prime\prime}}$ . Then (6.22) implies

[TABLE]

Hence, by the Nishimori identity (4.3) and (6.21),

[TABLE]

Since by construction $\boldsymbol{\tau}_{\boldsymbol{G}^{\prime}}$ is nearly balanced, the assertion follows from (6.23) and Lemma 4.2. ∎

6.3. Proof of Lemma 6.2

We shall see shortly that calculating the derivative $\frac{\partial}{\partial d}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}})]$ basically comes down to calculating the difference $\mathbb{E}[\ln Z(\hat{\boldsymbol{G}}(n,\boldsymbol{m}+1))]-\mathbb{E}[\ln Z(\hat{\boldsymbol{G}}(n,\boldsymbol{m}))]$ . We are going to perform this calculation by way of a very accurate coupling of $\hat{\boldsymbol{G}}(n,\boldsymbol{m}+1)$ and $\hat{\boldsymbol{G}}(n,\boldsymbol{m})$ . A similar argument was used in [23] for the case that the set $\Psi$ of weight functions is finite. Once more the coupling is based on the Nishimori identity (4.3). Thus, we begin with a coupling of the random assignments $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}$ and $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}$ . The following is a generalization of [23, Corollary 3.29].

Lemma 6.6.

There exists a coupling of $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}$ and $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}$ such that the following holds uniformly for all $d\leq d_{\mathrm{cond}}+1$ .

(i)

With probability $1-O(n^{-1}\ln^{2}n)$ we have $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}=\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}$.

(ii)

With probability $1-O(1/n^{2})$ the set $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}\triangle\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}=\{x\in V_{n}:\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}(x)\neq\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}(x)\}$ has size at most $n^{2/3}$.

Proof.

By definition, for any $\sigma\in\Omega^{V_{n}}$

[TABLE]

Further, due to the independence of the constraint nodes, we obtain

[TABLE]

Let $\phi$ be the function from (4.4). Then Lemma 3.5 and Lemma 4.5 show that for $\rho\in\mathcal{P}(\Omega)$ ,

[TABLE]

Hence, expanding the r.h.s. of (6.25) to the second order, we obtain

[TABLE]

Moreover, let $\mathcal{N}$ be the set of all $\rho\in\mathcal{P}(\Omega)$ such that $n\rho(\omega)$ is an integer for every $\omega\in\Omega$ . Then

[TABLE]

Further, let $\mathcal{N}^{\prime}=\left\{{\rho\in\mathcal{N}:\left\|{\rho-\bar{\rho}}\right\|_{\mathrm{TV}}\leq n^{-1/2}\ln n}\right\}$ . Then (6.28), Stirling’s formula and Lemmas 3.5 and 4.5 yield

[TABLE]

Of course, the corresponding formula holds for $\mathbb{E}[Z(\boldsymbol{G}(n,m+1))]$ . Hence, (6.25) and (6.26) yield

[TABLE]

Combining (6.24), (6.25), (6.27) and (6.29), we conclude that

[TABLE]

By Corollary 4.7, $\|{\rho_{\boldsymbol{\sigma}_{n,\boldsymbol{m}}}-\bar{\rho}}\|_{\mathrm{TV}}$ is bounded by $O(n^{-1/2}\ln n)$ with probability at least $1-O(1/n)$. Hence, (6.30) shows that $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}},\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}$ have total variation distance $O(n^{-1}\ln^{2}n)$, whence the first assertion follows.

With respect to the second, we obtain from Corollary 4.7 that

[TABLE]

Hence, if we choose the empirical distributions $\rho_{\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}},\rho_{\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}}$ independently, then $\left\|{\rho_{\boldsymbol{\sigma}_{n,\boldsymbol{m}}}-\rho_{\boldsymbol{\sigma}_{n,\boldsymbol{m}+1}}}\right\|_{\mathrm{TV}}\leq 2n^{-1/2}\ln n$ with probability $1-O(n^{-3})$ . Finally, we obtain the desired coupling of $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}$ , $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}$ for (ii): given $\rho$ , $\rho^{\prime}\in\mathcal{N}$ choose a collection of pairwise disjoint sets $(S_{\omega})_{\omega\in\Omega}\subset V_{n}$ with $|S_{\omega}|=n\min\{\rho(\omega),\rho^{\prime}(\omega)\}$ randomly, set $\sigma(x)=\sigma^{\prime}(x)=\omega$ for all $x\in S_{\omega}$ and let $\sigma,\sigma^{\prime}$ assign different spins to the nodes in $V_{n}\setminus\bigcup_{\omega\in\Omega}S_{\omega}$ so as to ensure that $\rho_{\sigma}=\rho$ and $\rho_{\sigma^{\prime}}=\rho^{\prime}$ . ∎
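The construction used for (ii) can be sketched in code. The following is a minimal illustration, assuming that the empirical distributions are given as integer count vectors `counts[w]` $=n\rho(\omega)$ and `counts_prime[w]` $=n\rho^{\prime}(\omega)$; the function name `couple_assignments` and the encoding of spins as integers $0,\ldots,q-1$ are hypothetical.

```python
import random


def couple_assignments(counts, counts_prime, rng):
    """Couple two assignments with prescribed spin counts so that they agree
    on n * min(rho(w), rho'(w)) nodes for every spin w."""
    n = sum(counts)
    assert n == sum(counts_prime), "both count vectors must describe n nodes"
    nodes = list(range(n))
    rng.shuffle(nodes)  # random choice of the disjoint sets S_w

    sigma = [None] * n
    sigma_prime = [None] * n
    position = 0
    for spin, (a, b) in enumerate(zip(counts, counts_prime)):
        common = min(a, b)
        for x in nodes[position:position + common]:
            sigma[x] = spin        # both assignments agree on S_w
            sigma_prime[x] = spin
        position += common

    # Spins that sigma (resp. sigma') still has to place outside the sets S_w.
    # A spin is in surplus for at most one of the two assignments, so the two
    # assignments necessarily differ on every remaining node.
    surplus = [s for s, (a, b) in enumerate(zip(counts, counts_prime))
               for _ in range(a - min(a, b))]
    surplus_prime = [s for s, (a, b) in enumerate(zip(counts, counts_prime))
                     for _ in range(b - min(a, b))]
    for x, s, s_prime in zip(nodes[position:], surplus, surplus_prime):
        sigma[x] = s
        sigma_prime[x] = s_prime
    return sigma, sigma_prime


example_sigma, example_sigma_prime = couple_assignments(
    [5, 3, 2], [4, 4, 2], random.Random(1))
disagreements = sum(a != b for a, b in zip(example_sigma, example_sigma_prime))
```

The number of disagreeing nodes equals $n-\sum_{\omega}n\min\{\rho(\omega),\rho^{\prime}(\omega)\}$, so a bound of order $n^{-1/2}\ln n$ on the distance between $\rho$ and $\rho^{\prime}$ gives a symmetric difference of order $\sqrt{n}\ln n$, well below $n^{2/3}$.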

Corollary 6.7.

Uniformly for all $d\leq d_{\mathrm{cond}}+1$ the following is true. Given the random assignment $\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}$ choose a constraint node $\boldsymbol{a}$ from the distribution

[TABLE]

and choose $\boldsymbol{G}^{*}(n,\boldsymbol{m},\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}})$ independently. Then

[TABLE]

Proof.

By the Nishimori identity (4.3) we have

[TABLE]

To calculate the difference of the two terms on the r.h.s. we couple $\boldsymbol{\sigma}^{\prime}=\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}$ and $\boldsymbol{\sigma}^{\prime\prime}=\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1}$ via Lemma 6.6. Clearly, if $\boldsymbol{\sigma}^{\prime}=\boldsymbol{\sigma}^{\prime\prime}$ , then we can couple $\boldsymbol{G}^{\prime}=\boldsymbol{G}^{*}(n,\boldsymbol{m},\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1})$ and $\boldsymbol{G}^{\prime\prime}=\boldsymbol{G}^{*}(n,\boldsymbol{m}+1,\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}+1})$ such that $\boldsymbol{G}^{\prime\prime}$ is obtained from $\boldsymbol{G}^{\prime}$ by adding one additional independent constraint node $\boldsymbol{a}=a_{\boldsymbol{m}+1}$ and thus

[TABLE]

Hence, by (2.1) and the first part of Lemma 6.6,

[TABLE]

If $|\boldsymbol{\sigma}^{\prime}\triangle\boldsymbol{\sigma}^{\prime\prime}|\leq n^{2/3}$ and $\|\rho_{\boldsymbol{\sigma}^{\prime}}-\bar{\rho}\|\leq n^{-1/2}\ln n$ , then by (2.1) we have

[TABLE]

Further, let us write $\boldsymbol{a}^{\prime}$ for a factor node chosen from (2.14) with respect to $\boldsymbol{\sigma}^{\prime}$ and $\boldsymbol{a}^{\prime\prime}$ for one chosen with respect to $\boldsymbol{\sigma}^{\prime\prime}$ . Let $\mathcal{A}$ be the event that a random factor node does not have a neighbor in $\boldsymbol{\sigma}^{\prime}\triangle\boldsymbol{\sigma}^{\prime\prime}$ . Since $\|\rho_{\boldsymbol{\sigma}^{\prime}}-\bar{\rho}\|\leq n^{-1/2}\ln n$ , (2.1) and (6.36) imply that

[TABLE]

and similarly $\mathbb{P}\left[{\boldsymbol{a}^{\prime\prime}\not\in\mathcal{A}}\right]=O(n^{-1/3})$ . Moreover, given that $\boldsymbol{a}^{\prime},\boldsymbol{a}^{\prime\prime}\in\mathcal{A}$ , both factor nodes $\boldsymbol{a}^{\prime},\boldsymbol{a}^{\prime\prime}$ are identically distributed. Therefore, there is a coupling of $\boldsymbol{a}^{\prime},\boldsymbol{a}^{\prime\prime}$ such that $\boldsymbol{a}^{\prime}=\boldsymbol{a}^{\prime\prime}$ with probability $1-O(n^{-1/3})$ . Hence, $\boldsymbol{G}^{\prime},\boldsymbol{G}^{\prime\prime}$ can be coupled such that the set $\Delta$ of constraint nodes in which both factor graphs differ has expected size $O(n^{2/3})$ . Indeed, $\Delta$ is a binomial random variable because the constraint nodes are chosen independently. Thus, (2.1) implies

[TABLE]

and therefore

[TABLE]

Finally, if either $|\boldsymbol{\sigma}^{\prime}\triangle\boldsymbol{\sigma}^{\prime\prime}|>n^{2/3}$ or $\|\rho_{\boldsymbol{\sigma}^{\prime}}-\bar{\rho}\|>n^{-1/2}\ln n$ , then we couple $\boldsymbol{G}^{\prime},\boldsymbol{G}^{\prime\prime}$ by just choosing their constraint nodes independently. Then (2.1) implies

[TABLE]

Combining (6.33)–(6.38) and applying Corollary 4.7 and Lemma 6.6, we obtain

[TABLE]

as claimed. ∎

Proof of Lemma 6.2.

The proof is a generalization of the proof of [23, Lemma 3.32], which dealt with the Potts model. We begin with the well-known observation that

[TABLE]

To calculate the last term we apply Corollary 6.7. Let us write $\left\langle{\,\cdot\,}\right\rangle=\left\langle{\,\cdot\,}\right\rangle_{\boldsymbol{G}^{*}(n,\boldsymbol{m},\hat{\boldsymbol{\sigma}}_{n,m})}$ for brevity. Expanding the logarithm on the r.h.s. of (6.32), we obtain

[TABLE]

(where the expectation is over the choice of $\hat{\boldsymbol{\sigma}}_{n,m}$ , $\boldsymbol{G}^{*}(n,\boldsymbol{m},\hat{\boldsymbol{\sigma}}_{n,m})$ and $\boldsymbol{a}$ ). Due to (2.1) and Fubini’s theorem we can interchange the sum and the expectation. Hence, writing the expectation on $\boldsymbol{a}$ chosen from (6.31) out, with $\boldsymbol{\psi}$ chosen from $P$ independently of everything else, we obtain

[TABLE]

Further, because $|\hat{\boldsymbol{\sigma}}_{n,\boldsymbol{m}}^{-1}(\omega)|\sim n/q$ for all $\omega\in\Omega$ with probability at least $1-o(1)$ by Corollary 4.7, we obtain from (2.1) and SYM that

[TABLE]

To evaluate the expectation on the r.h.s. of (6.40) we harness the Nishimori identity (4.3), which implies the following: if $\mathcal{X}:(G,\sigma)\mapsto\mathcal{X}(G,\sigma)\in\mathbb{R}$ is an $L^{1}$ -function, then $\mathbb{E}[\mathcal{X}(\boldsymbol{G}^{*}(n,\boldsymbol{m},\hat{\boldsymbol{\sigma}}_{n,m}),\hat{\boldsymbol{\sigma}}_{n,m})]=\mathbb{E}[\mathcal{X}(\boldsymbol{G}^{*}(n,\boldsymbol{m},\hat{\boldsymbol{\sigma}}_{n,m}),\boldsymbol{\sigma}_{0})]$ . Applying this fact to the function $\mathcal{X}(G,\sigma)=\boldsymbol{\psi}(\sigma(x_{i_{1}}),\ldots,\sigma(x_{i_{k}}))\left\langle{1-\boldsymbol{\psi}(\boldsymbol{\sigma}(x_{i_{1}}),\ldots,\boldsymbol{\sigma}(x_{i_{k}}))}\right\rangle_{G}$ , we obtain

[TABLE]

Plugging (6.41) into (6.40) and writing $\boldsymbol{i}_{1},\ldots,\boldsymbol{i}_{k}$ for uniformly random indices chosen from $[n]$ we obtain

[TABLE]

Finally, since $\sum_{l\geq 2}\frac{1}{l(l-1)}(1-x)^{l}=1-x+\Lambda(x)$ , (6.42) yields (6.2). ∎
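The series identity used in the last step can be checked numerically. The sketch below assumes $\Lambda(x)=x\ln x$ and restricts to $x\in(0,2)$, where $|1-x|<1$ and the series converges; the function names and the truncation length are illustrative choices.

```python
import math


def series_partial_sum(x, terms=400):
    """Partial sum of sum_{l >= 2} (1 - x)^l / (l (l - 1))."""
    y = 1.0 - x
    return sum(y ** l / (l * (l - 1)) for l in range(2, terms + 2))


def closed_form(x):
    """1 - x + Lambda(x) with the assumed form Lambda(x) = x ln x."""
    return 1.0 - x + x * math.log(x)


max_error = max(abs(series_partial_sum(x) - closed_form(x))
                for x in (0.25, 0.5, 1.0, 1.5, 1.75))
```

The identity follows by integrating the geometric series twice, which is why the coefficients $1/(l(l-1))$ appear.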

7. Moment calculations

In this section we prove Propositions 3.8 and 3.9. We begin with a very general calculation in Section 7.1, from which we subsequently deduce Propositions 3.8 and 3.9.

7.1. An asymptotic formula

The following result paves the way for the proofs of Propositions 3.8 and 3.9.

Proposition 7.1.

Assume that $P$ satisfies SYM and that $d>0$ is such that the eigenvalues $\lambda_{1}\geq\cdots\geq\lambda_{q}$ of $\Phi$ satisfy

[TABLE]

Furthermore, assume that $\varepsilon=\varepsilon(n)\to 0$ but $\sqrt{n}\varepsilon\to\infty$ as $n\to\infty$ and let

[TABLE]

Then uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Proof.

Let $R_{n,\varepsilon}$ be the set of all distributions $\rho\in\mathcal{P}(\Omega)$ such that $n\rho\in\mathbb{R}^{\Omega}$ is an integer vector and such that $\|\rho-\bar{\rho}\|_{2}<\varepsilon$. Additionally, for each $\rho\in R_{n,\varepsilon}$ let $Z_{\rho}(\boldsymbol{G}(n,m))=Z(\boldsymbol{G}(n,m))\left\langle{\boldsymbol{1}\{\rho_{\boldsymbol{\sigma}}=\rho\}}\right\rangle_{\boldsymbol{G}(n,m)}.$ Then

[TABLE]

Remembering $\phi$ from (4.4), we claim that uniformly for all $\rho\in R_{n,\varepsilon}$ and $m\in\mathcal{M}(d)$ ,

[TABLE]

Indeed, because there are precisely ${\binom{n}{n\rho}}$ assignments $\sigma\in\Omega^{V_{n}}$ such that $\rho_{\sigma}=\rho$ and since the constraint nodes of $\boldsymbol{G}(n,m)$ are chosen independently, we have the exact expression $\mathbb{E}[Z_{\rho}(\boldsymbol{G}(n,m))]=\binom{n}{\rho n}\phi(\rho)^{m}$ and thus (7.3) follows from Stirling’s formula. Combining (7.2) and (7.3), we obtain

[TABLE]

In order to calculate the sum via the Laplace method, we compute the first two derivatives of $f$ . The first derivative works out to be

[TABLE]

Hence, using SYM we see that the gradient at the point $\bar{\rho}$ equals

[TABLE]

Proceeding to the second derivatives, we find

[TABLE]

Consequently, using SYM we find that the Hessian at $\bar{\rho}$ comes out as

[TABLE]

Additionally, the third derivatives of $f$ are uniformly bounded. Thus, combining (7.5) and (7.6) and observing that $\rho-\bar{\rho}\perp\boldsymbol{1}$ for all $\rho\in R_{n,\varepsilon}$ , we see that uniformly for all $\rho\in R_{n,\varepsilon}$ ,

[TABLE]

Since $\varepsilon=o(1)$, plugging (7.7) into (7.2) we obtain uniformly for all $m\in\mathcal{M}(d)$,

[TABLE]

Further, because Lemma 3.5 shows that $\Phi$ is symmetric, there exists an orthogonal matrix $Q$ such that $\Phi=QLQ^{*}$, where $L$ is the diagonal matrix whose entries are the eigenvalues $1=\lambda_{1}\geq\lambda_{2}\geq\cdots\geq\lambda_{q}$ of $\Phi$. Since $\Phi$ is stochastic (once more by Lemma 3.5), the top eigenvalue is $\lambda_{1}=1$ and the corresponding eigenvector is $\boldsymbol{1}$. Moreover, because all $\rho\in R_{n,\varepsilon}$ are probability distributions on $\Omega$, we have $\rho-\bar{\rho}\perp\boldsymbol{1}$ for all $\rho\in R_{n,\varepsilon}$. Therefore, the set $R_{n,\varepsilon}^{\prime}=\{Q^{*}(\rho-\bar{\rho}):\rho\in R_{n,\varepsilon}\}$ is contained in the $(q-1)$-dimensional subspace spanned by the eigenvectors of $\Phi$ corresponding to $\lambda_{2},\ldots,\lambda_{q}$. Hence, because $\varepsilon\sqrt{n}\to\infty$ the sum from (7.8) can be approximated by a $(q-1)$-dimensional Gaussian integral and thus uniformly for all $m\in\mathcal{M}(d)$,

[TABLE]

as claimed. ∎

Remark 7.2.

We observe that the proof of Proposition 7.1 did not use (2.1).

7.2. Proof of Proposition 3.8

In this section we assume that $P$ satisfies SYM and BAL. Then Lemma 3.5 readily shows that (7.1) holds for all $d>0$ and thus Proposition 7.1 applies. Hence, to prove Proposition 3.8 we merely need to show that $\mathbb{E}[Z_{\varepsilon}(\boldsymbol{G}(n,m))]\sim\mathbb{E}[Z(\boldsymbol{G}(n,m))]$ for a suitable $\varepsilon(n)=o(1)$ .

Lemma 7.3.

Assume that $P$ satisfies SYM and BAL, let $d>0$ and set $\varepsilon=\varepsilon(n)=n^{-1/3}$ . Then uniformly for all $m\in\mathcal{M}(d)$ we have $\mathbb{E}[Z_{\varepsilon}(\boldsymbol{G}(n,m))]\sim\mathbb{E}[Z(\boldsymbol{G}(n,m))].$

Proof.

Let $R_{n}$ be the set of all distributions $\rho\in\mathcal{P}(\Omega)$ such that $n\rho$ is an integer vector and let $R_{n,\varepsilon}$ be the set of all $\rho\in R_{n}$ such that $|\rho\left({\omega}\right)-1/q|<\varepsilon$ for all $\omega\in\Omega$ . Let $\phi:\rho\in\mathbb{R}^{\Omega}\mapsto\sum_{\tau\in\Omega^{k}}\mathbb{E}[\boldsymbol{\psi}(\tau)]\prod_{i\in[k]}\rho(\tau_{i})$ (cf. (4.4)). Then by the linearity of expectation and the independence of the constraint nodes of $\boldsymbol{G}(n,m)$ ,

[TABLE]

Hence, with $\bar{\rho}$ denoting the uniform distribution, uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Finally, Proposition 7.1 implies that $q^{n}\xi^{m}\exp(-\Omega(n^{1/3}))=o(\mathbb{E}[Z_{\varepsilon}(\boldsymbol{G}(n,m))])$ . ∎

Proposition 3.8 is immediate from Proposition 7.1 and Lemma 7.3.
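The concentration of the first moment on nearly balanced assignments can be observed directly in a small example. The sketch below uses the exact expression $\mathbb{E}[Z_{\rho}(\boldsymbol{G}(n,m))]=\binom{n}{\rho n}\phi(\rho)^{m}$ and makes several assumptions for illustration: $q=2$, $k=2$, the single weight function $\psi(\sigma_{1},\sigma_{2})=1-(1-e^{-\beta})\boldsymbol{1}\{\sigma_{1}=\sigma_{2}\}$ of the Potts antiferromagnet, so that $\phi(\rho)=1-(1-e^{-\beta})\sum_{\omega}\rho(\omega)^{2}$, and $m$ taken as $dn/k$ rounded to an integer as a stand-in for a sequence in $\mathcal{M}(d)$. All function names are hypothetical.

```python
import math


def log_first_moment_terms(n, d, beta):
    """Return pairs (rho(0), log E[Z_rho]) for q = 2 and k = 2.

    Uses log binom(n, j) + m * log phi(rho) with rho = (j/n, 1 - j/n).
    """
    c = 1.0 - math.exp(-beta)
    m = round(d * n / 2)  # assumed stand-in for m in M(d), with k = 2
    terms = []
    for j in range(n + 1):
        rho0 = j / n
        phi = 1.0 - c * (rho0 ** 2 + (1.0 - rho0) ** 2)
        log_binom = (math.lgamma(n + 1) - math.lgamma(j + 1)
                     - math.lgamma(n - j + 1))
        terms.append((rho0, log_binom + m * math.log(phi)))
    return terms


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def window_fraction(n, d, beta):
    """Share of E[Z] contributed by rho with |rho(w) - 1/2| < n^(-1/3)."""
    eps = n ** (-1.0 / 3.0)
    terms = log_first_moment_terms(n, d, beta)
    total = log_sum_exp([t for _, t in terms])
    inside = log_sum_exp([t for r, t in terms if abs(r - 0.5) < eps])
    return math.exp(inside - total)


fraction = window_fraction(2000, 1.0, 1.0)
```

Working in log space avoids overflow of the binomial coefficients; in this example the contribution from outside the window is exponentially small in $n^{1/3}$, in line with the bound used in the proof of Lemma 7.3.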

7.3. Proof of Proposition 3.9

Assume that $P$ satisfies SYM and BAL and that $d<d_{\mathrm{cond}}$ . In order to calculate the second moment, we employ a known construction (e.g., [13]) of an auxiliary random factor graph model whose first moment equals the second moment of the original model. The spin set of this auxiliary model is the set $\Omega^{\otimes}=\Omega\times\Omega$ and we denote the pairs $(s,t)\in\Omega\times\Omega$ by $s\otimes t$ . Further, for functions $\varphi,\psi:\Omega^{k}\to\mathbb{R}$ we define

[TABLE]

Then the set of weight functions of the auxiliary model is $\Psi^{\otimes}=\{\psi\otimes\psi:\psi\in\Psi\}$ . Moreover, the probability distribution $P^{\otimes}$ on $\Psi^{\otimes}$ is simply the image of $P$ under the measurable map $\psi\in\Psi\mapsto\psi\otimes\psi$ . Clearly, the fact that $P$ satisfies SYM implies that so does $P^{\otimes}$ . (However, $P^{\otimes}$ does not necessarily satisfy BAL, and $P^{\otimes}$ need not satisfy the last two bounds in (2.1), but these are not needed to apply Proposition 7.1 due to Remark 7.2.)

For any $\psi\in\Psi$ the matrix $\Phi_{\psi\otimes\psi}$ as defined in (2.5) can be expressed in terms of the matrix $\Phi_{\psi}$ induced by the original weight function as $\Phi_{\psi\otimes\psi}=\Phi_{\psi}\otimes\Phi_{\psi}$ . Hence, recalling the definitions (2.6) and (2.9),

[TABLE]

Proof of Proposition 3.9.

For a factor graph $G$ let $G^{\otimes}$ be the factor graph obtained by replacing the weight function $\psi_{a}$ by $\psi_{a}\otimes\psi_{a}$ for every factor node $a$ of $G$ . Then

[TABLE]

Hence, if $\varepsilon=\varepsilon(n)=o(1)$ satisfies $\varepsilon\sqrt{n}\to\infty$ , then (7.9), Lemma 3.6, Proposition 3.7 and Proposition 7.1 yield

[TABLE]

as desired. ∎

8. Cycle census

Throughout this section we assume that $P$ satisfies SYM and BAL.

The aim is to prove Proposition 3.11. The proof of the first assertion is rather straightforward.

Lemma 8.1.

Let $d>0$ . For any $Y\in\mathcal{Y}$ we have $\mathbb{E}[C_{Y}(\boldsymbol{G}(n,m))]\sim\kappa_{Y}$ , uniformly for all $m\in\mathcal{M}(d)$ . Moreover, if $Y_{1},\ldots,Y_{l}\in\mathcal{Y}$ are pairwise disjoint and $y_{1},\ldots,y_{l}\geq 0$ , then uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Proof.

Let $m\in\mathcal{M}(d)$ be such that $m(n)$ takes the least possible value for every $n$ . Then (8.1) is immediate from Fact 3.10 and the fact that in $\boldsymbol{G}(n,m)$ the weight functions of the constraint nodes are chosen independently from $P$ . Furthermore, if $m^{\prime}\in\mathcal{M}(d)$ is another sequence, then the random graph $\boldsymbol{G}(n,m^{\prime})$ is obtained from $\boldsymbol{G}(n,m)$ by adding at most $n^{3/4}$ random edges and with probability $1-o(1)$ none of these edges closes a cycle of bounded length. Hence, we obtain the desired uniform rate of convergence for all sequences in $\mathcal{M}(d)$ . ∎

Lemma 8.2.

Let $d>0$ . For any $Y\in\mathcal{Y}$ with $\kappa_{Y}>0$ we have $\mathbb{E}[C_{Y}(\hat{\boldsymbol{G}}(n,m))]\sim\hat{\kappa}_{Y}$ , uniformly for all $m\in\mathcal{M}(d)$ . Moreover, if $Y_{1},\ldots,Y_{l}\in\mathcal{Y}$ are pairwise disjoint, $\kappa_{Y_{1}},\ldots,\kappa_{Y_{l}}>0$ and $y_{1},\ldots,y_{l}\geq 0$ , then uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

The proof is based on known arguments. We begin by calculating the expected number of dense small subgraphs of $\hat{\boldsymbol{G}}(n,m)$ .

Claim 8.3.

Let $u\geq 1$ be an integer and let $U(G)$ be the number of subsets $S\subset V_{n}\cup F_{m}$ of size $|S|=u$ that span more than $|S|$ edges. Then $\mathbb{E}[U(\boldsymbol{G}^{*}(n,m,\sigma))]=O(1/n)$ uniformly for all $m\in\mathcal{M}(d)$ and all $\sigma\in\Omega^{V_{n}}$.

Proof.

Fix numbers $u_{1},u_{2}$ such that $u_{1}+u_{2}=u$ and let $S_{1}\subset V_{n}$ and $S_{2}\subset F_{m}$ be sets of size $|S_{1}|=u_{1}$ , $|S_{2}|=u_{2}$ . Moreover, let $E\subset S_{2}\times[k]$ be a set of size $v>u_{1}+u_{2}$ and let $\mathcal{A}(S_{1},S_{2},E)$ be the event that for all pairs $(a,i)\in E$ we have $\partial_{i}a\in S_{1}$ . Then

[TABLE]

Furthermore, (2.1) ensures that there is a number $\alpha=\alpha(P)>0$ that does not depend on $\sigma$ such that the lower bound $\sum_{y_{1},\ldots,y_{k}\in V_{n}}\mathbb{E}[\boldsymbol{\psi}(\sigma(y_{1}),\ldots,\sigma(y_{k}))]\geq\alpha n^{k}$ holds. Therefore, (2.14) implies that for variable nodes $y_{1},\ldots,y_{k}\in S_{1}$ , any constraint node $a\in S_{2}$ and for any subset $J\subset[k]$ we have

[TABLE]

Since the constraint nodes are chosen independently, (8.3) implies that, uniformly for all $\sigma$ and all $m\in\mathcal{M}(d)$ ,

[TABLE]

Finally, given $u_{1},u_{2}$ the number of possible sets $S_{1}$ is bounded by $n^{u_{1}}$ , the number of possible $S_{2}$ does not exceed $m^{u_{2}}$ and given $v$ and $S_{2}$ the number of possible sets $E$ is bounded. Thus, since $u_{1}+u_{2}<v\leq ku_{2}$ the assertion follows from (8.2) and (8.4). ∎

Proof of Lemma 8.2.

Due to the Nishimori identity (4.3) we may prove the claim for the random factor graph model $\boldsymbol{G}^{\prime}=\boldsymbol{G}^{*}(n,m,\hat{\boldsymbol{\sigma}}_{n,m})$ . Moreover, by Corollary 4.7 we may condition on the event that $|\hat{\boldsymbol{\sigma}}^{-1}(\omega)|\sim n/q$ for all $\omega\in\Omega$ , in which case SYM yields

[TABLE]

We begin by showing that for any $Y=(E_{1},s_{1},t_{1},\ldots,E_{\ell},s_{\ell},t_{\ell})\in\mathcal{Y}_{\ell}$ uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Indeed, let $\boldsymbol{i}=(i_{1},\ldots,i_{\ell})\in[n]^{\ell}$ be a family of pairwise distinct indices such that $i_{1}<\min\{i_{2},\ldots,i_{\ell}\}$ (cf. CYC1) and let $\boldsymbol{j}=(j_{1},\ldots,j_{\ell})\in[m]^{\ell}$ be pairwise distinct indices such that $j_{1}<\min\{j_{2},\ldots,j_{\ell}\}$ if $\ell>1$ (cf. CYC2). Let ${\mathcal{C}}_{Y}(\boldsymbol{i},\boldsymbol{j})$ be the event that $x_{i_{1}},a_{j_{1}},\ldots,x_{i_{\ell}},a_{j_{\ell}}$ form a cycle with signature $Y$. Set $i_{\ell+1}=i_{1}$. Then by (2.14), (3.17) and (8.5) we have

[TABLE]

Summing on $\boldsymbol{i},\boldsymbol{j}$ , we get

[TABLE]

as claimed.

For integers $h_{1},\ldots,h_{l}\geq 1$ let $C_{h_{1},\ldots,h_{l}}(\boldsymbol{G}^{\prime})=\prod_{i=1}^{l}\prod_{j=1}^{h_{i}}(C_{Y_{i}}(\boldsymbol{G}^{\prime})-j+1).$ Then due to the inclusion/exclusion argument for the joint convergence to independent Poisson variables [19, Theorem 1.23], in order to complete the proof it suffices to show that for any $h_{1},\ldots,h_{l}\geq 1$, uniformly for all $m\in\mathcal{M}(d)$,

[TABLE]
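The criterion rests on the fact that the $h$-th falling factorial moment of a Poisson variable with mean $\lambda$ equals $\lambda^{h}$, so matching joint factorial moments identifies the limit. As a sanity check of that single-variable fact only, here is a small sketch; the function names and the truncation cutoff are assumptions chosen for illustration.

```python
import math

def poisson_pmf(lam, x):
    """Probability that a Poisson(lam) variable equals x, computed in log space."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def falling_factorial(x, h):
    """x (x-1) ... (x-h+1), the number of ordered h-tuples of distinct items out of x."""
    out = 1
    for j in range(h):
        out *= x - j
    return out

def poisson_factorial_moment(lam, h, cutoff=200):
    """Truncated series for E[(X)_h] with X ~ Poisson(lam)."""
    return sum(falling_factorial(x, h) * poisson_pmf(lam, x) for x in range(cutoff))

# Compare the truncated series with lam**h for a few orders h.
for h in (1, 2, 3):
    print(h, poisson_factorial_moment(1.7, h), 1.7 ** h)
```

The same fact with $h=2$ gives the second factorial moment $\lambda^{2}$ of a Poisson variable.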

Combinatorially, $C_{h_{1},\ldots,h_{l}}(\boldsymbol{G}^{\prime})$ is nothing but the total number of $(h_{1}+\ldots+h_{l})$ -tuples of cycles in $\boldsymbol{G}^{\prime}$ such that the first $h_{1}$ cycles have signature $Y_{1}$ , the next $h_{2}$ cycles have signature $Y_{2}$ , etc. Hence, if we define $C_{h_{1},\ldots,h_{l}}^{\prime}(\boldsymbol{G}^{\prime})$ as the number of such families of pairwise vertex disjoint cycles, then Claim 8.3 yields

[TABLE]

Furthermore, we claim that uniformly for all $m\in\mathcal{M}(d)$ ,

[TABLE]

Indeed, the argument that we used to prove (8.6) easily extends to a proof of (8.10); for if we fix index families $(\boldsymbol{i}_{v,w},\boldsymbol{j}_{v,w})_{v=1,\ldots,l,w=1,\ldots,h_{v}}$ that suit the signatures $Y_{1},\ldots,Y_{l}$ such that no index from $[n]$ resp. $[m]$ occurs more than once, then similar steps as above reveal that

[TABLE]

Hence, (8.10) follows by summing on all $(\boldsymbol{i}_{v,w},\boldsymbol{j}_{v,w})_{v,w}$. Finally, (8.8) and (8.10) show the desired convergence for a single sequence $m\in\mathcal{M}(d)$ and the uniformity of the rate of convergence follows from a similar argument as in the proof of Lemma 8.1. ∎

Proof of Proposition 3.11.

The claim (3.19) about the cycle counts is immediate from Lemmas 8.1 and 8.2. To prove the assertion about the probability of $\mathfrak{S}$, let us first assume that $k=2$. Then the event $\mathfrak{S}$ occurs iff $C_{1}=C_{2}=0$ and thus the assertion about $\mathbb{P}\left[{\boldsymbol{G}(n,m)\in\mathfrak{S}}\right]$ is immediate from Fact 3.10. Moreover, the assertion about $\mathbb{P}[\hat{\boldsymbol{G}}(n,m)\in\mathfrak{S}]$ follows from Lemma 8.2 applied to all signatures of the form $(s_{1},t_{1},\Psi)$ and $(s_{1},t_{1},\Psi,s_{2},t_{2},\Psi)$. For $k>2$ we express the event $\mathfrak{S}$ as $\mathfrak{S}=\left\{{C_{1}=0\wedge\forall 1\leq i<j\leq m:\{\partial_{1}a_{i},\ldots,\partial_{k}a_{i}\}\neq\{\partial_{1}a_{j},\ldots,\partial_{k}a_{j}\}}\right\}.$ In particular, $\mathfrak{S}$ occurs only if $C_{1}=0$ and therefore, by the same token as in the case $k=2$, the expressions stated in Proposition 3.11 are asymptotic upper bounds on $\mathbb{P}[\boldsymbol{G}(n,m)\in\mathfrak{S}],\mathbb{P}[\hat{\boldsymbol{G}}(n,m)\in\mathfrak{S}]$. Finally, we notice that for $k>2$ the expected number of pairs $1\leq i<j\leq m$ such that $\{\partial_{1}a_{i},\ldots,\partial_{k}a_{i}\}=\{\partial_{1}a_{j},\ldots,\partial_{k}a_{j}\}$ is $O(1/n)$. ∎

9. The limiting distribution

Throughout this section we assume that $P$ satisfies SYM and BAL.

In this section we prove Proposition 3.12. Let $\boldsymbol{\psi},\boldsymbol{\psi}_{1},\boldsymbol{\psi}_{2},\ldots$ be chosen independently from $P$ and for $\ell\geq 0$ set $\boldsymbol{Y}_{\ell}=\operatorname{tr}\prod_{j=1}^{\ell}\Phi_{\boldsymbol{\psi}_{j}}$ . The following lemma is the main step toward the proof of (3.20).

Lemma 9.1.

If $d<d_{\mathrm{cond}}$, then $\sum_{\ell=1}^{\infty}\frac{(d(k-1))^{\ell}}{2\ell}\mathbb{E}\left[{(\boldsymbol{Y}_{\ell}-1)^{2}}\right]=-\frac{1}{2}\sum_{\lambda\in\mathrm{Eig}^{\ast}[\Xi]}\ln\left({1-d(k-1)\lambda}\right).$

Proof.

Let $\boldsymbol{\Phi}_{\ell}=\prod_{j=1}^{\ell}\Phi_{\boldsymbol{\psi}_{j}}$ . Then

[TABLE]

Hence, remembering (2.6) and (2.9), we find $\mathbb{E}[(\boldsymbol{Y}_{\ell}-1)^{2}]=\mathbb{E}[(\operatorname{tr}\boldsymbol{\Phi}_{\ell}-1)^{2}]=\operatorname{tr}(\Xi^{\ell})-2\operatorname{tr}(\Phi^{\ell})+1.$ Furthermore, Lemmas 3.5 and 3.6 yield

[TABLE]

and thus

[TABLE]

As $d<d_{\mathrm{cond}}$, Proposition 3.7 yields $d(k-1)\max_{\lambda\in\mathrm{Eig}^{\ast}[\Xi]}|\lambda|<1$, whence summing (9.1) on $\ell$ completes the proof. ∎
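The summation step uses the power series $\sum_{\ell\geq 1}x^{\ell}/\ell=-\ln(1-x)$ for $|x|<1$, applied to each eigenvalue. The sketch below checks the resulting trace identity numerically on a stand-in matrix. The symmetric test matrix, the constant `c` playing the role of $d(k-1)$, and the function names are all assumptions for illustration; the matrix is not derived from any model in the paper.

```python
import numpy as np

def series_side(M, c, terms=400):
    """Truncated sum over ell of c^ell / (2 ell) * tr(M^ell)."""
    N = c * M
    power = np.eye(M.shape[0])
    total = 0.0
    for ell in range(1, terms + 1):
        power = power @ N          # power now equals (c M)^ell
        total += np.trace(power) / (2 * ell)
    return total

def log_side(M, c):
    """-1/2 * sum over eigenvalues of ln(1 - c * lambda), for symmetric M."""
    lam = np.linalg.eigvalsh(M)
    return -0.5 * np.sum(np.log(1.0 - c * lam))

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
M = (B + B.T) / 2
M *= 0.3 / np.max(np.abs(np.linalg.eigvalsh(M)))   # spectral radius 0.3
c = 2.0                                            # so c * radius = 0.6 < 1

print(series_side(M, c), log_side(M, c))
```

Convergence requires the product of `c` and the spectral radius to stay below one, which mirrors the role of the condition on $\mathrm{Eig}^{\ast}[\Xi]$.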

To prove (3.20) we need to get a handle on the discretization of the set $\Psi$ induced by the partition $\mathfrak{C}_{r}$ for $r\geq 1$ . Hence, we introduce $\boldsymbol{Y}_{\ell,r}=\operatorname{tr}\prod_{j=1}^{\ell}\Phi_{\boldsymbol{\psi}_{j}^{(r)}}.$

Corollary 9.2.

If $d<d_{\mathrm{cond}}$, then $\sum_{\ell=1}^{\infty}\frac{(d(k-1))^{\ell}}{2\ell}\mathbb{E}[(\boldsymbol{Y}_{\ell,r}-1)^{2}]\leq-\frac{1}{2}\sum_{\lambda\in\mathrm{Eig}^{\ast}[\Xi]}\ln\left({1-d(k-1)\lambda}\right)$.

Proof.

By Jensen’s inequality $\sum_{\ell=1}^{\infty}\frac{(d(k-1))^{\ell}}{2\ell}\mathbb{E}[(\boldsymbol{Y}_{\ell,r}-1)^{2}]\leq\sum_{\ell=1}^{\infty}\frac{(d(k-1))^{\ell}}{2\ell}\mathbb{E}[(\boldsymbol{Y}_{\ell}-1)^{2}]$ and thus the assertion follows from Lemma 9.1. ∎

We are ready to prove (3.20).

Proof of Proposition 3.12, part 1.

Given $L,r$ let

[TABLE]

The construction of $\mathfrak{C}_{r}$ ensures that for every fixed $\ell$ , $\boldsymbol{Y}_{\ell,r}$ converges to $\boldsymbol{Y}_{\ell}$ almost surely as $r\to\infty$ . Hence, by Lemma 9.1, Corollary 9.2 and dominated convergence,

[TABLE]

which proves (3.20). ∎

In order to establish the convergence of $\mathcal{K}_{\ell,r}$ to $\mathcal{K}$ we use similar arguments. We begin with the following bound.

Lemma 9.3.

For every $0<d\leq d_{\mathrm{cond}}$ there exists $\beta>0$ such that $\sum_{\ell=1}^{\infty}\frac{(d(k-1))^{\ell}}{2\ell}\mathbb{E}\left|{\boldsymbol{1}\{\boldsymbol{Y}_{\ell}<\beta\}\ln\boldsymbol{Y}_{\ell}}\right|<\infty.$

Proof.

Pick $\beta>0$ sufficiently small and let $S=\sum_{\ell=1}^{\infty}(d(k-1))^{\ell}\mathbb{E}\left|{\boldsymbol{1}\{\boldsymbol{Y}_{\ell}<\beta\}\ln\boldsymbol{Y}_{\ell}}\right|/\left({2\ell}\right).$ Because by Lemma 3.5 the matrices $\Phi_{\psi}$ are stochastic, we have

[TABLE]

In fact, since the trace is invariant under cyclic permutations, we obtain

[TABLE]

Since $\boldsymbol{\psi}_{1},\ldots,\boldsymbol{\psi}_{\ell}$ are chosen independently, (2.1) and (9.2) imply that we can choose $\beta>0$ small enough so that $\mathbb{E}|\boldsymbol{1}\{\boldsymbol{Y}_{\ell}<\beta\}\ln\boldsymbol{Y}_{\ell}|\leq(d(k-1))^{-\ell}$ for all $\ell$, in which case the sum converges. ∎

Corollary 9.4.

For every $0<d<d_{\mathrm{cond}}$ and every $\ell,r\geq 1$ we have $\mathbb{E}|\ln\boldsymbol{Y}_{\ell}|+\mathbb{E}|\ln\boldsymbol{Y}_{\ell,r}|<\infty$ .

Proof.

Because all weight functions $\psi\in\Psi$ take values in $(0,2)$ , it is obvious that $\mathbb{E}\left|{\boldsymbol{1}\{\boldsymbol{Y}_{\ell}\geq\beta\}\ln\boldsymbol{Y}_{\ell}}\right|<\infty$ for every $\beta<1$ . Moreover, similar steps as in the previous proof show $\sum_{l\geq 1}\mathbb{E}\left|{\boldsymbol{1}\{\boldsymbol{Y}_{l}<\beta\}\ln\boldsymbol{Y}_{l}}\right|<\infty$ for some small $0<\beta<1$ . Finally, since $x\in(0,\beta)\mapsto-\ln x$ is convex, the assertion about $|\ln\boldsymbol{Y}_{\ell,r}|$ follows from Jensen’s inequality. ∎

We are going to prove that $\mathcal{K},\mathcal{K}_{\ell}$ are well-defined by showing that they come out as the limit of the $\mathcal{K}_{\ell,r}$ as $\ell,r\to\infty$ . However, a priori it may not be entirely clear that the $\mathcal{K}_{\ell,r}$ are well-defined because they involve sums on random numbers $K_{l}$ of terms. Let us observe that this is not a problem actually, because Corollary 9.4 implies the following. We continue to let $(\boldsymbol{\psi}_{l,i,j})_{l,i,j}$ signify a family of independent samples from $P$ .

Corollary 9.5.

For every $l\geq 1,r\geq 1$ the following $L^{1}$ -limits exist:

[TABLE]

Lemma 9.6.

For every $0<d<d_{\mathrm{cond}}$ there exists $c=c(d,P)>0$ such that for all $r\geq 1$ , $L\geq 1$ ,

[TABLE]

Proof.

Let $\kappa_{l}=(d(k-1))^{l}/\left({2l}\right)$ , $\boldsymbol{X}_{l,i}=\operatorname{tr}\prod_{j=1}^{l}\Phi_{\boldsymbol{\psi}_{l,i,j}},\boldsymbol{X}_{l,i}^{(r)}=\operatorname{tr}\prod_{j=1}^{l}\Phi_{\boldsymbol{\psi}_{l,i,j}^{(r)}}$ . Then $\mathbb{E}[\boldsymbol{X}_{l,i}]=\operatorname{tr}(\Phi^{l})$ and for every $l\geq 1$ ,

[TABLE]

because $\mathbb{E}[\sum_{i=1}^{K_{l}}(\boldsymbol{X}_{l,i}-1)]=\kappa_{l}\mathbb{E}[\boldsymbol{Y}_{l}-1]$ and due to Cauchy-Schwarz. Further, because the $\boldsymbol{\psi}_{l,i,j}$ are i.i.d., for any given integer $h$ we find

[TABLE]

As $\mathbb{E}[K_{l}(K_{l}-1)]=\kappa_{l}^{2}$ , (9.4) implies

[TABLE]

Moving on to the second summand in (9.3), we recall that the function $x\in(0,\infty)\mapsto x-1-\ln x$ is convex and that for any (small) $\beta>0$ there exists $u>0$ such that $x-1-\ln x\leq u(x-1)^{2}$ for all $x\geq\beta$ . Hence, introducing the convex function $g:x\in(0,\infty)\mapsto\max\{x-1-\ln x,u(x-1)^{2}\}\geq 0$ , we have

[TABLE]
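The quadratic bound $x-1-\ln x\leq u(x-1)^{2}$ for $x\geq\beta$ can be probed numerically. The sketch below takes as candidate constant the ratio $(x-1-\ln x)/(x-1)^{2}$ evaluated at $x=\beta$, which is an assumption made here for illustration (the text only asserts that some $u$ exists), and checks the inequality on a finite grid. The grid range and the tolerance are likewise arbitrary choices.

```python
import numpy as np

def gap(x):
    """The convex function x - 1 - ln x, which vanishes at x = 1."""
    return x - 1.0 - np.log(x)

def quadratic_constant(beta):
    """Candidate u for 0 < beta < 1: the ratio gap(x) / (x - 1)^2 at x = beta."""
    return gap(beta) / (beta - 1.0) ** 2

def count_violations(beta, upper=50.0, points=200001, tol=1e-12):
    """Number of grid points in [beta, upper] where gap(x) exceeds u (x - 1)^2."""
    u = quadratic_constant(beta)
    xs = np.linspace(beta, upper, points)
    return int(np.sum(gap(xs) > u * (xs - 1.0) ** 2 + tol))

beta = 0.05
print(quadratic_constant(beta), count_violations(beta))
```

A grid check is of course no proof; it only illustrates how the admissible $u$ grows as $\beta$ shrinks.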

Lemmas 9.1 and 9.3 show that summing the right hand sides of (9.5) and (9.6) on $l$ gives a finite number. Thus, the first assertion follows from (9.3). With respect to the second bound, analogous steps yield

[TABLE]

and thus the desired bound follows from Jensen’s inequality. ∎

Proof of Proposition 3.12, part 2.

Lemma 9.6 shows that the random variables $\mathcal{K}_{\ell,r}$ are uniformly $L^{1}$-bounded. Furthermore, the construction of $\mathfrak{C}_{r}$ guarantees that $\mathcal{K}_{\ell,r}\to\mathcal{K}_{\ell}$ almost surely for every fixed $\ell$. Hence, $\mathcal{K}_{\ell,r}$ converges to $\mathcal{K}_{\ell}$ in the $L^{1}$-norm and a second application of Lemma 9.6 shows that $\mathcal{K}_{\ell}$ tends to $\mathcal{K}$ in the $L^{1}$-norm. ∎

10. The condensation threshold

Throughout this section we assume that $P$ satisfies SYM, BAL and POS.

In this section we prove Theorems 2.2 and 3.2. As a technical preparation we need a concentration inequality for the free energy of our random factor graph models.

10.1. Concentration

We begin with the following elementary observation.

Lemma 10.1.

Suppose that $P$ satisfies SYM and BAL. For a factor graph $G=(V,F,(\partial a)_{a\in F},(\psi_{a})_{a\in F})$ define

[TABLE]

Then for every $D>0$ there exists $C=C(D,P)>0$ such that uniformly for all $m\leq Dn$ , $t\geq 1$ and $\sigma\in\Omega^{V_{n}}$ we have

[TABLE]

Proof.

The bound (2.1) guarantees that $\mathbb{P}\left[{\max_{\tau}|\ln\boldsymbol{\psi}(\tau)|\geq(tn)^{3/8}}\right]\leq t^{-3}O(n^{-3}).$ As a consequence, the probability that either $\boldsymbol{G}(n,m,P)$ or $\boldsymbol{G}^{*}(n,m,P,\sigma)$ contains a constraint node $a_{i}$ such that $\max_{\tau}|\ln\psi_{a_{i}}(\tau)|\geq(tn)^{3/8}$ is bounded by $t^{-3}O(n^{-2})$ . Therefore, it suffices to prove (10.1) given $\mathcal{A}=\{\max_{\tau}|\ln\psi_{a_{i}}(\tau)|<(tn)^{3/8}\}$ . Due to (2.1) the conditional expectation $\mathbb{E}[\max_{\tau}|\ln\boldsymbol{\psi}(\tau)|\,\big{|}\,\max_{\tau}|\ln\boldsymbol{\psi}(\tau)|<(tn)^{3/8}]$ is bounded. Thus, the definition of the random factor graph models guarantees that uniformly for all $\sigma,m\leq Dn$ ,

[TABLE]

Further, because the constraint nodes are chosen independently, Azuma’s inequality implies that for any $s>1$ ,

[TABLE]

Thus, (10.1) follows from (10.4) and (10.5) applied to $s=tCn-\mathbb{E}[\mathcal{O}(\boldsymbol{G}(n,m,P))\,|\,\mathcal{A}]$ with $C>0$ chosen large enough. Finally, let either $\boldsymbol{G}^{\prime}=\boldsymbol{G}(n,m,P)$ or $\boldsymbol{G}^{\prime}=\boldsymbol{G}^{*}(n,m,P,\sigma)$ . Since $\ln Z(\boldsymbol{G}^{\prime})\leq\sqrt{m\mathcal{O}(\boldsymbol{G}^{\prime})}$ by Cauchy-Schwarz, (10.1) yields

[TABLE]

whence (10.2) and (10.3) are immediate. ∎

Lemma 10.2.

Suppose that $P$ satisfies SYM and BAL and let $D>0$ . There exists $C=C(D,P)>0$ such that for any $\varepsilon>0$ and $C^{\prime}>C$ there exists $\delta>0$ such that for all $\sigma\in\Omega^{V_{n}}$ , $m\leq Dn/k$ we have

[TABLE]

Proof.

Let either $\boldsymbol{G}^{\prime}=\boldsymbol{G}(n,m,P)$ or $\boldsymbol{G}^{\prime}=\boldsymbol{G}^{*}(n,m,P,\sigma)$ and choose $c=c(\varepsilon,C^{\prime})>0$ big enough so that the following is true: if $\mathcal{O}(\boldsymbol{G}^{\prime})\leq C^{\prime}n$, then

[TABLE]

Let $\boldsymbol{G}^{\prime\prime}$ be the factor graph obtained from $\boldsymbol{G}^{\prime}$ by deleting all constraint nodes $a_{i}$ such that $\max_{\tau}|\ln\psi_{a_{i}}(\tau)|>c$. Then (10.6) ensures that $|\ln Z(\boldsymbol{G}^{\prime})-\ln Z(\boldsymbol{G}^{\prime\prime})|\leq\varepsilon n/4$. Furthermore, if $\boldsymbol{G}^{\prime\prime\prime}$ is obtained from $\boldsymbol{G}^{\prime\prime}$ by changing the neighborhood of some constraint node $a$ and/or its weight function, subject merely to the condition that the new weight function $\psi$ satisfies $\max_{\tau}|\ln\psi(\tau)|\leq c$, then $|\ln Z(\boldsymbol{G}^{\prime\prime\prime})-\ln Z(\boldsymbol{G}^{\prime\prime})|\leq c$. Therefore, Azuma’s inequality implies that for any $t>0$,

[TABLE]

Combining (10.6) and (10.7) with (10.2) and (10.3) completes the proof. ∎

10.2. Proof of Theorem 3.2

We recall from Section 3.5 that $\mathfrak{C}_{r}$ is the partition of $\Psi$ obtained by chopping $[0,2]^{\Omega^{k}}$ into sub-cubes with side lengths $2/r$ . Since $\mathfrak{C}_{r}$ is finite the distribution $P_{r}$ of $\boldsymbol{\psi}^{(r)}$ is supported on a finite set $\Psi_{r}$ of weight functions $\Omega^{k}\to(0,2)$ .
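To make the discretization concrete, the following sketch mimics it on a finite sample: each weight vector is assigned to its sub-cube of side $2/r$ and replaced by the average of the sampled vectors falling in the same cube. This is an empirical analogue under assumptions, not the construction from Section 3.5 itself: the uniform sampling distribution, the dimension, and the function names are hypothetical, and the empirical cube average merely stands in for the conditional expectation $\psi^{(r)}$.

```python
from collections import defaultdict
import numpy as np

def cube_index(psi, r):
    """Index of the sub-cube of side 2/r in [0,2]^dim that contains psi."""
    idx = np.floor(np.asarray(psi) * r / 2.0).astype(int)
    return tuple(int(i) for i in np.minimum(idx, r - 1))

def discretize(samples, r):
    """Replace every sample by the mean of all samples sharing its sub-cube."""
    groups = defaultdict(list)
    for psi in samples:
        groups[cube_index(psi, r)].append(psi)
    means = {key: np.mean(members, axis=0) for key, members in groups.items()}
    return np.array([means[cube_index(psi, r)] for psi in samples])

rng = np.random.default_rng(2)
samples = rng.uniform(0.01, 1.99, size=(500, 4))   # hypothetical weight vectors in (0,2)^4
r = 8
coarse = discretize(samples, r)

# Each coordinate moves by at most the side length 2/r, and the overall mean is preserved.
print(np.max(np.abs(coarse - samples)) <= 2.0 / r)
print(np.allclose(coarse.mean(axis=0), samples.mean(axis=0)))
```

Both printed properties reflect features the proofs rely on: the approximation error shrinks as $r$ grows, and averaging within cubes does not change expectations.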

Lemma 10.3.

For any $\alpha>0$ , $D>0$ there is $r_{0}>0$ such that for all $d\leq D$ and all $r>r_{0}$ we have

[TABLE]

Proof.

Let

[TABLE]

Analogously, for a fixed $r$ let

[TABLE]

That is, we approximate $\psi_{i}$ by the average $\psi_{i}^{(r)}$ over the weight functions in the sub-cube that $\psi_{i}$ belongs to. Since $\Lambda$ is continuous on $[0,\infty)$ and therefore uniformly continuous on any compact subset of $[0,\infty)$ , $B_{r}\to B$ uniformly as $r\to\infty$ on the entire space $\Psi^{r}\times\mathcal{P}(\Omega)^{k\gamma}$ for every $\gamma$ . Since the Poisson distribution has sub-exponential tails, this implies the desired convergence for the first term on the right hand side of (2.3). A similar argument applies to the second term. ∎

Lemma 10.4.

The distribution $P_{r}$ satisfies SYM and BAL. Moreover, for any $\alpha>0$ , $d>0$ there is $r>0$ such that the following is true for all $\pi,\pi^{\prime}\in\mathcal{P}_{*}^{2}(\Omega)$ . With $\boldsymbol{\mu}_{1},\boldsymbol{\mu}_{2},\ldots$ chosen from $\pi$ , $\boldsymbol{\mu}_{1}^{\prime},\boldsymbol{\mu}_{2}^{\prime},\ldots$ chosen from $\pi^{\prime}$ and $\boldsymbol{\psi}^{\prime}\in\Psi$ chosen from $P_{r}$ , all mutually independent, we have

[TABLE]

Proof.

The fact that SYM and BAL are satisfied is immediate from the fact that $P_{r}$ is a conditional expectation of $P$ . To prove (10.8) we observe that by the uniform continuity of $\Lambda$ on compact subsets of $[0,\infty)$ , we can choose $r>0$ large enough so that for all $\psi\in\Psi$ , $\mu_{1},\mu_{1}^{\prime},\ldots,\mu_{k},\mu_{k}^{\prime}\in\mathcal{P}(\Omega)$ ,

[TABLE]

Thus, (10.8) follows from the triangle inequality and the fact that $P$ satisfies POS. ∎

Lemma 10.5.

For any $\alpha>0$ , $d>0$ there is $r_{0}>0$ such that uniformly for all $r\geq r_{0}$ we have

[TABLE]

Proof.

By Lemma 3.1 the models $\hat{\boldsymbol{G}}(n,\boldsymbol{m},P)$ and $\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\boldsymbol{\sigma}^{*})$ are mutually contiguous. Hence, Lemma 10.2 implies that $\mathbb{E}[\ln Z(\hat{\boldsymbol{G}}(n,\boldsymbol{m},P))]=\mathbb{E}[\ln Z(\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\boldsymbol{\sigma}^{*}))]+o(n)$ . Similarly, since $P_{r}$ satisfies SYM and BAL by Lemma 10.4, another application of Lemmas 3.1 and 10.2 yields $\mathbb{E}[\ln Z(\hat{\boldsymbol{G}}(n,\boldsymbol{m},P_{r}))]=\mathbb{E}[\ln Z(\boldsymbol{G}^{*}(n,\boldsymbol{m},P_{r},\boldsymbol{\sigma}^{*}))]+o(n)$ . Therefore, it suffices to prove that for any $\alpha>0$ for all sufficiently large $r$ we have

[TABLE]

In fact, since the Poisson variable $\boldsymbol{m}$ has sub-exponential tails, (4.6) shows that (10.9) would follow if we could show that

[TABLE]

To prove (10.10) pick $\beta=\beta(\alpha,d,P)>0$ small enough and then $r=r(\beta)>0$ large enough. Fix any $\sigma\in\Omega^{V_{n}}$ and $m\leq 2dn/k$ . We couple two factor graphs $\boldsymbol{G}^{\prime},\boldsymbol{G}^{\prime\prime}$ such that $\boldsymbol{G}^{\prime}$ has distribution $\boldsymbol{G}^{*}(n,m,P,\sigma)$ and $\boldsymbol{G}^{\prime\prime}$ is distributed as $\boldsymbol{G}^{*}(n,m,P_{r},\sigma)$ as follows. First choose $\boldsymbol{G}^{\prime}=\boldsymbol{G}^{*}(n,m,P,\sigma)$ . Let us write $\psi_{a_{1}},\ldots,\psi_{a_{m}}$ for the weight functions of $\boldsymbol{G}^{\prime}$ . Then let $\boldsymbol{G}^{\prime\prime}$ be the factor graph where each constraint node $a_{i}$ is adjacent to the same variable nodes as in $\boldsymbol{G}^{\prime}$ but where the corresponding weight function is $\psi_{a_{i}}^{(r)}$ . It is immediate from (2.14) that $\boldsymbol{G}^{\prime\prime}$ is distributed as $\boldsymbol{G}^{*}(n,m,P_{r},\sigma)$ .

To bound $\mathbb{E}[\ln(Z(\boldsymbol{G}^{\prime\prime})/Z(\boldsymbol{G}^{\prime}))]$ we observe that

[TABLE]

Since the function $x\mapsto\ln^{2}x$ is strictly convex on $(0,2)$, for small $\beta$ and large $r$ we obtain from (2.14), the tail bound (2.1) and Jensen’s inequality that

[TABLE]

On the other hand, since the map $z\in[\mathrm{e}^{-1/\beta},2]\mapsto\ln z$ is uniformly continuous, we can choose a sufficiently large $r=r(\beta)$ such that $\max_{\tau}|\ln(\psi_{a_{1}}^{(r)}(\tau)/\psi_{a_{1}}(\tau))|<\alpha/(2d)$ whenever $\max_{\tau\in\Omega^{k}}|\ln\psi_{a_{1}}(\tau)|,\max_{\tau\in\Omega^{k}}|\ln\psi_{a_{1}}^{(r)}(\tau)|\leq 1/\beta$ . Thus, (10.10) follows from (10.11) and (10.12). ∎

Proof of Theorem 3.2.

Fix $d>0$ . Since Lemma 10.4 shows that $P_{r}$ satisfies SYM and BAL, [23, Proposition 3.6] implies that

[TABLE]

Furthermore, [23, Proposition 3.7] implies together with equation (10.8) from Lemma 10.4 that for any $\alpha>0$ there is $r>0$ such that

[TABLE]

Combining (10.13) and (10.14) with Lemma 10.3, we conclude that for any $\alpha>0$ for all large enough $r$ we have

[TABLE]

Applying Lemma 10.5 therefore yields

[TABLE]

Moreover, since $\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\boldsymbol{\sigma}^{*})$ and $\hat{\boldsymbol{G}}(n,\boldsymbol{m},P)$ are mutually contiguous by Lemma 3.1, Lemma 10.2 implies that $\lim_{n\to\infty}n^{-1}\mathbb{E}[\ln Z(\hat{\boldsymbol{G}}(n,\boldsymbol{m},P))]=\sup_{\pi\in\mathcal{P}_{*}^{2}(\Omega)}\mathcal{B}(d,P,\pi)$, too. Finally, since the probability of the event $\mathfrak{S}$ is bounded away from $0$ by Proposition 3.11, Lemma 10.2 shows that

[TABLE]

as well. ∎

10.3. Proof of Theorem 2.2

We begin with the observation that $d_{\mathrm{cond}}$ is bounded and bounded away from $0$.

Lemma 10.6.

We have $1/(k-1)\leq d_{\mathrm{cond}}<\infty$ .

Proof.

Fix any $d<1/(k-1)$. Then for any nearly balanced $\sigma:V_{n}\to\Omega$ the expected degree of every variable node of $\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\sigma)$ is $d+o(1)<1/(k-1)$. Therefore, the well-known result on the ‘giant component’ threshold of a random hypergraph (e.g., [65]) shows that with probability $1-o(1)$ the random factor graph $\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\sigma)$ consists of connected components of order $O(\ln n)$, all but a bounded number of which are trees. But assumption SYM guarantees that for every tree factor graph with $n$ variable nodes and $m$ constraint nodes the free energy is precisely equal to $n\ln q+m\ln\xi$, as is easily verified by induction on the size of the tree. Hence, $n^{-1}\mathbb{E}[\ln Z(\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\sigma))]=\ln q+\frac{d}{k}\ln\xi+o(1)$ by Lemma 10.2. Since this formula holds for every nearly balanced assignment $\sigma$, we obtain $n^{-1}\mathbb{E}[\ln Z(\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\boldsymbol{\sigma}^{*}))]=\ln q+\frac{d}{k}\ln\xi+o(1)$. Hence, Theorem 3.2 shows that $d<d_{\mathrm{cond}}$ and thus $d_{\mathrm{cond}}\geq 1/(k-1)$.
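The tree formula $\ln Z=n\ln q+m\ln\xi$ can be verified by brute force on a small example. The sketch below does so for the Potts antiferromagnet with $k=2$, taking $\psi(s,t)=1-(1-\mathrm{e}^{-\beta})\boldsymbol{1}\{s=t\}$ and assuming that $\xi$ denotes the average $q^{-k}\sum_{\tau}\psi(\tau)$. The specific tree, the parameter values and the function names are illustrative assumptions.

```python
import itertools
import math

def partition_function(n, edges, q, beta):
    """Brute-force Z of the Potts antiferromagnet on a graph with n vertices."""
    penalty = math.exp(-beta)          # weight of a monochromatic edge
    total = 0.0
    for sigma in itertools.product(range(q), repeat=n):
        weight = 1.0
        for u, v in edges:
            if sigma[u] == sigma[v]:
                weight *= penalty
        total += weight
    return total

def tree_prediction(n, m, q, beta):
    """n ln q + m ln xi, with xi = q^{-2} sum over (s,t) of psi(s,t) (assumed definition)."""
    xi = 1.0 - (1.0 - math.exp(-beta)) / q
    return n * math.log(q) + m * math.log(xi)

tree_edges = [(0, 1), (1, 2), (1, 3), (3, 4)]   # a tree on 5 variable nodes
q, beta = 3, 1.0
print(math.log(partition_function(5, tree_edges, q, beta)),
      tree_prediction(5, len(tree_edges), q, beta))
```

Adding an edge that closes a cycle breaks the equality, which is why the bounded number of non-tree components has to be handled separately in the proof.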

We move on to the upper bound. Recalling that $\boldsymbol{m}$ has distribution ${\rm Po}(dn/k)$ and that the $\boldsymbol{m}$ constraint nodes in the teacher-student model are chosen independently, we obtain

[TABLE]

Further, plugging in the definition (2.14) of the teacher-student model, we can write the last term out as

[TABLE]

Since the uniformly random $\boldsymbol{\sigma}^{*}$ is nearly balanced with probability $1-o(1)$ as $n\to\infty$ , due to SYM and (2.1) the last expression simplifies to

[TABLE]

Further, due to the third part of (2.1) and because $\Lambda\left({\,\cdot\,}\right)$ is strictly convex, Jensen’s inequality shows that there exists an $n$ -independent number $\alpha>0$ such that

[TABLE]

Combining (10.16)–(10.18), we find $\frac{\partial}{\partial d}\frac{1}{n}\mathbb{E}[\ln\psi_{\boldsymbol{G}^{*}}(\boldsymbol{\sigma}^{*})]\geq k^{-1}(\alpha+\ln\xi)+o(1)$ . Hence, for $d>\frac{k}{\alpha}\ln q$ we obtain

[TABLE]

Hence, applying Theorem 3.2 and recalling (2.4), we conclude that $d_{\mathrm{cond}}\leq\frac{k}{\alpha}\ln q<\infty$ . ∎
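The arithmetic behind the threshold $\frac{k}{\alpha}\ln q$ can be spelled out as follows. This is a sketch under two assumptions that the text does not state explicitly at this point: that the derivative bound holds uniformly over the range of integration, and that the expectation vanishes at $d=0$, where there are no constraint nodes and the total weight is an empty product. The integration variable $s$ and the notation $\boldsymbol{G}^{*}_{s}$ for the model at average degree $s$ are introduced here for illustration.

```latex
\begin{align*}
\frac{1}{n}\mathbb{E}\bigl[\ln\psi_{\boldsymbol{G}^{*}}(\boldsymbol{\sigma}^{*})\bigr]
  &= \int_{0}^{d}\frac{\partial}{\partial s}\,
     \frac{1}{n}\mathbb{E}\bigl[\ln\psi_{\boldsymbol{G}^{*}_{s}}(\boldsymbol{\sigma}^{*})\bigr]\,\mathrm{d}s
   \;\geq\; \frac{d}{k}\,(\alpha+\ln\xi)+o(1),\\[4pt]
\frac{d}{k}\,(\alpha+\ln\xi) \;>\; \ln q+\frac{d}{k}\ln\xi
  &\iff \frac{d\,\alpha}{k} > \ln q
  \iff d>\frac{k}{\alpha}\ln q .
\end{align*}
```

Since all weights are positive, $Z(\boldsymbol{G}^{*})\geq\psi_{\boldsymbol{G}^{*}}(\boldsymbol{\sigma}^{*})$, so the same lower bound carries over to $n^{-1}\mathbb{E}[\ln Z(\boldsymbol{G}^{*})]$.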

We derive Theorem 2.2 from Theorem 3.2 in two steps. First, generalizing the argument from [23, Section 3.5] to the setting of infinite $\Psi$ , we prove the free energy formula for $d\leq d_{\mathrm{cond}}$ .

Proof of Theorem 2.2, part 1.

First assume that $d<d_{\mathrm{cond}}$ is such that for some $\delta>0$ ,

[TABLE]

Then there exists a sequence $m\in\mathcal{M}(d)$ such that

[TABLE]

Hence, Lemma 10.2 shows that for a suitably large $C>0$ and a sufficiently small $\varepsilon>0$ ,

[TABLE]

Now, with $\theta=\theta(\delta,\varepsilon)>0$ chosen small enough, we define

[TABLE]

Theorem 3.2 and Lemma 10.2 yield $\mathbb{P}\left[{n^{-1}\ln Z(\hat{\boldsymbol{G}}(n,m,P))\leq\ln q+\frac{d}{k}\ln\xi+\theta,\,\mathcal{O}(\hat{\boldsymbol{G}}(n,m,P))\leq Cn}\right]=1-o(1)$ because $d<d_{\mathrm{cond}}$. Therefore, (3.5) and (3.6) yield

[TABLE]

Moreover, the definition (10.20) of $Z^{\prime}(\boldsymbol{G}(n,m,P))$ guarantees that

[TABLE]

But combining (10.21) and (10.22) with the Paley-Zygmund inequality, we obtain

[TABLE]

which contradicts (10.19) if $\theta$ is chosen sufficiently small. Finally, since the probability of the event $\mathfrak{S}$ is bounded away from $0$ by Proposition 3.11, the assertion about $\mathbb{E}[\ln Z(\hat{\boldsymbol{G}}(n,m,P))|\mathfrak{S}]$ follows from Lemma 10.2. ∎
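The Paley-Zygmund inequality used above states that a non-negative random variable $X$ with finite second moment satisfies $\mathbb{P}[X>\theta\,\mathbb{E}X]\geq(1-\theta)^{2}\,\mathbb{E}[X]^{2}/\mathbb{E}[X^{2}]$ for $0\leq\theta\leq 1$. The sketch below evaluates both sides exactly for a small discrete distribution. The distribution and the function name are hypothetical and unrelated to the truncated partition function $Z^{\prime}$.

```python
def paley_zygmund_sides(values, probs, theta):
    """Return (P[X > theta * E X], (1 - theta)^2 * E[X]^2 / E[X^2]) for a discrete X >= 0."""
    mean = sum(v * p for v, p in zip(values, probs))
    second = sum(v * v * p for v, p in zip(values, probs))
    lhs = sum(p for v, p in zip(values, probs) if v > theta * mean)
    rhs = (1.0 - theta) ** 2 * mean ** 2 / second
    return lhs, rhs

# A heavy-tailed toy example: the bound is far from tight but still informative.
values = [0.0, 1.0, 100.0]
probs = [0.5, 0.49, 0.01]
print(paley_zygmund_sides(values, probs, 0.5))
```

The bound is useful precisely when the second moment is within a constant factor of the squared first moment, which is what the second moment estimate provides.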

We proceed to show that $\limsup_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z(\boldsymbol{G})]<\ln q+\frac{d}{k}\ln\xi$ if $d>d_{\mathrm{cond}}$ by generalizing the argument from [23, Section 3.5] to infinite sets $\Psi$ .

Lemma 10.7.

Assume that $d>0$ is such that $\sup_{\pi\in\mathcal{P}_{*}^{2}(\Omega)}\mathcal{B}(d,P,\pi)>\ln q+\frac{d}{k}\ln\xi+\delta$ for some $\delta>0$ . Then for every large enough $C>0$ there exists $\beta=\beta(C)>0$ such that for large enough $n$ ,

[TABLE]

Proof.

If $\sup_{\pi\in\mathcal{P}_{*}^{2}(\Omega)}\mathcal{B}(d,P,\pi)>\ln q+\frac{d}{k}\ln\xi+\delta$ , then Theorem 3.2 shows that

[TABLE]

Fix a small enough $\alpha=\alpha(d,\delta)>0$ and an even smaller $\eta=\eta(\alpha)>0$ and let $\mathcal{S}_{\eta}=\left\{{\sigma\in\Omega^{V_{n}}:\left\|{\rho_{\sigma}-\bar{\rho}}\right\|_{\mathrm{TV}}\leq\eta}\right\}$. Since $\boldsymbol{\sigma}^{*}\in\Omega^{V_{n}}$ is chosen uniformly, we have $\mathbb{P}[\boldsymbol{\sigma}^{*}\in\mathcal{S}_{\eta}]=1-\exp(-\Omega(n))$, while for large enough $C$ we have $\mathbb{P}\left[{\mathcal{O}(\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\sigma))\leq Cn}\right]=1-o(1)$ by Lemma 10.2. Hence, it suffices to prove that for all $\sigma\in\mathcal{S}_{\eta}$,

[TABLE]

To establish (10.25) we set up a coupling of $\boldsymbol{G}^{\prime}=\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\sigma)$, $\boldsymbol{G}^{\prime\prime}=\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\tau)$ for any $\sigma,\tau\in\mathcal{S}_{\eta}$. Let us write $a_{j}^{\prime}$ for the constraint nodes of $\boldsymbol{G}^{\prime}$ and $a_{j}^{\prime\prime}$ for those of $\boldsymbol{G}^{\prime\prime}$. Relabeling the variable nodes as necessary, we may assume without loss of generality that $|\sigma\triangle\tau|\leq 2\eta n$. Therefore, (2.14) shows that we can couple the distribution of the neighborhoods $\partial a_{j}^{\prime}$, $\partial a_{j}^{\prime\prime}$ such that, with $\eta>0$ chosen small enough,

[TABLE]

Furthermore, if indeed $\partial a_{j}^{\prime}=\partial a_{j}^{\prime\prime}$ and $\partial a_{j}^{\prime}\cap(\sigma\triangle\tau)=\emptyset$ , then by (2.14) the weight functions $\psi_{a_{j}^{\prime}},\psi_{a_{j}^{\prime\prime}}$ are identically distributed and we couple such that $\psi_{a_{j}^{\prime}}=\psi_{a_{j}^{\prime\prime}}$ . If, on the other hand, $\partial a_{j}^{\prime}\neq\partial a_{j}^{\prime\prime}$ or $(\partial a_{j}^{\prime}\cup\partial a_{j}^{\prime\prime})\cap(\sigma\triangle\tau)\neq\emptyset$ , then we choose $\psi_{a_{j}^{\prime}}$ , $\psi_{a_{j}^{\prime\prime}}$ independently according to (2.14).

Since the $\boldsymbol{m}$ constraint nodes are chosen independently, (10.26) shows that the number $X$ of $j\in[\boldsymbol{m}]$ such that either $\partial a_{j}^{\prime}\neq\partial a_{j}^{\prime\prime}$ or $\psi_{a_{j}^{\prime}}\neq\psi_{a_{j}^{\prime\prime}}$ is binomially distributed with mean at most $\alpha n$ . Hence, $\mathbb{P}\left[{X>2\alpha n}\right]\leq\exp(-\Omega(n))$ . Furthermore, (2.1) shows that the expected impact on the free energy of the $X$ constraint nodes where $\boldsymbol{G}^{\prime},\boldsymbol{G}^{\prime\prime}$ differ is bounded by $cX$ for some number $c=c(P)>0$ that does not depend on $\alpha$ or $\sigma$ . Therefore, choosing $\alpha>0$ small enough we can ensure that

[TABLE]

Combining (10.24) and (10.27), we obtain

[TABLE]

Thus, (10.25) follows from (10.28) and Lemma 10.2. ∎

Lemma 10.8.

Assume that $P$ satisfies SYM and BAL. For any $D>0$ the following is true uniformly for $m\leq Dn/k$ . If $\mathcal{A}$ is an event such that $\mathbb{P}\left[{\boldsymbol{G}^{*}(n,m,P,\boldsymbol{\sigma}^{*})\in\mathcal{A}}\right]\leq\exp(-\Omega(n))$ , then $\mathbb{P}\left[{\hat{\boldsymbol{G}}(n,m,P)\in\mathcal{A}}\right]\leq\exp(-\Omega(n))$ .

Proof.

This is immediate from the Nishimori identity (Lemma 4.4) and (4.12). ∎

Proof of Theorem 2.2, part 2.

Suppose that $d>d_{\mathrm{cond}}$ . Then there exist $d^{\prime}<d$ and $\delta>0$ such that

[TABLE]

Let $\boldsymbol{m}^{\prime}=\boldsymbol{m}_{d^{\prime}}(n)$ be a ${\rm Po}(d^{\prime}n/k)$ -variable and consider the event $\mathcal{F}=\{n^{-1}\ln Z\leq\ln q+\frac{d^{\prime}}{k}\ln\xi+\delta/2\}$ . Then Markov’s inequality and Lemma 4.6 yield

[TABLE]

On the other hand, Lemma 10.7 shows that for large enough $C>0$ ,

[TABLE]

Now, for a factor graph $G$ obtain $G^{\prime}$ by removing each constraint node with probability $1-d^{\prime}/d$ independently. Moreover, let $\mathcal{G}$ be the set of all factor graphs $G$ such that $\mathbb{P}[G^{\prime}\in\mathcal{F}]\geq 1/2$ , where, of course, the probability is over the removal process only. Since the distribution of $\boldsymbol{G}(n,\boldsymbol{m},P)^{\prime}$ is identical to that of $\boldsymbol{G}(n,\boldsymbol{m}^{\prime},P)$ , (10.29) yields

[TABLE]
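The removal process relies on Poisson thinning: if the number of constraint nodes is ${\rm Po}(dn/k)$ and each is kept independently with probability $d^{\prime}/d$, the number of survivors is ${\rm Po}(d^{\prime}n/k)$. The sketch below checks this at the level of probability mass functions by summing over the original count. The numerical parameters, the truncation cutoff and the function names are assumptions for illustration.

```python
import math

def poisson_pmf(lam, x):
    """Probability that a Poisson(lam) variable equals x, computed in log space."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def thinned_pmf(lam, keep, j, cutoff=300):
    """P[j survivors] when a Poisson(lam) number of items is kept independently w.p. keep."""
    return sum(
        poisson_pmf(lam, m) * math.comb(m, j) * keep ** j * (1.0 - keep) ** (m - j)
        for m in range(j, cutoff)
    )

lam, keep = 6.0, 0.4      # stand-ins for dn/k and d'/d
for j in range(5):
    print(j, thinned_pmf(lam, keep, j), poisson_pmf(lam * keep, j))
```

Because the surviving constraint nodes keep their neighborhoods and weight functions, the thinned graph has the same law as the model with the smaller Poisson parameter.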

Similarly, $\boldsymbol{G}^{*}(n,\boldsymbol{m},P,\boldsymbol{\sigma}^{*})^{\prime}$ and $\boldsymbol{G}^{*}(n,\boldsymbol{m}^{\prime},P,\boldsymbol{\sigma}^{*})$ are identically distributed. Thus, (10.30) and Lemma 10.1 imply that

[TABLE]

Furthermore, (10.32) and Lemma 10.8 yield $\chi>0$ such that

[TABLE]

To complete the proof, assume for contradiction that $\limsup_{n\to\infty}n^{-1}\mathbb{E}[\ln Z(\boldsymbol{G}(n,\boldsymbol{m},P))]\geq\ln q+\frac{d}{k}\ln\xi$ . Then $n^{-1}\mathbb{E}[\ln Z(\boldsymbol{G}(n,\boldsymbol{m},P))]\geq\ln q+\frac{d}{k}\ln\xi+o(1)$ for arbitrarily large $n$ . Thus, we can apply Lemma 10.2 to conclude that for infinitely many $n$ ,

[TABLE]

Combining (10.34) with Lemma 10.1, we see that the event $\mathcal{A}=\{n^{-1}\ln Z<\ln q+\frac{d}{k}\ln\xi-\chi,\,\mathcal{O}\leq Cn\}$ satisfies $\mathbb{P}\left[{\boldsymbol{G}(n,\boldsymbol{m},P)\in\mathcal{A}}\right]=1-o(1)$ for arbitrarily large $n$ . But then

[TABLE]

a contradiction that refutes the assumption $\limsup_{n\to\infty}n^{-1}\mathbb{E}[\ln Z(\boldsymbol{G}(n,\boldsymbol{m},P))]\geq\ln q+\frac{d}{k}\ln\xi$ . ∎

11. Reconstruction

Throughout this section, when there is no danger of confusion we abbreviate $\boldsymbol{T}(d,P)$ to $\boldsymbol{T}$ and $\boldsymbol{T}^{h}(d,P)$ to $\boldsymbol{T}^{h}$ . For a rooted factor tree $T$ and any vertex $x$ in that tree, let $\partial_{desc}x$ denote the children of $x$ . Also, for any factor graph $G$ , any variable node $v$ in this graph and any integer $\ell\geq 0$ , we let $S(v,\ell)$ denote the set of variable nodes at distance $2\ell$ from $v$ .
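To make the notation $S(v,\ell)$ concrete, the following Python sketch computes the set of variable nodes at distance exactly $2\ell$ from $v$ by breadth-first search. The dictionary representation of the factor graph and the toy instance are assumptions made for illustration only; they do not appear in the paper.

```python
from collections import deque

def sphere(var_to_factors, factor_to_vars, v, ell):
    """Return S(v, ell): variable nodes at graph distance exactly 2*ell from v."""
    dist = {v: 0}                 # distance counted in variable-to-variable hops
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == ell:        # no need to explore beyond radius ell
            continue
        for a in var_to_factors.get(x, ()):
            for y in factor_to_vars[a]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
    return {y for y, t in dist.items() if t == ell}

# Hypothetical toy factor graph with constraints a=(0,1), b=(1,2), c=(0,3)
factor_to_vars = {"a": (0, 1), "b": (1, 2), "c": (0, 3)}
var_to_factors = {0: ["a", "c"], 1: ["a", "b"], 2: ["b"], 3: ["c"]}
```

One variable-to-variable hop corresponds to two edges of the bipartite factor graph, which is why the search counts hops rather than edges.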

Given some graph $G=(V,E)$ , any $M\subset V$ and an assignment $\sigma\in\Omega^{V}$ , let $\sigma(M)$ , or $\sigma_{M}$ , denote the assignment that $\sigma$ specifies for the set $M$ . Furthermore, let $\nu,\nu^{\prime}$ be two distributions on the configuration space $\Omega^{V}$ . For any $M\subset V$ we let

[TABLE]

denote the total variation distance between the projections of $\nu$ and $\nu^{\prime}$ on $M$ . Also, for some $\sigma\in\Omega^{V}$ we let $\nu^{\sigma_{M}}$ denote the distribution $\nu$ conditional on the event that $M$ has assignment $\sigma(M)$ .
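For small examples the two notions just introduced, the distance between projections on $M$ and the conditional distribution $\nu^{\sigma_{M}}$ , can be computed by brute force. The sketch below assumes that distributions are stored as dictionaries from tuples to probabilities and uses the customary factor $1/2$ in the total variation distance; both are assumptions, since the displayed definition is not reproduced here.

```python
from collections import defaultdict

def project(nu, M):
    """Marginal of nu on the coordinates in M; nu maps tuples to probabilities."""
    out = defaultdict(float)
    for sigma, p in nu.items():
        out[tuple(sigma[i] for i in M)] += p
    return dict(out)

def tv_on(nu, nu2, M):
    """Total variation distance between the projections of nu and nu2 on M."""
    a, b = project(nu, M), project(nu2, M)
    return 0.5 * sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in set(a) | set(b))

def condition(nu, M, sigma):
    """nu conditioned on the coordinates in M agreeing with sigma."""
    kept = {t: p for t, p in nu.items() if all(t[i] == sigma[i] for i in M)}
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()}

# Two distributions on {0,1}^2: the uniform one and a perfectly correlated one
uniform = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
correlated = {(0, 0): 0.5, (1, 1): 0.5}
```

In this toy example the two distributions agree on every single coordinate but differ on the pair, which is exactly the kind of discrepancy that the projection onto a set $M$ is meant to detect.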

For the factor tree $T$ we define the broadcasting process which generates an assignment $\mathbold{\sigma}\in\Omega^{V}_{T}$ as follows: There is some initial distribution $\zeta\in\mathcal{P}(\Omega)$ . We set $\mathbold{\sigma}(r)$ according to the distribution $\zeta$ . Then, inductively, assume that we have $\mathbold{\sigma}(x)$ for some variable node $x$ . For each $\alpha\in\partial_{desc}x$ , independently, the variable nodes in $\partial\alpha$ are assigned $\tau\in\Omega^{k}$ with probability proportional to

[TABLE]

where $\psi_{\alpha}$ is the weight function that corresponds to $\alpha$ and $j_{\alpha,x}$ is the position of $x$ inside the constraint $\psi_{\alpha}$ .
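The broadcasting process can be sketched as a short sampler. The code below assumes that the missing display assigns $\tau$ a weight $\psi_{\alpha}(\tau)$ restricted to those $\tau$ whose coordinate $j_{\alpha,x}$ equals $\mathbold{\sigma}(x)$ ; this reading, the tree representation and all function names are illustrative assumptions rather than the paper's notation.

```python
import itertools
import random

def broadcast(tree, root, omega, zeta, rng):
    """Sample an assignment by broadcasting from the root.

    tree[x] is a list of triples (psi, j, children): psi is a weight function
    on Omega^k, j is the position of x inside the constraint, and children
    lists the other k-1 variable nodes in the order of the remaining positions.
    """
    sigma = {root: rng.choices(omega, weights=zeta)[0]}
    stack = [root]
    while stack:
        x = stack.pop()
        for psi, j, children in tree.get(x, ()):
            k = len(children) + 1
            # all tau in Omega^k whose j-th coordinate agrees with sigma(x)
            cands = [t for t in itertools.product(omega, repeat=k) if t[j] == sigma[x]]
            tau = rng.choices(cands, weights=[psi(t) for t in cands])[0]
            rest = [tau[i] for i in range(k) if i != j]
            for y, s in zip(children, rest):
                sigma[y] = s
                stack.append(y)
    return sigma

# Hypothetical example: a path 0 - 1 - 2 with a hard "not equal" constraint
def neq(t):
    return 1.0 if t[0] != t[1] else 0.0

path = {0: [(neq, 0, [1])], 1: [(neq, 0, [2])]}
```

With the hard constraint above and the root pinned to colour 0, the sampler is forced to alternate colours along the path, which gives a quick sanity check of the conditioning step.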

Lemma 11.1.

Consider some factor tree $T$ of height $h>0$ , rooted at (variable) node $r$ . Let $\mathbold{\sigma}\in\Omega^{T}$ be the assignment generated by the broadcasting process such that the initial distribution is the uniform distribution over $\Omega$ .

For any $\tau\in\Omega^{T}$ , it holds that

[TABLE]

where $\mu_{T}$ is the Gibbs distribution specified by $T$ .

Proof.

Let $\mathbold{\eta}$ be distributed as in $\mu_{T}$ . Then, we have that $\mathbold{\eta}(r)$ is distributed uniformly at random in $\Omega$ .

Furthermore, let $x\in T$ be a variable node. Given $\mathbold{\eta}(x)$ , for each $\alpha\in\partial_{desc}x$ the assignment $\mathbold{\eta}(\partial\alpha)$ is independent of the other vertices in $\partial_{desc}x$ . Furthermore, for each assignment $\tau\in\Omega^{k}$ we have $\mathbold{\eta}(\partial\alpha)=\tau$ with probability proportional to

[TABLE]

The lemma follows by using the definition of the broadcasting process. ∎
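Lemma 11.1 can be checked exactly on a tiny instance. The sketch below takes the Potts antiferromagnet with $k=2$ and weight $\exp(-\beta)$ on monochromatic edges as an assumed example, enumerates all configurations of a four-vertex tree, and compares the Gibbs measure with the law of the broadcasting process started from the uniform distribution. The tree, $q=3$ and $\beta=1$ are arbitrary illustrative choices.

```python
import itertools
import math

q, beta = 3, 1.0
omega = range(q)
edges = [(0, 1), (0, 2), (1, 3)]     # (parent, child) pairs of a tree rooted at 0
n = 4

def psi(a, b):
    # Potts antiferromagnet weight: monochromatic edges are penalised by exp(-beta)
    return math.exp(-beta) if a == b else 1.0

def gibbs():
    """Gibbs measure: product of edge weights, normalised over all configurations."""
    w = {s: math.prod(psi(s[u], s[v]) for u, v in edges)
         for s in itertools.product(omega, repeat=n)}
    z = sum(w.values())
    return {s: x / z for s, x in w.items()}

def broadcast_law():
    """Law of broadcasting: uniform root, child colour proportional to psi(parent, child)."""
    row = sum(psi(0, c) for c in omega)   # identical for every parent colour by symmetry
    return {s: (1.0 / q) * math.prod(psi(s[u], s[v]) / row for u, v in edges)
            for s in itertools.product(omega, repeat=n)}

mu, br = gibbs(), broadcast_law()
max_gap = max(abs(mu[s] - br[s]) for s in mu)
```

The agreement relies on the row sums of the weight function being independent of the parent colour, which is the role played by the symmetry assumption in this example.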

Consider a sequence of factor trees $\mathcal{T}=\{T_{\ell}\}_{\ell\geq 0}$ , where $T_{\ell}$ contains $\ell$ levels of variable nodes. Let

[TABLE]

recall that $S(r,2\ell)$ is the set of variable nodes at distance $2\ell$ from the root $r$ . Similarly, we define

[TABLE]

We study the reconstruction problem on the sequence of factor trees $\mathcal{T}$ by means of the broadcasting process and the quantity $\mathrm{broad}_{\mathcal{T}}$ . To be more specific, for each $T_{\ell}\in\mathcal{T}$ , rooted at $r_{\ell}$ , consider two broadcasting processes with some initial distribution $\zeta$ and let $\mathbold{\sigma}_{\ell}$ and $\mathbold{\tau}_{\ell}$ be the assignments that are generated, respectively. Then, the quantity $\mathrm{broad}_{\mathcal{T}}$ expresses the $\ell_{1}$ -distance between the distributions of the configurations $\mathbold{\sigma}_{\ell}(S(r_{\ell},\ell))$ and $\mathbold{\tau}_{\ell}(S(r_{\ell},\ell))$ , as $\ell\to\infty$ , conditional on $\mathbold{\sigma}_{\ell}(r_{\ell})=c$ and $\mathbold{\tau}_{\ell}(r_{\ell})=c^{\prime}$ , for the worst-case pair $c,c^{\prime}\in\Omega$ . The following result implies that for studying reconstruction on $\mathcal{T}$ we can consider either $\mathrm{broad}_{\mathcal{T}}$ or $\mathrm{corr}_{\mathcal{T}}$ .

Lemma 11.2.

Let $\mathcal{T}=\{T_{\ell}\}_{\ell\geq 0}$ be a sequence of factor trees, where $T_{\ell}$ contains $\ell$ levels of variable nodes. Then we have that $\mathrm{broad}_{\mathcal{T}}=0$ if and only if $\mathrm{corr}_{\mathcal{T}}=0$ .

Proof.

For some integer $\ell>0$ , we have that

[TABLE]

Clearly, the above implies that $\mathrm{broad}_{\mathcal{T}}\leq q\ \mathrm{corr}_{\mathcal{T}}$ . In turn, we get that if $\mathrm{corr}_{\mathcal{T}}=0$ , then $\mathrm{broad}_{\mathcal{T}}=0$ , as well.

We work in a similar way for the other direction. That is,

[TABLE]

Clearly, the above implies that $\mathrm{corr}_{\mathcal{T}}\leq 2\ \mathrm{broad}_{\mathcal{T}}$ . In turn, we get that if $\mathrm{broad}_{\mathcal{T}}=0$ , then $\mathrm{corr}_{\mathcal{T}}=0$ . ∎

In the following result we show that non-reconstruction is monotone in the expected degree of $\boldsymbol{T}(d,P)$ .

Lemma 11.3.

For any $d_{1},d_{2}>0$ such that $d_{1}\geq d_{2}$ , the following is true: If $\mathrm{corr}^{\star}(d_{1})=0$ , then $\mathrm{corr}^{\star}(d_{2})=0$ .

The proof of Lemma 11.3 appears in Section 11.1.

We proceed by introducing some further notions. For a rooted factor graph $G$ , let ${\tt ISM}(G)$ be the isomorphism class of rooted factor graphs to which $G$ belongs. Let $\boldsymbol{T}_{\boldsymbol{G},\ell}(v)$ be the induced subgraph of $\boldsymbol{G}$ which includes $v$ and all variable nodes which are within graph distance $2\ell$ from $v$ . For $h=o(\log n)$ , $\boldsymbol{T}_{G,h}(v)$ is a tree with probability $1-o(1)$ . In particular, there is a coupling $\rho$ of the distribution induced by $\boldsymbol{T}_{G,h}(v)$ and $\boldsymbol{T}^{h}$ such that the following is true:

[TABLE]

For what follows, we let $\mathcal{I}(v,h)$ denote the event $\{{\tt ISM}(\boldsymbol{T}_{\boldsymbol{G},h}(v))={\tt ISM}(\boldsymbol{T}^{h})\}$ .
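Whether the depth- $h$ neighbourhood of a vertex is a tree can be tested by a breadth-first search in the bipartite graph of variable and constraint nodes. The following sketch assumes a dictionary representation of the factor graph, assumes that no constraint contains the same variable twice, and only detects cycles that the search meets while expanding nodes at distance less than $2h$ ; all of these are simplifying assumptions for illustration.

```python
from collections import deque

def neighbourhood_is_tree(var_to_factors, factor_to_vars, v, h):
    """BFS in the bipartite variable/constraint graph up to distance 2*h from v.

    Returns False as soon as some node is reached along two different paths.
    """
    start = ("v", v)
    dist = {start: 0}
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == 2 * h:
            continue
        kind, name = node
        if kind == "v":
            nbrs = [("f", a) for a in var_to_factors.get(name, ())]
        else:
            nbrs = [("v", y) for y in factor_to_vars[name]]
        for nb in nbrs:
            if nb == parent[node]:
                continue
            if nb in dist:
                return False      # second path to nb: a cycle inside the ball
            dist[nb] = dist[node] + 1
            parent[nb] = node
            queue.append(nb)
    return True

# Hypothetical toy instances: a path and a pair of parallel constraints
path_f2v = {"a": (0, 1), "b": (1, 2)}
path_v2f = {0: ["a"], 1: ["a", "b"], 2: ["b"]}
cycle_f2v = {"a": (0, 1), "b": (0, 1)}
cycle_v2f = {0: ["a", "b"], 1: ["a", "b"]}
```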

Lemma 11.4.

Let $h=o(\log n)$ . Consider $(\boldsymbol{G}^{*},\mathbold{\sigma}^{*})$ generated according to the teacher-student model and some vertex $v$ . Also, consider the pair $(\boldsymbol{T}^{h},\mathbold{\tau})$ such that $\mathbold{\tau}$ is generated by a broadcasting process for which we assign the root $r$ the configuration ${\mathbold{\sigma}}^{*}(v)$ with probability 1.

There is a coupling $\tilde{\lambda}$ between $(\boldsymbol{G}^{*},\mathbold{\sigma}^{*})$ and $(\boldsymbol{T}^{h},\mathbold{\tau})$ such that the following is true:

[TABLE]

where $f$ is an isomorphism between $\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ and $\boldsymbol{T}^{h}$ . The same result holds for $\boldsymbol{G}^{*}\in\mathfrak{S}$ .

The proof of Lemma 11.4 appears in Section 11.2.

In light of Lemma 11.4 and (11.3), Theorem 2.9 is immediate.

The above result implies that in the teacher-student model, the distribution of the configuration of $\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ that is specified by $\mathbold{\sigma}^{*}$ is asymptotically the same as the distribution of the configuration that is induced by the broadcasting process on $\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ . We use the above result with Corollary 2.7 to relate reconstruction on the random factor graph $\boldsymbol{G}$ and the random tree $\boldsymbol{T}$ .

Now we proceed with the proof of Theorem 2.8. In the following lemma we provide the upper bound for $d_{\mathrm{rec}}$ and $d_{\mathrm{rec}}^{\star}$ .

Lemma 11.5.

For any $\varepsilon>0$ there exists $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+\varepsilon$ such that $\mathrm{corr}(d)>0$ . Furthermore, for any $d>d_{\mathrm{cond}}$ we have $\mathrm{corr}^{\star}(d)>0$ .

Proof.

We consider $\mathrm{corr}(d)$ . For any graph $G$ and two vertices $x,y$ such that $\textrm{dist}(x,y)\geq\ell$ and any $c\in\Omega^{\{x\}}$ , it is easy to see that

[TABLE]

Furthermore, working as in Lemma 11.2, we can substitute the r.h.s. of the above inequality and get

[TABLE]

For any two fixed vertices $x,y$ in $\boldsymbol{G}$ , we denote by $\mathcal{D}(x,y)$ the event that $\textrm{dist}(x,y)\geq\ln\ln n$ . Then, for $h=\ln\ln n$ we get that

[TABLE]

Note that for any two fixed vertices $x,y$ it holds that $\mathbb{P}[\mathcal{D}^{c}(x,y)]\leq n^{-1/2}$ . To see this, let $N_{x}$ be the number of vertices within distance $\ln\ln n$ from $x$ . Furthermore, given $N_{x}$ each vertex belongs to the $\ln\ln n$ neighborhood of $x$ with probability at most $N_{x}/n$ . Then, noting that $\mathbb{E}[N_{x}]=o(n^{1/100})$ , we get that

[TABLE]

where we use Markov’s inequality to bound $\mathbb{P}[N_{x}>n^{1/3}]$ . Combining all the above, we get that for any $d$ it holds that

[TABLE]

We conclude the part for $\textrm{corr}(d)$ by combining the above with (2.12). Recall that the latter states that for any $\varepsilon>0$ there exists $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+\varepsilon$ such that the l.h.s. is strictly positive.

Repeating the same arguments as above we get that for $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+1$ it holds that

[TABLE]

where $\textrm{corr}^{*}(d)$ is defined in (2.16).

Note that the l.h.s. is bounded away from zero. To see this, note that if it were zero, then for $d>d_{\mathrm{cond}}$ we would get $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\ln Z(\boldsymbol{G}^{*})]=\frac{\ln\xi}{k}d+\ln q$ . Clearly, this is not true; see, e.g., Corollary 6.3 and Theorem 3.2. We conclude that $\textrm{corr}^{*}(d)>0$ for $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+1$ .

Using Lemma 11.4 we get that $\textrm{corr}^{\star}(d)>0$ for any $d>d_{\mathrm{cond}}$ as well. To be more specific, note that Lemma 11.4 implies the following: Let $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+1$ . Also consider the pairs $(\boldsymbol{G}^{*},{\mathbold{\sigma}}^{*})$ and $(\boldsymbol{T},\mathbold{\tau})$ . For any $h=o(\log n)$ , there is a coupling between $(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v),{\mathbold{\sigma}}^{*}(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)))$ and $(\boldsymbol{T},\mathbold{\tau})$ such that with probability $1-o(1)$ we have ${\tt ISM}(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v))={\tt ISM}(\boldsymbol{T}^{h})$ , with some isomorphism $f(\cdot)$ . Furthermore, for every $u\in\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ we have that ${\mathbold{\sigma}}^{*}(u)=\mathbold{\tau}(f(u))$ . This coupling implies that $\textrm{corr}^{*}(d)=\textrm{corr}^{\star}(d)$ . That is, for $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+1$ we have $\textrm{corr}^{\star}(d)>0$ . Then, using the monotonicity result from Lemma 11.3 we get that for any $d>d_{\mathrm{cond}}$ we have $\textrm{corr}^{\star}(d)>0$ . The lemma follows. ∎

In light of Lemma 11.5, we get the first part of Theorem 2.8 by using the following result.

Lemma 11.6.

For any $d<d_{\mathrm{rec}}^{\star}$ we have that $\mathrm{corr}^{\star}(d)=\mathrm{corr}(d)=0$ . Furthermore, for $d_{\mathrm{rec}}^{\star}<d<d_{\mathrm{cond}}$ we have that $\mathrm{corr}^{\star}(d),\mathrm{corr}(d)>0$ .

The proof of Lemma 11.6 appears in Section 11.3.

As far as the second part of Theorem 2.8 is concerned, it essentially follows as a corollary from all the previous results in this section. It is elementary to verify that

[TABLE]

Using Lemma 8.1 we get that $\mathbb{P}[\mathfrak{S}]=\Omega(1)$ . Then, using Lemma 11.6 we get that for any $d<d_{\mathrm{rec}}^{\star}$ the l.h.s. of (11.8) is equal to zero. We proceed by showing that for any $\varepsilon>0$ there exists $d_{\mathrm{cond}}<d<d_{\mathrm{cond}}+\varepsilon$ such that

[TABLE]

Using Theorem 2.5 and standard arguments (e.g., [13, Section 2]) there is $\varepsilon>0$ such that

[TABLE]

Then (11.9) follows by working as in the proof of Lemma 11.5. Finally, we show that for $d_{\mathrm{rec}}^{\star}<d<d_{\mathrm{cond}}$ we have

[TABLE]

For showing the above, we work as in the second case of Lemma 11.6, i.e. we use Lemma 11.4 and the contiguity result in Corollary 2.7. More specifically, if there is $d_{\mathrm{rec}}^{\star}<d<d_{\mathrm{cond}}$ such that the l.h.s. of (11.10) is zero, then Corollary 2.7 would imply that

[TABLE]

recall that $\mu_{\boldsymbol{G}^{*}}$ is the distribution over configurations in $\Omega^{V_{n}}$ that is induced by $\mathbold{\sigma}^{*}$ conditional on $\boldsymbol{G}^{*}$ . If the above were true, then Lemma 11.4 would imply that $\textrm{corr}^{\star}(d)=0$ . Clearly this contradicts Lemma 11.6.

The theorem follows.

11.1. Proof of Lemma 11.3

Consider two factor trees $T_{1}$ and $T_{2}$ with roots $r_{1},r_{2}$ , respectively. We say that $T_{1},T_{2}$ satisfy the relation $T_{1}\subseteq T_{2}$ if there is an injective mapping $f:V(T_{1})\cup F(T_{1})\to V(T_{2})\cup F(T_{2})$ such that the following is true: for every $v\in V(T_{1})$ we have $\partial_{desc}v\subseteq\partial_{desc}f(v)$ , while every $\alpha\in F$ such that $\alpha\in\partial_{desc}v\cap\partial_{desc}f(v)$ is assigned the same weight function $\psi_{\alpha}$ in both trees and $v$ , $f(v)$ occupy the same position within $\psi_{\alpha}$ . Furthermore, for every function node $\alpha\in F(T_{1})$ we have $\partial_{desc}\alpha=\partial_{desc}f(\alpha)$ and every $w\in\partial_{desc}\alpha$ occupies in $\psi_{\alpha}$ the same position as $f(w)$ in $f(\alpha)$ .

Lemma 11.7.

Consider two sequences of factor trees $\mathcal{T}_{1}$ and $\mathcal{T}_{2}$ such that the following is true: For $T^{1}_{\ell}\in\mathcal{T}_{1}$ and $T^{2}_{\ell}\in\mathcal{T}_{2}$ we have $T^{1}_{\ell}\subseteq T^{2}_{\ell}$ , for $\ell=1,2,\ldots$ . Then, we have that

[TABLE]

Proof.

For some $\ell\geq 0$ , consider $T^{1}_{\ell}\in\mathcal{T}_{1}$ and $T^{2}_{\ell}\in\mathcal{T}_{2}$ . Since we assumed that $T^{1}_{\ell}\subseteq T^{2}_{\ell}$ , let $h:V(T^{1}_{\ell})\cup F(T^{1}_{\ell})\to V(T^{2}_{\ell})\cup F(T^{2}_{\ell})$ be the mapping that verifies that property.

For any two $s,c\in\Omega$ consider two configurations $\mathbold{\tau}_{1},\mathbold{\sigma}_{1}$ generated by the broadcasting process on $T^{1}_{\ell}$ such that $\mathbold{\tau}_{1}(r)=s$ and $\mathbold{\sigma}_{1}(r)=c$ . Similarly, let $\mathbold{\tau}_{2},\mathbold{\sigma}_{2}$ be two configurations generated by the broadcasting process on $T^{2}_{\ell}$ such that $\mathbold{\tau}_{2}(r)=s$ and $\mathbold{\sigma}_{2}(r)=c$ . Then it suffices to show the following: For any $\alpha\in[0,1]$ , if there is a coupling $\xi_{2}$ for $\mathbold{\sigma}_{2},\mathbold{\tau}_{2}$ such that the probability that $\mathbold{\sigma}_{2}(S(r,2\ell))\neq\mathbold{\tau}_{2}(S(r,2\ell))$ is equal to $\alpha$ , then there exists a coupling $\xi_{1}$ for $\mathbold{\sigma}_{1},\mathbold{\tau}_{1}$ such that the probability that $\mathbold{\sigma}_{1}(S(r,2\ell))\neq\mathbold{\tau}_{1}(S(r,2\ell))$ is at most $\alpha$ .

From the definition of the broadcasting process, we get the following: Let $\mathbold{\sigma}_{1}$ and $\mathbold{\sigma}_{2}$ be two configurations generated by broadcasting process on $T^{1}_{\ell}$ and $T^{2}_{\ell}$ , respectively, such that $\mathbold{\sigma}_{1}(r)=\mathbold{\sigma}_{2}(r)=c$ , for some $c\in\Omega$ . Then there is a coupling $\zeta$ for $\mathbold{\sigma}_{1}$ , $\mathbold{\sigma}_{2}$ such that for every $v\in V(T^{1}_{\ell})$ , we have that $\mathbold{\sigma}_{1}(v)=\mathbold{\sigma}_{2}(h(v))$ .

Assume that we have the coupling $\xi_{2}$ for $\mathbold{\sigma}_{2}$ and $\mathbold{\tau}_{2}$ . We combine couplings $\xi_{2}$ and $\zeta$ to get $\xi_{1}$ . In particular we use the couplings as follows: First, we couple $\mathbold{\sigma}_{1}$ and $\mathbold{\sigma}_{2}$ by using $\zeta$ . Then, we use $\xi_{2}$ to couple $\mathbold{\sigma}_{2}$ and $\mathbold{\tau}_{2}$ . Finally, we use $\zeta$ to couple $\mathbold{\tau}_{2}$ and $\mathbold{\tau}_{1}$ .

In the above “chain of couplings”, note that we have $\mathbold{\sigma}_{1}(S(r,2\ell))\neq\mathbold{\tau}_{1}(S(r,2\ell))$ only if $\mathbold{\sigma}_{2}(S(r,2\ell))\neq\mathbold{\tau}_{2}(S(r,2\ell))$ . This implies that if in $\xi_{2}$ the probability of the event $\mathbold{\sigma}_{2}(S(r,2\ell))\neq\mathbold{\tau}_{2}(S(r,2\ell))$ is equal to $\alpha$ , then in $\xi_{1}$ the probability of having $\mathbold{\sigma}_{1}(S(r,2\ell))\neq\mathbold{\tau}_{1}(S(r,2\ell))$ is at most $\alpha$ . The lemma follows. ∎

In light of Lemmas 11.2, 11.7 we get the following corollary.

Corollary 11.8.

Consider two sequences of factor trees $\mathcal{T}_{1}$ and $\mathcal{T}_{2}$ such that for $T^{1}_{\ell}\in\mathcal{T}_{1}$ and $T^{2}_{\ell}\in\mathcal{T}_{2}$ we have $T^{1}_{\ell}\subseteq T^{2}_{\ell}$ , for $\ell=1,2,\ldots$ , then the following is true: If $\mathrm{corr}_{\mathcal{T}_{2}}=0$ , then $\mathrm{corr}_{\mathcal{T}_{1}}=0$ .

Lemma 11.3 follows by using the above corollary and noting that for any $d_{1},d_{2}>0$ such that $d_{1}\geq d_{2}$ there is a standard coupling such that $\boldsymbol{T}(d_{2},P)\subseteq\boldsymbol{T}(d_{1},P)$ .
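One way to realise such a coupling is Poisson thinning: give every variable node ${\rm Po}(d_{1})$ constraint children and keep each of them independently with probability $d_{2}/d_{1}$ , so that the number of retained children is ${\rm Po}(d_{2})$ . The Python sketch below builds the two nested trees simultaneously. It is an illustration under stated assumptions: weight functions are omitted, node labels are arbitrary integers, and the Poisson sampler is a simple textbook routine suitable only for small parameters.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def coupled_trees(d1, d2, k, height, rng):
    """Return (big, small), each mapping a variable node to its child constraints.

    A constraint is recorded as the list of its k-1 child variable nodes.
    big has Po(d1) constraints below every variable node; small keeps each of
    them with probability d2/d1, but only below variables that small reaches.
    """
    assert d1 >= d2 > 0
    big, small = {}, {}
    level, in_small, fresh = [0], {0}, 1
    for _ in range(height):
        nxt = []
        for x in level:
            big[x] = []
            if x in in_small:
                small[x] = []
            for _ in range(poisson(d1, rng)):
                kids = list(range(fresh, fresh + k - 1))
                fresh += k - 1
                big[x].append(kids)
                nxt.extend(kids)
                if x in in_small and rng.random() < d2 / d1:
                    small[x].append(kids)
                    in_small.update(kids)
        level = nxt
    return big, small
```

By construction every constraint of the smaller tree is also a constraint of the larger one, with the same children in the same order, which mirrors the relation $T_{1}\subseteq T_{2}$ defined at the start of this subsection.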

11.2. Proof of Lemma 11.4

The case where $\boldsymbol{G}^{*}\in\mathfrak{S}$ is almost identical to the case where we don’t restrict $\boldsymbol{G}^{*}$ . For this reason we omit the proof of the case where $\boldsymbol{G}^{*}\in\mathfrak{S}$ .

Consider the pairs $(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v),\mathbold{\sigma}^{*})$ and $(\boldsymbol{T}^{h},\mathbold{\tau})$ . We define the relation “ $\cong$ ” such that $(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v),\mathbold{\sigma}^{*})\cong(\boldsymbol{T}^{h},\mathbold{\tau})$ if the following holds: $\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ and $\boldsymbol{T}^{h}$ belong to the same isomorphism class of rooted trees, where $\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ is rooted at $v$ and $\boldsymbol{T}^{h}$ is rooted at $r$ . Furthermore, if $f$ is an isomorphism between the two trees, then for every $u\in\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ we have that ${\mathbold{\sigma}}^{*}(u)=\mathbold{\tau}(f(u))$ . We are going to show a coupling $\tilde{\lambda}$ that has the property that

[TABLE]

For what follows, we denote by $f$ the isomorphism between $\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ and $\boldsymbol{T}^{h}$ , if one exists.

Before proceeding let us state some easy-to-prove results. Recall that for an assignment $\sigma$ on $n$ vertices we denote by $\mu_{\sigma}=n^{-1}(|\sigma^{-1}(i)|)_{i\leq q}$ its empirical marginal distribution. Furthermore, it is elementary to show that

[TABLE]

Let $\mathbold{m}$ be the number of edges in $\boldsymbol{G}^{*}$ . Recall that $\mathbold{m}$ is a Poisson random variable with parameter $dn/k$ . Applying standard Chernoff bounds to $\mathbold{m}$ we get that

[TABLE]
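As a numerical illustration of Poisson concentration at the scale used here, the sketch below compares the exact two-sided tail of a Poisson variable with a standard Chernoff-type bound $2\exp(-t^{2}/(2(\lambda+t)))$ . This particular form of the bound is an assumption chosen for illustration and is not necessarily the estimate displayed above; the parameters $\lambda=1000$ and $t=\lambda^{2/3}=100$ are likewise arbitrary.

```python
import math

def poisson_two_sided_tail(lam, t):
    """Exact P[|X - lam| > t] for X ~ Po(lam), using the log-pmf to avoid overflow."""
    def pmf(j):
        return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
    lo, hi = math.ceil(lam - t), math.floor(lam + t)
    inside = sum(pmf(j) for j in range(max(lo, 0), hi + 1))
    return max(0.0, 1.0 - inside)

def chernoff_bound(lam, t):
    # a standard, not the sharpest, two-sided Poisson tail bound
    return 2.0 * math.exp(-t * t / (2.0 * (lam + t)))

exact = poisson_two_sided_tail(1000.0, 100.0)
bound = chernoff_bound(1000.0, 100.0)
```

With mean $dn/k$ and deviation $n^{2/3}$ the exponent in such a bound is of order $n^{1/3}$ , which is why deviations of this size can be neglected in the coupling argument.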

We let $|\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)|$ denote the number of vertices in $\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v)$ . Note that for every variable node $x\in\boldsymbol{T}_{G^{*},h}(v)$ , the cardinality of $\partial_{desc}x$ is dominated by the Poisson distribution with parameter $d$ . With this observation we get that

[TABLE]

The coupling $\tilde{\lambda}$ is as follows: If ${\mathbold{\sigma}^{*}}$ is such that $\|\mu_{\boldsymbol{\sigma}^{*}}-q^{-1}\boldsymbol{1}\|>(\sqrt{n})^{-1}\ln n$ or $|\mathbold{m}-dn/k|>n^{2/3}$ we don’t couple $(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v),\mathbold{\sigma}^{*})$ and $(\boldsymbol{T}^{h},\mathbold{\tau})$ at all. Otherwise, the coupling $\tilde{\lambda}$ is defined inductively.

First consider the coupling between ${\mathbold{\sigma}^{*}}(v)$ and $\mathbold{\tau}(r)$ . Note that $f(v)=r$ . Due to our assumption about $\mu_{\mathbold{\sigma}^{*}}$ , we can have $\tilde{\lambda}$ such that

[TABLE]

The above follows by using a maximal coupling for choosing ${\mathbold{\sigma}^{*}}(v),\mathbold{\tau}(r)$ .
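A maximal coupling of two distributions on a finite set can be sampled explicitly: with probability equal to the total mass of the pointwise minimum the two samples coincide, and otherwise they are drawn from the normalised residuals. The following Python sketch is a generic textbook construction under the assumption that both distributions are given as lists of probabilities over the same index set; the function name is illustrative.

```python
import random

def maximal_coupling(p, q, rng):
    """Sample (X, Y) with X ~ p, Y ~ q and P[X != Y] equal to the TV distance.

    p and q are lists of probabilities over the same set {0, ..., len(p)-1}.
    """
    overlap = [min(a, b) for a, b in zip(p, q)]
    w = sum(overlap)                      # equals 1 - TV(p, q)
    if rng.random() < w:
        x = rng.choices(range(len(p)), weights=overlap)[0]
        return x, x
    # the residual parts have disjoint supports, so X != Y on this branch
    x = rng.choices(range(len(p)), weights=[a - m for a, m in zip(p, overlap)])[0]
    y = rng.choices(range(len(q)), weights=[b - m for b, m in zip(q, overlap)])[0]
    return x, y
```

In the setting above one would take $p$ to be the law of ${\mathbold{\sigma}^{*}}(v)$ , which is close to uniform by the assumption on $\mu_{\mathbold{\sigma}^{*}}$ , and $q$ the uniform distribution on $\Omega$ , so the two root colours disagree only with small probability.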

The induction step is as follows: Assume that we have partly exposed $(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v),\mathbold{\sigma}^{*})$ and $(\boldsymbol{T}^{h},\mathbold{\tau})$ and the corresponding parts agree. That is, let $(\boldsymbol{T}_{1},\mathbold{\sigma}_{1})$ and $(\boldsymbol{T}_{2},\mathbold{\sigma}_{2})$ be the two parts of $(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v),{\mathbold{\sigma}}^{*})$ and $(\boldsymbol{T}^{h},\mathbold{\tau})$ , respectively. Our assumption is that $(\boldsymbol{T}_{1},\mathbold{\sigma}_{1})\cong(\boldsymbol{T}_{2},\mathbold{\sigma}_{2})$ . W.l.o.g. assume that the leaves of the trees are variable nodes.

Let $x$ be a leaf in $\boldsymbol{T}_{1}$ whose descendants have not been revealed so far. The same holds for $f(x)$ in $\boldsymbol{T}_{2}$ . Let $\mathbold{m}_{x}$ be the number of hyper-edges of $G^{*}$ that have been revealed so far. Recall that the number of all hyper-edges in $G^{*}$ is $\mathbold{m}$ . Then, it is an easy calculation to get that for any $j$ we have

[TABLE]

If $r$ is the number of edges of the tree we have revealed up to vertex $x$ , then we have the crude upper bound that $\mathbold{m}_{x}\leq r$ . We have that

[TABLE]

where the third inequality follows from Markov’s inequality and the last inequality follows from our assumption that $h=o(\log n)$ . Combining the two above relations we get the following: For any $0\leq j\leq\ln^{2}n$ it holds that

[TABLE]

Similarly we get that $\mathbb{P}\left[|\partial_{desc}x|>\ln^{2}n\right]=o(n^{-10})$ .

Recall that for a vertex $u\in\boldsymbol{T}^{h}$ we have that $|\partial_{desc}u|$ is distributed as in Poisson with parameter $d$ . Using this observation we can have $\tilde{\lambda}$ such that

[TABLE]

We extend $f$ by defining a bijection between $\partial_{desc}x$ and $\partial_{desc}f(x)$ . From the definition of $\boldsymbol{G}^{*}$ we get that each $\alpha\in\partial_{desc}x$ chooses a weight function $\psi\in\Psi$ from a distribution which is within total variation distance $O(n^{-1/2}\ln n)$ from $P$ . Note that the term $O(n^{-1/2}\ln n)$ comes from the fact that $\mathbold{\sigma}^{*}$ is not perfectly balanced, i.e. we allow some fluctuations $O(\sqrt{n}\ln n)$ on the sizes of the color classes. For $f(\alpha)$ we have that it chooses its weight function $\psi$ with probability $P(\psi)$ . The above observations imply that we can have $\tilde{\lambda}$ such that

[TABLE]

By choosing the same weight function $\psi_{\alpha}$ for both $\alpha$ and $f(\alpha)$ we ensure that $x$ and $f(x)$ occupy the same position in the two functions.

Finally, for every pair of constraint nodes $\alpha$ and $f(\alpha)$ for which we have chosen the weight function $\psi_{\alpha}$ we decide on ${\mathbold{\sigma}^{*}}(y_{i})$ and $\mathbold{\tau}(z_{i})$ , where $y_{i}\in\partial\alpha\setminus\{x\}$ and $z_{i}=f(y_{i})$ . For each configuration $\tau\in\Omega^{k}$ we have ${\mathbold{\sigma}^{*}}(\partial\alpha)=\tau$ with probability proportional to

[TABLE]

where $j_{\alpha,x}$ is the position of $x$ inside the constraint $\psi_{\alpha}$ . Also, we have $\mathbold{\tau}(\partial f(\alpha))=\tau$ with probability proportional to

[TABLE]

From the above, it is clear that we can have $\tilde{\lambda}$ such that

[TABLE]

Let $(\boldsymbol{T}^{\prime}_{1},\mathbold{\sigma}^{\prime}_{1})$ and $(\boldsymbol{T}^{\prime}_{2},\mathbold{\sigma}^{\prime}_{2})$ be the new parts of $(\boldsymbol{T}_{\boldsymbol{G}^{*},h}(v),{\mathbold{\sigma}}^{*})$ and $(\boldsymbol{T}^{h},\mathbold{\tau})$ , after the revelation of $\partial_{desc}x,\partial_{desc}f(x)$ and $\partial_{desc}\alpha,\partial_{desc}f(\alpha)$ , for every $\alpha\in\partial_{desc}x$ and for every $f(\alpha)\in\partial_{desc}f(x)$ .

Then, using all the above and a simple union bound gives that

[TABLE]

The law of total probability implies that

[TABLE]

Lemma 11.4 follows by bounding appropriately the number of steps required for the coupling. Let $\mathcal{A}$ be the event that the number of steps in the coupling is more than $n^{1/10}$ . Since the number of steps of the coupling is upper bounded by the number of vertices of $\boldsymbol{T}_{G^{*},h}(v)$ , using (11.13) and Markov’s inequality we get that

[TABLE]

We have that

[TABLE]

The above implies that (11.11) is indeed true. The lemma follows.

11.3. Proof of Lemma 11.6

Clearly, Lemma 11.3 implies that we have $\mathrm{corr}^{\star}(d)=0$ if and only if $d<d_{\mathrm{rec}}^{\star}$ . To see this note the following: Assume that there is $d_{0}>d_{\mathrm{rec}}^{\star}$ such that $\mathrm{corr}^{\star}(d_{0})=0$ . Then, since $d_{0}>d_{\mathrm{rec}}^{\star}$ and $\mathrm{corr}^{\star}(d_{0})=0$ , Lemma 11.3 implies that we also have $\mathrm{corr}^{\star}(d_{\mathrm{rec}}^{\star})=0$ , which is false.

For proving Lemma 11.6, it remains to show that $\mathrm{corr}(d)=0$ if and only if $d<d_{\mathrm{rec}}^{\star}$ . First we focus on showing that for $d<d_{\mathrm{rec}}^{\star}$ we have

[TABLE]

For an even integer $\ell>0$ consider the factor tree $T_{\ell}$ which contains $\ell$ levels of variable nodes and is rooted at $r$ . The configuration $\eta\in\Omega^{S(r,\ell)}$ is called “ $(\ell,\delta)$ -mixing”, for some $\delta\geq 0$ , if it holds that

[TABLE]

Let $\mathcal{M}(T_{\ell},\ell,\delta)$ be the set of all configurations which are $(\ell,\delta)$ -mixing for $T_{\ell}$ . The above quantity expresses the correlation between the configuration of the vertices at distance $2\ell$ and the root $r$ (set to vertex correlation).

Eq. (11.21) follows by showing the following result.

Lemma 11.9.

For $d<d_{\mathrm{rec}}^{\star}$ and every $\delta>0$ there exists $\ell_{0}=\ell_{0}(\delta)$ such that for any even $\ell\geq\ell_{0}$ we have

[TABLE]

Proof.

We shift our attention to considering the teacher-student pair $(\boldsymbol{G}^{*},\mathbold{\sigma}^{*})$ . In light of Corollary 4.9, it suffices to show the following: For $d<d_{\mathrm{rec}}^{\star}$ and every $\varepsilon>0$ there exists $\ell_{0}=\ell_{0}(\varepsilon)$ such that for any $\ell\geq\ell_{0}$ we have

[TABLE]

In light of Lemma 11.4, for (11.23) it suffices to show the following result: For any $d<d_{\mathrm{rec}}^{\star}$ and any $\varepsilon>0$ there exists $\ell_{0}=\ell_{0}(\varepsilon)$ such that

[TABLE]

Clearly the above follows from the definition of $d_{\mathrm{rec}}^{\star}$ . ∎

From Lemma 11.9 we get (11.21) by working as follows: Let

[TABLE]

Furthermore, for any $\delta>0$ , integer $\ell$ , for $\boldsymbol{G}$ , for any vertex $v$ and $\mathbold{\sigma}$ distributed as in Gibbs measure, let $\mathcal{G}=\mathcal{G}(v,\ell,\delta)$ be the event that $\mathbold{\sigma}\in\mathcal{M}(\boldsymbol{T}_{\boldsymbol{G},\ell}(v),\ell,\delta)$ . Lemma 11.9 implies that for $d<d_{\mathrm{rec}}^{\star}$ , for every $\delta>0$ there exists $\ell_{0}=\ell_{0}(\delta)$ such that for any $\ell\geq\ell_{0}$ the following holds:

[TABLE]

Noting that $\textrm{corr}(d)=\limsup_{\ell\to\infty}\limsup_{n\to\infty}n^{-1}\sum_{v\in V_{n}}\mathrm{corr}_{v,\ell}(d)$ , we get that (11.21) is indeed true.

We conclude the proof of Lemma 11.6 by showing that for $d>d_{\mathrm{rec}}^{\star}$ we have

[TABLE]

The proof of (11.24) is by contradiction. Assume that there exists $d>d_{\mathrm{rec}}^{\star}$ such that $\textrm{corr}(d)=0$ . This would entail that (11.22) is true. Then, reversing the arguments from the proof of Lemma 11.9 and combining them with Corollary 4.9, we get that for any $\varepsilon>0$ there exists $\ell_{0}=\ell_{0}(\varepsilon)$ such that for any $\ell>\ell_{0}$ we have

[TABLE]

The above implies that $\textrm{corr}^{\star}(d)=0$ . Clearly we get a contradiction since we have shown in Lemma 11.3 that for every $d>d_{\mathrm{rec}}^{\star}$ we have $\textrm{corr}^{\star}(d)>0$ .

Acknowledgment. We thank Will Perkins, Guilhem Semerjian and Nick Wormald for helpful discussions.

Bibliography

[1] E. Abbe: Community detection and stochastic block models: recent developments. arXiv:1703.10146 (2017).
[2] E. Abbe, A. Montanari: Conditional random fields, planted constraint satisfaction and entropy concentration. Theory of Computing 11 (2015) 413–443.
[3] E. Abbe, C. Sandon: Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. arXiv:1512.09080 (2015).
[4] D. Achlioptas, A. Coja-Oghlan: Algorithmic barriers from phase transitions. Proc. 49th FOCS (2008) 793–802.
[5] D. Achlioptas, H. Hassani, N. Macris, R. Urbanke: Bounds for random constraint satisfaction problems via spatial coupling. Proc. 27th SODA (2016) 469–479.
[6] D. Achlioptas, C. Moore: Random $k$ -SAT: two moments suffice to cross a sharp threshold. SIAM Journal on Computing 36 (2006) 740–762.
[7] D. Achlioptas, C. Moore: On the 2-colorability of random hypergraphs. Proc. 6th RANDOM (2002) 78–90.
[8] D. Achlioptas, A. Naor: The two possible values of the chromatic number of a random graph. Annals of Mathematics 162 (2005) 1333–1349.
