Charting the replica symmetric phase
Amin Coja-Oghlan, Charilaos Efthymiou, Nor Jaafari, Mihyun Kang, Tobias Kapetanopoulos

TL;DR
This paper rigorously confirms the physicists' predictions about the replica symmetric phase in diluted mean-field models, including the Potts antiferromagnet, k-XORSAT, and the stochastic block model, clarifying phase transitions and detection thresholds.
Contribution
It provides a rigorous mathematical validation of the replica symmetric phase and phase transition predictions for a broad class of diluted mean-field models, previously based on non-rigorous methods.
Findings
Confirmed the existence of a replica symmetry breaking phase transition.
Validated the detailed evolution of the Gibbs measure within the replica symmetric phase.
Proved a conjecture on the detection problem in the stochastic block model.
Amin Coja-Oghlan∗, Charilaos Efthymiou∗∗, Nor Jaafari, Mihyun Kang∗∗∗, Tobias Kapetanopoulos∗∗∗∗
Amin Coja-Oghlan, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.
Charilaos Efthymiou, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.
Nor Jaafari, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.
Mihyun Kang, Technische Universität Graz, Institute of Discrete Mathematics, Steyrergasse 30, 8010 Graz, Austria.
Tobias Kapetanopoulos, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.
Abstract.
Diluted mean-field models are spin systems whose geometry of interactions is induced by a sparse random graph or hypergraph. Such models play an eminent role in the statistical mechanics of disordered systems as well as in combinatorics and computer science. In a path-breaking paper based on the non-rigorous ‘cavity method’, physicists predicted not only the existence of a replica symmetry breaking phase transition in such models but also sketched a detailed picture of the evolution of the Gibbs measure within the replica symmetric phase and its impact on important problems in combinatorics, computer science and physics [Krzakala et al.: PNAS 2007]. In this paper we rigorise this picture completely for a broad class of models, encompassing the Potts antiferromagnet on the random graph, the $k$-XORSAT model and the diluted $k$-spin model for even $k$. We also prove a conjecture about the detection problem in the stochastic block model that has received considerable attention [Decelle et al.: Phys. Rev. E 2011].
∗ The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement n. 278857–PTCC.
∗∗ Supported by DFG grant EF 103/1-1.
∗∗∗ Supported by Austrian Science Fund (FWF): P26826.
∗∗∗∗ Supported by Stiftung Polytechnische Gesellschaft PhD grant.
1. Introduction
1.1. The cavity method
Contrasting the awe-inspiring arsenal of techniques at the disposal of modern combinatorics and probability with the utter simplicity of terms in which, say, the Erdős-Rényi random graph model is defined, one might expect that after a half-century of study everything ought to be known about this and similar models. Yet beneath the surface lurks a picture of mesmerizing complexity. Its unexpected intricacy was brought out most clearly by a line of research that commenced in the statistical physics community with the study of diluted mean-field models, spin systems whose geometry of interactions is induced by a sparse random graph or hypergraph. Such models were put forward in physics as models of disordered systems [47]. Prominent examples include the diluted $k$-spin model or the Potts antiferromagnet on a random graph [25, 37, 60]. The graph structure, convergent locally to the Bethe lattice or a Galton-Watson tree, induces a non-trivial metric, which is why such models have been argued to evince a closer semblance of physical reality than fully connected ones such as the Sherrington-Kirkpatrick model [48, 50]. But perhaps even more importantly, apart from and beyond the disordered systems thread, in the course of the past half-century models based on random graphs have come to play a role in combinatorics, probability, statistics and computer science that can hardly be overstated. For example, the random $k$-SAT model is of fundamental interest in computer science [9], the stochastic block model has gained prominence in statistics [1, 38, 56], low-density parity check codes are the bread and butter of modern coding theory [63], and problems such as random graph coloring have been the lodestars of probabilistic combinatorics ever since the days of Erdős and Rényi [9, 21, 29].
In the course of the past 20 years physicists developed an analytic but non-rigorous technique for the study of such models called the ‘cavity method’. It has been brought to bear on all of the aforementioned and very many other models in an impressive and ongoing line of work that has led to numerous predictions that impact on an astounding variety of problems (e.g., [26, 47, 51, 67]). The task of putting the cavity method on a rigorous foundation has therefore gained substantial importance, and despite recent successes (e.g., [23, 28, 35, 56]) much remains to be done. In particular, while the cavity method can be applied to a given model almost mechanically, most rigorous arguments are still based on ad hoc, model-specific deliberations. This leads to the question of whether we can come up with abstract arguments that rigorise the cavity method wholesale, which is the thrust of the present paper.
One of the most important predictions of the cavity method is that the Gibbs measures induced by random graph models undergo a replica symmetry breaking or condensation phase transition [43]. Physically this phase transition resembles the Kauzmann transition from the study of glasses [40]. The fact that a phase transition occurs at the location predicted by the cavity method was recently proved for a fairly broad family of models [23]. However, that result fell short of establishing that the condensation phase transition does indeed mark the point where the nature of correlations under the Gibbs measure changes as predicted by the cavity method.
Here we prove that this is indeed the case. In fact, we rigorise the entire “map” of the replica symmetric phase as predicted in [32, 43, 44], including its boundary, the evolution of the nature of correlations within and an important contiguity result. More specifically, first and foremost we prove that the condensation phase transition does indeed separate a “replica symmetric” phase without extensive long-range correlations from a phase where long-range correlations prevail, arguably the key feature of the physics picture. Further, we verify the physics prediction on the threshold for the onset of point-to-set correlations, called the reconstruction threshold. Additionally, we derive the precise limiting distribution of the free energy within the replica symmetric phase, thereby vindicating a prediction that the free energy exhibits remarkably small fluctuations [32, 44]. Finally, verifying a prominent prediction from [26], we prove a contiguity statement that has an impact on statistical inference problems such as the stochastic block model.
The results of this paper cover a wide class of random graph models, even broader than the family of models for which the condensation threshold was previously derived in [23]. Indeed, as a testimony to the power of the present general approach we may point out that even the specializations of the main results to prominent examples such as the Potts antiferromagnet on the random graph or the $k$-spin model were not previously known, even though these models received considerable attention in their own right. Before presenting the general results in Section 2, we illustrate their impact on three important examples: the diluted $k$-spin model, the Potts antiferromagnet on the random graph and the stochastic block model.
1.2. The diluted $k$-spin model
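For integers $k \ge 2$, $n > 0$ and a real $p \in [0,1]$ let $\mathbb H = \mathbb H_k(n,p)$ be the random $k$-uniform hypergraph on $V_n = \{x_1,\ldots,x_n\}$ whose edge set $E(\mathbb H)$ is obtained by including each of the $\binom{n}{k}$ possible $k$-subsets of $V_n$ with probability $p$ independently. Additionally, let $J = (J_e)_{e \in E(\mathbb H)}$ be a family of independent standard Gaussians. The $k$-spin model on $\mathbb H$ at inverse temperature $\beta > 0$ is the distribution on the set $\{\pm 1\}^{V_n}$ defined by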
[TABLE]
Arguably the most interesting and at the same time most challenging scenario arises in the case of a sparse random hypergraph [48]. Specifically, set $p = d/\binom{n-1}{k-1}$ for a fixed $d > 0$ so that in the limit $n \to \infty$ the average vertex degree of $\mathbb H$ converges to $d$ in probability. How does the model change as we vary $d$?
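To make the definition concrete, here is a minimal sketch that samples a tiny instance of the sparse model and evaluates its Gibbs measure by exhaustive enumeration. Two things are assumptions rather than quotations from the text: the edge probability is taken to be $d/\binom{n-1}{k-1}$ (the choice that gives every vertex expected degree $d$), and each hyperedge is assumed to contribute the factor $1 + \tanh(\beta J_e)\prod_{y \in e}\sigma(y)$, which for $\pm 1$ spins is proportional to the Boltzmann factor $\exp(\beta J_e \prod_{y\in e}\sigma(y))$ and therefore induces the same Gibbs measure. The function names are illustrative only.

```python
import itertools
import math
import random


def sample_instance(n, k, d, rng):
    """Sample hyperedges of a sparse random k-uniform hypergraph with couplings.

    Each k-subset of {0, ..., n-1} is included independently with probability
    p = d / C(n-1, k-1), so that every vertex has expected degree d. Each
    included hyperedge receives an independent standard Gaussian coupling.
    """
    p = min(1.0, d / math.comb(n - 1, k - 1))
    return [(e, rng.gauss(0.0, 1.0))
            for e in itertools.combinations(range(n), k)
            if rng.random() < p]


def weight(sigma, edges, beta):
    """Unnormalized Gibbs weight of sigma (assumed form, see the lead-in)."""
    w = 1.0
    for e, coupling in edges:
        spin_product = math.prod(sigma[v] for v in e)
        w *= 1.0 + math.tanh(beta * coupling) * spin_product
    return w


def gibbs_measure(n, edges, beta):
    """Exact Gibbs measure obtained by enumerating all 2^n configurations."""
    configs = list(itertools.product((-1, 1), repeat=n))
    weights = [weight(s, edges, beta) for s in configs]
    z = sum(weights)
    return {s: w / z for s, w in zip(configs, weights)}, z


rng = random.Random(0)
edges = sample_instance(n=8, k=4, d=2.0, rng=rng)
mu, z = gibbs_measure(8, edges, beta=1.0)
```

Enumeration is only feasible for very small $n$; the sketch is meant to pin down the objects involved, not to probe the thermodynamic limit.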
According to the physics predictions, for any $k \ge 2$ and $\beta > 0$ there exists a condensation threshold $d_{\mathrm{cond}}(k,\beta)$ at which the free energy density, viewed as a function of $d$, is non-analytic [33]. This conjecture was proved in the case $k = 2$ by Guerra and Toninelli [37]. However, their technique does not give the precise condensation phase transition for $k > 2$ [37, Section 9], nor does the $k$-spin model belong to the class of models for which the condensation threshold was determined in [23]. The following theorem pinpoints the precise condensation threshold for all $k \ge 2$, proving the prediction from [33].
As is the case for most results inspired by the cavity method, the precise value comes in terms of a stochastic optimization problem. Specifically, write $\mathcal P(\Omega)$ for the set of all probability distributions on a finite set $\Omega$ and identify $\mathcal P(\Omega)$ with the standard simplex in $\mathbb R^{\Omega}$. Moreover, let $\mathcal P^2(\Omega)$ be the space of all probability measures on $\mathcal P(\Omega)$ and let $\mathcal P^2_*(\Omega)$ be the space of all $\pi \in \mathcal P^2(\Omega)$ whose barycenter $\int \mu \, \mathrm{d}\pi(\mu)$ is the uniform distribution on $\Omega$. Further, let $\Lambda(x) = x \ln x$.
Theorem 1.1.
Suppose that and that . Let be a Poisson variable with mean , let be standard Gaussians and for let be random variables with distribution , all mutually independent. Define
[TABLE]
and Then and
[TABLE]
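From now on we assume that $k$ is even. The regime $d < d_{\mathrm{cond}}(k,\beta)$ is called the replica symmetric phase. According to the cavity method, its key feature is that with probability tending to $1$ in the limit $n \to \infty$, two independent samples $\sigma_1, \sigma_2$ (‘replicas’) chosen from the Gibbs measure are “essentially perpendicular”. To formalize this define for $\sigma, \tau : V_n \to \{\pm 1\}$ the overlap as $\varrho(\sigma,\tau) = \frac{1}{n}\sum_{x \in V_n} \sigma(x)\tau(x)$. We write $\langle\,\cdot\,\rangle_{\mathbb H, J, \beta}$ for the average on $\sigma_1, \sigma_2$ chosen independently from $\mu_{\mathbb H, J, \beta}$ and denote the expectation over the choice of $\mathbb H$ and $J$ by $\mathbb E$.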
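As a small illustration of the overlap of two replicas, the sketch below computes the normalized inner product of two $\pm 1$ configurations and its average absolute value under two independent draws from an explicitly given measure. The normalization by $n$ and the choice of the mean absolute overlap as summary statistic are assumptions made for illustration; the exact quantity appearing in Theorem 1.2 is not reproduced here.

```python
import itertools


def overlap(sigma, tau):
    """Normalized inner product of two +/-1 configurations of equal length."""
    return sum(s * t for s, t in zip(sigma, tau)) / len(sigma)


def mean_abs_overlap(mu):
    """Average |overlap| of two independent replicas drawn from mu.

    mu maps configurations (tuples of +/-1) to their probabilities.
    """
    return sum(p * r * abs(overlap(s, t))
               for s, p in mu.items()
               for t, r in mu.items())


# Toy check: the uniform measure on {-1, +1}^2.
uniform = {s: 0.25 for s in itertools.product((-1, 1), repeat=2)}
print(mean_abs_overlap(uniform))  # 0.5
```

For the uniform measure on $n$ spins this average is of order $n^{-1/2}$, which is the sense in which two replicas are "essentially perpendicular".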
Theorem 1.2.
For all and even we have
The corresponding statement for $k = 2$ was proved by Guerra and Toninelli, but as they point out their argument does not extend to larger $k$ [37].
Theorem 1.2 implies the absence of extensive long-range correlations in the replica symmetric phase. Indeed, for two vertices and let
[TABLE]
be the joint distribution of the spins assigned to . Further, let be the uniform distribution on . Then the total variation distance is a measure of how correlated the spins of are. Indeed, in the case that is even for every the Gibbs marginals satisfy because for every . Therefore, if the spins at were independent, then . Furthermore, it is well known (e.g., [13, Section 2]) that
[TABLE]
Thus, Theorem 1.2 implies that for $d < d_{\mathrm{cond}}(k,\beta)$, with probability tending to $1$, the spins assigned to two random vertices of $\mathbb H$ are asymptotically independent. By contrast, Theorem 1.2 and (1.2) show that extensive long-range dependencies occur beyond but arbitrarily close to $d_{\mathrm{cond}}(k,\beta)$.
1.3. The Potts antiferromagnet
Let $q \ge 2$ be an integer, let $\Omega = \{1,\ldots,q\}$ be a set of “colors” and let $\beta > 0$. The antiferromagnetic $q$-spin Potts model on a graph $G = (V,E)$ at inverse temperature $\beta$ is the probability distribution on $\Omega^V$ defined by
[TABLE]
The Potts model on the random graph $\mathbb G = \mathbb G(n,p)$ with vertex set $V_n = \{x_1,\ldots,x_n\}$ whose edge set is obtained by including each of the $\binom{n}{2}$ possible pairs $\{x_i, x_j\}$, $1 \le i < j \le n$, with probability $p$ independently, has received considerable attention (e.g. [12, 22, 25]). As in the $k$-spin model, the most challenging case is that $p = d/n$ for a fixed real $d > 0$, so that the average degree converges to $d$ in probability.
The condensation phase transition in this model was pinpointed recently [23]. As in the -spin model, the answer comes as a stochastic optimization problem. To be precise, let be a -random variable, let denote samples from , mutually independent and independent of , and set
[TABLE]
Then [23, Theorem 1.1] shows that and
[TABLE]
While it may be difficult to calculate numerically, there is the explicit Kesten-Stigum bound [3]
[TABLE]
which is known to be tight for for all [45, 58, 59], conjectured to be tight for for all [26, 46], and known not to be tight for [66].
What can we say about the nature of the Gibbs measure in the ‘replica symmetric phase’ ? Azuma’s inequality shows that converges to in probability, i.e., the free energy has fluctuations of order . On the other hand, given that key parameters such as the size of the largest connected component of exhibit fluctuations of order even once we condition on the number of edges, one might expect that so does . Yet remarkably, the following theorem shows that throughout the replica symmetric phase the free energy merely has bounded fluctuations given . In fact, we know the precise limiting distribution.
Theorem 1.3.
Let , and . With a sequence of independent Poisson variables with mean , let
[TABLE]
Then and, in distribution,
[TABLE]
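Further, as in the $k$-spin model the replica symmetric phase can be characterized in terms of the overlap. Formally, define the overlap of two colorings $\sigma, \tau : V_n \to \Omega$ as the probability distribution $\rho_{\sigma,\tau}$ on $\Omega \times \Omega$ where $\rho_{\sigma,\tau}(s,t)$ is the probability that a random vertex is colored $s$ under $\sigma$ and $t$ under $\tau$. Let $\bar\rho$ denote the uniform distribution on $\Omega \times \Omega$, write $\sigma_1, \sigma_2$ for two independent samples from $\mu_{\mathbb G,\beta}$, denote the expectation with respect to $\sigma_1, \sigma_2$ by $\langle\,\cdot\,\rangle_{\mathbb G,\beta}$ and the expectation over the choice of $\mathbb G$ by $\mathbb E$.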
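The overlap of two colorings can be computed directly from its definition. The following sketch builds the empirical joint distribution of color pairs and measures its distance from the uniform distribution. Colors are encoded as integers $0, \ldots, q-1$, and the total variation distance is used as the notion of distance; both are assumptions made for illustration, since the norm used in Theorem 1.4 is not reproduced here.

```python
def overlap_matrix(sigma, tau, q):
    """Empirical joint distribution of the color pair (sigma(v), tau(v)).

    Entry [s][t] is the fraction of vertices colored s under sigma and t
    under tau. Colors are assumed to be integers in range(q).
    """
    n = len(sigma)
    rho = [[0.0] * q for _ in range(q)]
    for s, t in zip(sigma, tau):
        rho[s][t] += 1.0 / n
    return rho


def tv_distance_to_uniform(rho):
    """Total variation distance between rho and the uniform distribution."""
    q = len(rho)
    return 0.5 * sum(abs(x - 1.0 / q ** 2) for row in rho for x in row)


# Two "perpendicular" colorings: every color pair occurs equally often.
sigma = [0, 0, 1, 1]
tau = [0, 1, 0, 1]
print(tv_distance_to_uniform(overlap_matrix(sigma, tau, 2)))  # 0.0
# A coloring compared with itself is far from uniform.
print(tv_distance_to_uniform(overlap_matrix(sigma, sigma, 2)))  # 0.5
```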
Theorem 1.4.
For all we have
As in the case of the -spin model it is easy to see that iff the colors assigned to two randomly chosen vertices of are asymptotically independent with probability tending to one. Hence, marks the onset of long-range correlations.
In many diluted models, and in particular in the Potts antiferromagnet, the condensation transition is conjectured to be preceded by another threshold where certain “point-to-set correlations” emerge [43]. Intuitively, the reconstruction threshold is the point from where for a random vertex $v$ correlations between the color assigned to $v$ and the colors assigned to all vertices at a large enough distance from $v$ persist. Formally, with $\sigma$ chosen from $\mu_{\mathbb G,\beta}$ let $\nabla_\ell(\mathbb G, v)$ be the $\sigma$-algebra on $\Omega^{V_n}$ generated by the random variables $\sigma(w)$, where $w$ ranges over all vertices at distance at least $\ell$ from $v$. Then
[TABLE]
measures the extent of correlations between $v$ and a random boundary condition in the limit $\ell \to \infty$ (the outer limit exists due to monotonicity). Indeed, with the expectation in (1.8) referring to the choice of $\mathbb G$, the outer $\langle\,\cdot\,\rangle$ chooses a random coloring of the vertices at distance at least $\ell$ from $v$ and the inner $\langle\,\cdot\,\rangle$ averages over the color of $v$ given the boundary condition.
The reconstruction threshold is defined as A priori, calculating appears to be rather challenging because we seem to have to control the joint distribution of the colors at distance from . However, according to physics predictions is identical to the corresponding threshold on a random tree [43], a conceptually much simpler object. Formally, let be the Galton-Watson tree with offspring distribution . Let be its root and for an integer let be the finite tree obtained by deleting all vertices at distance greater than from . Then
[TABLE]
measures the extent of correlations between the color of the root and the colors at the boundary of the tree. Accordingly, the tree reconstruction threshold is defined as Combining Theorem 1.4 with a result of Gerschenfeld and Montanari [34], we obtain the following result.
Corollary 1.5.
For every and we have
Previously it was known that for exceeding some (large but) undetermined constant [55]. This assumption was required because the proof depended on model-specific combinatorial considerations. A merit of the present approach is that we replace such combinatorial arguments by abstract probabilistic ones.
1.4. The stochastic block model
The disassortative stochastic block model, originally introduced by Holland, Laskey, and Leinhardt [38], is an intensely studied statistical inference problem associated with the Potts model [56]. We first choose a random coloring $\sigma^*$ of $n$ vertices with $q$ colors. Then, setting
[TABLE]
we generate a random graph $\mathbb G^*$ by connecting any two vertices of the same color with probability $d_{\mathrm{in}}/n$ and any two with distinct colors with probability $d_{\mathrm{out}}/n$ independently. Thus, the average degree of $\mathbb G^*$ converges to $d$ in probability.
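The generative procedure just described can be sketched as follows. The displayed formula for the two edge densities is not reproduced here, so the sketch assumes the parametrization $d_{\mathrm{out}} = dq/(q-1+e^{-\beta})$ and $d_{\mathrm{in}} = e^{-\beta} d_{\mathrm{out}}$: monochromatic edges are down-weighted by the Potts factor $e^{-\beta}$, and the normalization is fixed by requiring $\frac{1}{q} d_{\mathrm{in}} + \frac{q-1}{q} d_{\mathrm{out}} = d$. The function names are illustrative.

```python
import math
import random


def sbm_degrees(d, q, beta):
    """Edge densities (d_in, d_out) under the assumed parametrization.

    They satisfy d_in = exp(-beta) * d_out and d_in/q + (q-1)*d_out/q = d.
    """
    d_out = d * q / (q - 1 + math.exp(-beta))
    d_in = math.exp(-beta) * d_out
    return d_in, d_out


def sample_sbm(n, d, q, beta, rng):
    """Sample a planted coloring and a disassortative block model graph."""
    d_in, d_out = sbm_degrees(d, q, beta)
    # Planted coloring: each vertex receives a uniformly random color.
    colors = [rng.randrange(q) for _ in range(n)]
    # Each pair is joined independently; same-color pairs are less likely.
    edges = [(v, w)
             for v in range(n) for w in range(v + 1, n)
             if rng.random() < (d_in if colors[v] == colors[w] else d_out) / n]
    return colors, edges


colors, edges = sample_sbm(n=200, d=3.0, q=3, beta=1.0, rng=random.Random(0))
```

The detection problem then asks whether `edges` alone can be distinguished from an Erdős-Rényi sample with the same average degree.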
Two fundamental statistical problems arise [26]. First, given $q, \beta$, for what values of $d$ is it possible to recover a non-trivial approximation of $\sigma^*$ given just the random graph $\mathbb G^*$, i.e., to do better than just a random guess (see [26] for a formal definition)? A second, more modest task is the detection problem, which merely asks whether the random graph $\mathbb G^*$ chosen from the stochastic block model can be told apart from the natural “null model”, namely the plain Erdős-Rényi random graph $\mathbb G$.
Decelle, Krzakala, Moore and Zdeborová [26] predicted that for $d < d_{\mathrm{cond}}(q,\beta)$, i.e., below the Potts condensation threshold (1.5), it is information-theoretically impossible to solve either problem. That is, there is no test or algorithm that can infer with probability tending to $1$ as $n \to \infty$ whether its input was created via the stochastic block model or the Erdős-Rényi model, let alone obtain a non-trivial approximation to $\sigma^*$. On the other hand, they predicted that there exist efficient algorithms to solve either problem if $d$ exceeds the Kesten-Stigum bound (1.7). Both of these conjectures were proved in the case $q = 2$ by Mossel, Neeman and Sly [58, 59] and Massoulié [45]. After advances by Bordenave, Lelarge and Massoulié [20], the positive algorithmic conjecture was proved in full by Abbe and Sandon [3]. On the negative side, [23, Theorem 1.3] shows that no algorithm can infer a non-trivial approximation to $\sigma^*$ if $d < d_{\mathrm{cond}}(q,\beta)$ for any $q \ge 3$, $\beta > 0$. Additionally, Banks, Moore, Neeman, and Netrapalli [12] employed a second moment argument based on Achlioptas and Naor [8] to determine an explicit range of $d$ where it is impossible to discern whether the graph was created via the stochastic block model or the Erdős-Rényi model. However, there has remained an extensive gap between their explicit bound and the actual condensation threshold.
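Our next result closes this gap and thus settles the conjecture from [26]. Recall that the random graph models $\mathbb G^*, \mathbb G$ are mutually contiguous for $d$ if for any sequence $(\mathcal E_n)_n$ of events we have
$$\lim_{n\to\infty} \mathbb P\left[\mathbb G^* \in \mathcal E_n\right] = 0 \quad\Longleftrightarrow\quad \lim_{n\to\infty} \mathbb P\left[\mathbb G \in \mathcal E_n\right] = 0.$$
If so, then clearly no algorithm (efficient or not) can discern with probability $1 - o(1)$ whether a given graph stems from the stochastic block model $\mathbb G^*$ or the “null model” $\mathbb G$.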
Theorem 1.6.
For all , , the random graph models and are mutually contiguous.
This result is tight since [23, Theorem 2.6] implies that fail to be mutually contiguous for .
Theorem 1.6 deals with the disassortative version of the block model, which corresponds to the Potts antiferromagnet. There is a contiguity conjecture in [26] for the assortative (viz. ferromagnetic) version as well, and Banks, Moore, Neeman, and Netrapalli [12] obtained upper and lower bounds in that case too, but the techniques of the present work do not apply to ferromagnetic models (see Section 2.4).
2. Main results
Factor graph models have emerged as a unifying framework for a multitude of concrete models arising in physics, combinatorics, and other disciplines [47, 63]. The main results of this paper, which we present in this section, therefore deal with a general class of random factor graph models, subject merely to a few easy-to-check assumptions. In Section 2.1 we define this general notion. Then we state the results for general random factor graph models in Section 2.2. Moreover, in Section 2.3 we indicate how the diluted -spin model, the Potts antiferromagnet and the stochastic block model fit this framework. Section 2.4 contains a discussion of related work.
2.1. Factor graphs
The following definition encompasses most important examples of spin systems on graphs [47].
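Definition 2.1.
Let $\Omega$ be a finite set of spins, let $k \ge 2$ be an integer and let $\Psi$ be a set of functions $\psi : \Omega^k \to (0,2)$ that we call weight functions. A $\Psi$-factor graph $G = (V, F, (\partial a)_{a \in F}, (\psi_a)_{a \in F})$ consists of
- a finite set $V$ of variable nodes,
- a finite set $F$ of constraint nodes,
- an ordered $k$-tuple $\partial a = (\partial_1 a, \ldots, \partial_k a) \in V^k$ for each $a \in F$,
- a family $(\psi_a)_{a \in F} \in \Psi^F$ of weight functions.
The Gibbs distribution of $G$ is the probability distribution on $\Omega^V$ defined by $\mu_G(\sigma) = \psi_G(\sigma)/Z(G)$ for $\sigma \in \Omega^V$, where
$$\psi_G(\sigma) = \prod_{a \in F} \psi_a\big(\sigma(\partial_1 a), \ldots, \sigma(\partial_k a)\big) \qquad\text{and}\qquad Z(G) = \sum_{\tau \in \Omega^V} \psi_G(\tau).$$
A $\Psi$-factor graph $G$ induces a bipartite graph with vertex sets $V$ and $F$ where $a \in F$ is adjacent to $\partial_1 a, \ldots, \partial_k a$. We shall therefore use common graph-theoretic terminology and refer to, e.g., the vertices $\partial_1 a, \ldots, \partial_k a$ as the neighbors of $a$. Furthermore, the length of shortest paths in the bipartite graph induces a metric on the nodes of $G$.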
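The definition translates almost verbatim into a data structure: a list of ordered neighborhoods and a parallel list of weight functions. The sketch below, with illustrative names, evaluates the Gibbs distribution of a small factor graph by brute force. The Potts-type weight used in the example (value $e^{-\beta}$ on equal spins and $1$ otherwise) is an assumption chosen to match the antiferromagnet of Section 1.3.

```python
import itertools
import math


def total_weight(sigma, neighborhoods, weights):
    """Product over constraint nodes a of psi_a applied to the spins of its neighbors."""
    w = 1.0
    for nbrs, psi in zip(neighborhoods, weights):
        w *= psi(*(sigma[x] for x in nbrs))
    return w


def gibbs_distribution(num_vars, spins, neighborhoods, weights):
    """Return (mu, Z): the Gibbs distribution as a dict and the partition function."""
    configs = list(itertools.product(spins, repeat=num_vars))
    raw = [total_weight(s, neighborhoods, weights) for s in configs]
    z = sum(raw)
    return {s: w / z for s, w in zip(configs, raw)}, z


def potts_weight(beta):
    """Assumed antiferromagnetic Potts weight for k = 2."""
    return lambda s, t: math.exp(-beta) if s == t else 1.0


# One constraint node joining two variable nodes, two spins, beta = ln 2.
mu, z = gibbs_distribution(2, (0, 1), [(0, 1)], [potts_weight(math.log(2))])
print(round(z, 9))           # 3.0
print(round(mu[(0, 1)], 9))  # 0.333333333
```

Keeping the neighborhoods as ordered tuples matters because the weight functions need not be symmetric in their arguments.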
Diluted mean-field models correspond to random factor graphs. To define them formally, we observe that any weight function can be viewed as a point in -dimensional Euclidean space. We thus endow the set of all possible weight functions with the -algebra induced by the Borel algebra. Further, for a weight function and a permutation we define , . Throughout the paper we assume that is a measurable set of weight functions such that for all and all permutations we have . Moreover, we fix a probability distribution on . We always denote by an element of chosen from , and we set
[TABLE]
Furthermore, we always assume that is such that the following three inequalities hold:
[TABLE]
The first two inequalities bound the ‘tails’ of for . The third one provides that is non-constant.
With these conventions in mind suppose that $n, m > 0$ are integers. Then we define a random $\Psi$-factor graph $\mathbb G(n,m,P)$ as follows. The set of variable nodes is $V_n = \{x_1,\ldots,x_n\}$, the set of constraint nodes is $F_m = \{a_1,\ldots,a_m\}$ and the neighborhoods $\partial a_i \in V_n^k$ are chosen uniformly and independently for $i = 1,\ldots,m$. Furthermore, the weight functions $\psi_{a_i} \in \Psi$ are chosen from the distribution $P$ mutually independently and independently of the neighborhoods $(\partial a_i)_{i=1,\ldots,m}$. Where $P$ is apparent we just write $\mathbb G(n,m)$ rather than $\mathbb G(n,m,P)$.
Since we aim to study models on sparse random graphs such as the Potts model on the Erdős-Rényi graph we are concerned with the case that $m = O(n)$ as $n \to \infty$. To express this elegantly and in order to be able to take the thermodynamic limit easily, we fix a real $d > 0$ that does not depend on $n$, let $\boldsymbol m = \boldsymbol m_d(n)$ have distribution ${\rm Po}(dn/k)$ and write $\mathbb G = \mathbb G(n, \boldsymbol m, P)$ for brevity. Then the expected degree of a variable node is equal to $d$.
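A minimal sketch of this sampling procedure follows. It assumes that the number of constraint nodes is Poisson with mean $dn/k$ (the choice that makes the expected variable degree equal to $d$) and takes the prior on weight functions as a user-supplied callable `sample_weight`, a hypothetical interface introduced only for this illustration. The Poisson sampler counts unit-rate exponential arrivals, which keeps the example within the standard library.

```python
import random


def sample_poisson(mean, rng):
    """Poisson sample: number of unit-rate exponential arrivals before time `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_random_factor_graph(n, d, k, sample_weight, rng):
    """Sample neighborhoods and weight functions of a random factor graph.

    Neighborhoods are uniform, independent ordered k-tuples of variable nodes
    (repetitions allowed, as in the unconstrained model); weight functions are
    drawn independently via sample_weight(rng).
    """
    m = sample_poisson(d * n / k, rng)
    neighborhoods = [tuple(rng.randrange(n) for _ in range(k)) for _ in range(m)]
    weights = [sample_weight(rng) for _ in range(m)]
    return neighborhoods, weights


# Example prior: a single deterministic weight function on pairs of spins.
def constant_prior(rng):
    return lambda s, t: 0.5 if s == t else 1.0


neighborhoods, weights = sample_random_factor_graph(
    n=50, d=3.0, k=2, sample_weight=constant_prior, rng=random.Random(0))
```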
While in $\mathbb G$ the neighborhoods are chosen uniformly, in order to accommodate certain applications such as the Potts model on the Erdős-Rényi graph we need to impose two conditions. First, that for any constraint node the neighboring variable nodes are distinct. Second, that $\{\partial_1 a_i, \ldots, \partial_k a_i\} \neq \{\partial_1 a_j, \ldots, \partial_k a_j\}$ for all $1 \le i < j \le \boldsymbol m$. Let us denote the event that these two conditions hold by $\mathfrak S$. Combinatorially $\mathfrak S$ is the event that the hypergraph whose vertices are the variable nodes and whose edges are the neighborhoods of the constraint nodes is simple and $k$-uniform. We are going to state all results both for the unconstrained $\mathbb G$ and conditional on $\mathfrak S$.
Apart from the condition (2.1), which we assume tacitly, the main results require (some of) the following four assumptions. Crucially, they only refer to the distribution on the set of weight functions.
**SYM: **
For all , and we have
[TABLE]
and for every permutation and every measurable we have .
**BAL: **
The function
[TABLE]
is concave and attains its maximum at the uniform distribution on .
**MIN: **
Let be the set of all probability distributions on such that for all . The function
[TABLE]
has the uniform distribution on as its unique global minimizer.
**POS: **
For all the following is true. With chosen from , chosen from and chosen from , all mutually independent, we have
[TABLE]
Conditions very similar to SYM, BAL and POS appeared in [23] as well. SYM is a symmetry condition. In the language of the cavity method [47], the condition ensures that the unique Belief Propagation fixed point on any acyclic $\Psi$-factor graph is such that all messages are identical to the uniform distribution on $\Omega$ (but we will not need this fact explicitly). [Footnote: The condition (2.2) emerged out of a discussion with Guilhem Semerjian.] Condition BAL is going to guarantee that for small enough values of $d$ the Gibbs measure is typically concentrated on “balanced” $\sigma$, i.e., $|\sigma^{-1}(\omega)| \sim n/|\Omega|$ for all $\omega \in \Omega$. Further, MIN is a technical condition that we need in order to study the overlap of two independent Gibbs samples. Finally, POS is required so that we can apply certain results from [23]. As we shall see in Section 2.3, the conditions are easily verified in the models from Section 1 and several others.
2.2. Results
We proceed to state the results on the condensation phase transition, the limiting distribution of the free energy, the overlap, the reconstruction and the detection thresholds for random factor graph models.
2.2.1. The condensation phase transition
The following theorem pins down the condensation phase transition in random factor graph models precisely in terms of a stochastic optimization problem that encodes the “1RSB cavity equations with Parisi parameter ” from the cavity method [47].
Theorem 2.2.
Assume that satisfies SYM, BAL and POS and let . With a -random variable, chosen from and chosen from , all mutually independent, let
[TABLE]
Then and
[TABLE]
Theorem 2.2 generalizes [23, Theorem 2.7], which requires that the set of weight functions be finite.
Admittedly the formula for provided by Theorem 2.2 is neither very simple nor very explicit, but we are not aware of any reason why it ought to be. Yet there is a natural generalization of the Kesten-Stigum bound for the Potts model from (1.7) that provides an easy-to-compute upper bound on in terms of the spectrum of a certain linear operator. The operator is constructed as follows. For let be the matrix with entries
[TABLE]
and let be the linear operator on the -dimensional space defined by
[TABLE]
Further, with denoting the vector with all entries equal to one, let
[TABLE]
Finally, we introduce
[TABLE]
with the convention that if .
Theorem 2.3.
If satisfies SYM and BAL, then .
We shall see in Section 3 that is related to the “broadcasting matrix” of a suitable Galton-Watson tree, which justifies referring to as a generalized version of the classical Kesten-Stigum bound from [41]. While the Kesten-Stigum bound is not generally tight, it plays a major conceptual role, as will emerge in due course.
2.2.2. The free energy
Theorem 2.2 easily implies that converges to in probability if . Yet due to the scaling factor of this is but a rough first order approximation. The next theorem, arguably the principal achievement of this paper, yields the exact limiting distribution of the unscaled free energy in the entire replica symmetric phase. Recalling (2.5), we introduce the -matrix
[TABLE]
Also recall that denotes the number of constraint nodes of and let be the spectrum of .
Theorem 2.4.
Assume that satisfies SYM, BAL, POS and MIN and that . Let be a family of Poisson variables with means and let be a sequence of samples from , all mutually independent. Then the random variable
[TABLE]
satisfies and
[TABLE]
in distribution. Further, given the random variable on the left hand side of (2.11) converges in distribution to
[TABLE]
which also satisfies .
Since key parameters of the random factor graph such as the size of the largest connected component of exhibit fluctuations of order even once we condition on , one might a priori expect that the same is true of the free energy . However, (2.11) shows that given the free energy has bounded fluctuations.
2.2.3. The overlap
For we define the overlap by letting
[TABLE]
Let be the uniform distribution on . The following theorem confirms one of the core tenets of the cavity method, namely the absence of extensive long-range correlations for . We write for two independent samples chosen from the Gibbs measure , for the expectation with respect to the and for the expectation with respect to the choice of .
Theorem 2.5.
If satisfies SYM, BAL, POS and MIN, then
[TABLE]
If we let be the Gibbs marginal of and the joint distribution of the spins at , then Theorem 2.5 implies together with standard arguments that
[TABLE]
In other words, for with probability tending to as , the spins assigned to two randomly chosen variable nodes are asymptotically independent.
Conversely, Theorem 2.5 shows that for any there exists such that
[TABLE]
Hence, if we know that the Gibbs marginals are uniform (e.g., due to the symmetry among colors in the Potts model or the inversion symmetry in the -spin model for even ), then (2.12) becomes
[TABLE]
Since two randomly chosen variable nodes of have distance with probability , (2.13) states that long range correlations persist for beyond but arbitrarily close to .
2.2.4. The teacher-student model
Finally, there is a natural statistical inference version of the random factor graph model, the teacher-student model [67], a generalization of the stochastic block model from Section 1.4. Suppose that is an assignment of spins to variable nodes. Then we introduce a random factor graph with variable nodes and constraint nodes such that, independently for each , the neighborhood and the weight function are chosen from the following joint distribution: for any and for any measurable ,
[TABLE]
Thus, the probability of the outcome is the ‘prior’ probability of selecting times the ‘posterior’ weight .
Further, given consider the following experiment where the initial assignment is chosen randomly as well.
**TCH1: **
an assignment , the ground truth, is chosen uniformly at random.
**TCH2: **
independently of , draw from the Poisson distribution with mean .
**TCH3: **
generate .
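The experiment TCH1–TCH3 can be sketched with rejection sampling: propose a neighborhood and a weight function from the prior, and accept the proposal with probability proportional to the weight it assigns to the ground truth. Accepted proposals then follow the "prior times posterior weight" law described above. The sketch rests on several assumptions not stated in the text here: the Poisson mean in TCH2 is taken to be $dn/k$, all weight functions are bounded above by a known constant `weight_bound`, and the prior is supplied as a hypothetical callable `sample_weight`.

```python
import random


def sample_poisson(mean, rng):
    """Poisson sample: number of unit-rate exponential arrivals before time `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_teacher_student(n, d, k, spins, sample_weight, weight_bound, rng):
    """Return (ground truth, neighborhoods, weight functions)."""
    # TCH1: uniformly random ground truth.
    sigma = [rng.choice(spins) for _ in range(n)]
    # TCH2: Poisson number of constraint nodes (assumed mean d*n/k).
    m = sample_poisson(d * n / k, rng)
    # TCH3: each constraint is drawn from the prior and reweighted by the
    # value of its weight function on the ground truth, via rejection.
    neighborhoods, weights = [], []
    while len(neighborhoods) < m:
        nbrs = tuple(rng.randrange(n) for _ in range(k))
        psi = sample_weight(rng)
        if rng.random() * weight_bound < psi(*(sigma[x] for x in nbrs)):
            neighborhoods.append(nbrs)
            weights.append(psi)
    return sigma, neighborhoods, weights


# Example: a hard-core "coloring" weight, so planted constraints are never monochromatic.
def coloring_prior(rng):
    return lambda s, t: 0.0 if s == t else 1.0


sigma, neighborhoods, weights = sample_teacher_student(
    n=30, d=2.0, k=2, spins=(0, 1, 2),
    sample_weight=coloring_prior, weight_bound=1.0, rng=random.Random(0))
```

Rejection sampling is chosen here for transparency; its acceptance rate degrades when typical weights are much smaller than `weight_bound`.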
The intuition behind this model is that a “teacher”, in possession of the ground truth , finds herself unable to communicate to a student directly. Instead the teacher utilizes to set up a random factor graph that the student gets to observe. Given the student aims to recover as best as possible. As in the case of the stochastic block model, two natural questions arise: given , is it information-theoretically possible to accomplish a better approximation to than a mere independent random guess? More modestly, there is the detection problem: given a factor graph is it possible to discern with probability as whether was chosen from the model or from the “null model” ? As the imprint that the ground truth imbues on increases with , we should expect the existence of a threshold from where either problem turns solvable. Regarding the detection problem, we recall that the random graph models are mutually contiguous if for any sequence of events we have iff . The following theorem establishes a generalization of the conjectures put forward in [26] for the stochastic block model to the case of random factor graph models.
Theorem 2.6.
If satisfies SYM, BAL, POS and MIN, then are mutually contiguous for all , while fail to be mutually contiguous for . The same holds given .
Previously it was known that for it is impossible to recover an assignment that has a strictly greater overlap with [23, Theorem 2.6]. Theorem 2.6 shows that, in fact, marks the threshold for the feasibility of the humble detection problem.
While Theorem 2.6 is bad news from a statistical inference point of view, the upshot is that throughout the replica symmetric phase typical properties of Gibbs samples of can be investigated accurately by way of the teacher-student model , a technique known as “quiet planting” [4, 42]. This idea has been used critically in rigorous work on specific examples of random factor graph models, e.g., [54]. Formally, quiet planting applies if the factor graph/assignment pair comprising the ground truth and the outcome of TCH1–TCH3 and the pair consisting of the random factor graph and a Gibbs sample of are mutually contiguous. Previously this was known to be true for a few specific models (e.g., [16, 22]), albeit not generally in the entire replica symmetric phase. The following corollary to Theorem 2.6 shows that “quiet planting” is a universal phenomenon.
**Corollary 2.7.**
Assume that satisfies SYM, BAL, POS and MIN. For all the pairs and are mutually contiguous. The same is true given .
2.2.5. Reconstruction
According to the physics deliberations the condensation phase transition is generally preceded by another threshold where certain point-to-set correlations emerge, the reconstruction threshold [43]. Reconstruction plays a major role in the cavity formalism because it provides the conceptual underpinning for the notion that the Gibbs measure decomposes into a multitude of “clusters” [47, 51]. Formally, suppose that is a factor graph with variable nodes , and that . Let be the -algebra on generated by the random variables such that is a variable node whose distance from in is at least . Further, define
[TABLE]
Of course, the expectation refers to the choice of , the outer expectation averages over the “boundary condition”, i.e., the spins of the variable nodes at distance at least from , and the inner is the conditional expectation given the boundary condition. If , then the influence of a “typical” boundary condition on the spin of decays with the radius . Thus, the reconstruction threshold is the smallest degree where the influence of the boundary persists.
A priori determining appears to be challenging because the joint distribution of the spins at distance from is determined not merely by the “local” effects within the radius- neighborhood of but also by the graph beyond. But according to physics predictions (e.g., [43]), actually is equal to the corresponding threshold on a suitable Galton-Watson tree. Conceptually this amounts to an enormous simplification because the branches of the tree are mutually dependent only through their being connected to the root, a situation amenable to precise treatment via the Belief Propagation message passing scheme [47].
Formally, we introduce a multi-type Galton-Watson tree that mimics the local geometry of . The types are either variable nodes or constraint nodes, each of the latter endowed with a weight function . The root of the Galton-Watson tree is a variable node . The offspring of a variable node is a number of constraint nodes whose weight functions are chosen from independently. Moreover, the offspring of a constraint node is variable nodes. For an integer we let denote the (finite) tree obtained from by deleting all variable or constraint nodes at distance greater than from . In analogy to (2.15) we set
[TABLE]
The tree reconstruction threshold is defined as .
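To make the shape of the Galton-Watson tree concrete, the following Python sketch records only the number of variable nodes in each generation. It assumes that a variable node has a Poisson(d) number of constraint children and that each constraint node has k − 1 variable children; the weight functions attached to the constraint nodes are omitted. The offspring law and the function names are assumptions made for illustration.

```python
import math
import random


def sample_poisson(mean, rng):
    """Poisson sample via Knuth's multiplication method (moderate means only)."""
    threshold = math.exp(-mean)
    count, running = 0, rng.random()
    while running > threshold:
        count += 1
        running *= rng.random()
    return count


def variable_generation_sizes(d, k, depth, seed=0):
    """Numbers of variable nodes in generations 0..depth of the tree.

    Assumption: every variable node has Poisson(d) constraint children and
    every constraint node has k - 1 variable children.
    """
    rng = random.Random(seed)
    sizes = [1]  # generation 0 consists of the root variable node
    for _ in range(depth):
        constraints = sum(sample_poisson(d, rng) for _ in range(sizes[-1]))
        sizes.append(constraints * (k - 1))
    return sizes
```

Under these assumptions the expected number of variable nodes grows by a factor d(k − 1) per generation, so the tree is finite almost surely when d(k − 1) ≤ 1.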
**Theorem 2.8.**
Suppose that satisfies SYM, BAL, POS and MIN. Then and for all . Moreover,
[TABLE]
We prove Theorem 2.8 by way of the teacher-student model and the “quiet planting” result Corollary 2.7. This argument provides a perspective on the reconstruction problem that has an impact on the statistical inference questions as well. Specifically, we observe that the reconstruction problem on the random tree is equivalent to a natural “Bayesian” reconstruction problem in the teacher-student model. Formally, let be the -algebra generated by the graph and the random variables with at distance at least from . Then
[TABLE]
measures the correlation between , the spin at under the ground truth, and the spins that assigns to the variables at distance at least . The proof of Theorem 2.8 is based on showing that for all .
**Theorem 2.9.**
If satisfies SYM, BAL, POS and MIN, then for all we have
[TABLE]
Finally, we highlight an immediate but interesting consequence of Theorems 2.3 and 2.8 that generalizes the classical Kesten-Stigum upper bound for reconstruction on trees [41].
**Corollary 2.10.**
If satisfies SYM, BAL, POS and MIN, then for all .
The reconstruction problem on a certain class of random factor graph models (that includes, e.g., the Potts antiferromagnet) was previously studied by Gerschenfeld and Montanari [34]. They observed that overlap concentration about as provided by Theorem 2.5 for guarantees that the reconstruction thresholds and coincide. Subsequently, with the condensation threshold well out of reach at the time, Montanari, Restrepo and Tetali [55] attempted to verify the required overlap concentration at least for all up to the tree reconstruction threshold. However, their combinatorial (essentially second moment) argument did not cover the entire range of parameters, e.g., all and/or all in the Potts model. By comparison to [34, 55], Theorem 2.9 provides a different, perhaps more conceptual angle: tree reconstruction is equivalent to reconstruction in the teacher-student model for all , and up to the equivalence extends to the random factor graph model thanks to contiguity.
2.3. Examples
Here we show how the models from Section 1 can be cast as random factor graph models that satisfy the assumptions SYM, BAL, POS and MIN.
2.3.1. The Potts antiferromagnet
For an integer and a real we let and
[TABLE]
Let be the singleton . Then the Potts model on a given graph can be cast as a -factor graph: we just set up the factor graph whose variable nodes are the vertices of the original graph and whose constraint nodes are the edges of . For an edge we let , where, say, the order of the neighbors is chosen randomly, and , of course. Then coincides with from (1.3).
To mimic the Potts model on the Erdős-Rényi graph we let be the atom on . Then the sole difference between the factor graph representation of the Erdős-Rényi graph and is that the latter may have factor nodes such that (“self-loops”) or pairs of distinct factor nodes such that (“double-edges”). However, conditioning on the event rules out self-loops and double-edges. Indeed, we have the following.
**Fact 2.11 ([23, Lemma 4.1]).**
The random factor graph and given are mutually contiguous.
**Lemma 2.12.**
The assumptions SYM, BAL, POS and MIN hold for for all and all .
Proof.
That SYM, BAL and POS hold is known already [23, Lemma 4.3]. With respect to MIN, we observe that for any distribution on with uniform marginals,
[TABLE]
The last expression is strictly convex as a function of with the minimum attained at the uniform distribution. ∎
Thus the results stated in Section 1.3 follow from the results for general random factor graph models. Indeed, to obtain Theorem 1.3 we observe that the matrices from (2.5), (2.6) and (2.9) satisfy
[TABLE]
where is the all-ones matrix and is the identity matrix. Clearly, the eigenvalues of are and , the latter with multiplicity . Hence,
[TABLE]
Thus, Theorem 1.3 follows from Theorem 2.4 and Theorem 1.4 from Theorem 2.5. Finally, (2.19) shows that and thus (2.8) matches the “classical” Kesten-Stigum bound (1.7).
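The spectral computation above rests on the elementary fact that a q × q matrix of the form aJ + bI, with J the all-ones matrix and I the identity, has the eigenvalue aq + b on the constant vector and the eigenvalue b, with multiplicity q − 1, on vectors whose entries sum to zero. The sketch below checks this for a row-normalized Potts weight matrix. The normalization, the formula for the classical Kesten-Stigum degree (d·λ² = 1 for k = 2) and the function names are assumptions made for illustration, since the displays are not reproduced here.

```python
import math


def potts_matrix(q, beta):
    """Row-stochastic q x q matrix proportional to 1 - (1 - exp(-beta)) * [s == t]."""
    c = 1.0 - math.exp(-beta)
    return [[(1.0 - (c if s == t else 0.0)) / (q - c) for t in range(q)]
            for s in range(q)]


def apply(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def potts_eigenvalues(q, beta):
    """Eigenvalues predicted by the a*J + b*I structure: 1 and -c / (q - c)."""
    c = 1.0 - math.exp(-beta)
    return 1.0, -c / (q - c)


def kesten_stigum_degree(q, beta):
    """Degree d solving d * lambda^2 = 1 for the nontrivial eigenvalue (k = 2)."""
    _, lam = potts_eigenvalues(q, beta)
    return 1.0 / (lam * lam)
```

For instance, `apply(potts_matrix(3, 1.0), [1, -1, 0])` returns the input vector scaled by the nontrivial eigenvalue, which is negative, as one expects for an antiferromagnetic interaction.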
2.3.2. The stochastic block model
The teacher-student model corresponding to is very similar to the stochastic block model. As in the case of the Potts model on the Erdős-Rényi graph, the only discrepancy is due to the possible occurrence of self-loops and double-edges.
**Lemma 2.13 ([23, Lemma 4.4]).**
For any , , the stochastic block model and the teacher-student model given are mutually contiguous.
Theorem 1.6 follows from Theorem 2.6 and Lemma 2.13.
2.3.3. The -spin model
Let . For we could define the weight function to match the definition (1.1) of the -spin model. However, these functions do not necessarily take values in . To remedy this problem we introduce . Then (cf. [60])
[TABLE]
Thus, let , let , where is a standard Gaussian and let be the law of . As in the case of the Potts model, we have the following.
**Fact 2.14.**
For all the random measure from (1.1) and the Gibbs measure of the random factor graph given are mutually contiguous. Furthermore,
[TABLE]
Instead of just verifying the conditions SYM, BAL, POS and MIN for the -spin model with standard Gaussian couplings , we will establish the following more general statement. Recall that a random variable is symmetric if and have the same distribution.
**Lemma 2.15.**
For any , and for any symmetric random variable such that satisfies (2.1) the three conditions SYM, BAL and POS hold. If is even, then MIN holds as well.
Proof.
It is immediate that and that satisfies SYM. For BAL observe that is constant because is symmetric. To verify POS we generalize the argument from [23, Section 4.4] by observing that for any integer , with the notation from POS,
[TABLE]
Hence, expanding and using (2.1) and Fubini’s theorem to swap the sum and the expectation, we find
[TABLE]
Applying similar manipulations to the other two terms from POS and introducing , , we see that POS comes down to showing that
[TABLE]
Since is symmetric we get for odd , while and for even . Hence, (2.21) follows from the elementary fact that for all .
Moving on to MIN, we assume that is even. Suppose that is a distribution on with uniform marginals and let . Then , and because is symmetric,
[TABLE]
Because is even, the last expression is convex with the minimum attained at , viz. . ∎
Lemma 2.15 shows not only that the -spin model from Section 1.2 with a standard Gaussian satisfies SYM, BAL, POS and MIN, but that the same is true if is the uniform distribution on . This model is known as the -XORSAT model in computer science. It is intimately related to low-density generator matrix codes [2].
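The passage from Boltzmann factors to weight functions with values in (0, 2) can be checked numerically. The sketch below assumes that the Boltzmann factor of a single interaction is exp(βJ·σ₁⋯σ_k) with spins ±1 and that the normalized weight is 1 + tanh(βJ)·σ₁⋯σ_k; both forms are assumptions standing in for the displays that are not reproduced here, and the function names are illustrative. The identity exp(x·s) = cosh(x)·(1 + s·tanh(x)) for s = ±1 links the two, and for a symmetric coupling the averaged weight is constant, in line with the argument for BAL.

```python
import math
from itertools import product


def psi(coupling, beta, sigma):
    """Assumed normalized weight 1 + tanh(beta * J) * prod(sigma), sigma in {-1, 1}^k."""
    return 1.0 + math.tanh(beta * coupling) * math.prod(sigma)


def boltzmann(coupling, beta, sigma):
    """Assumed Boltzmann factor exp(beta * J * prod(sigma))."""
    return math.exp(beta * coupling * math.prod(sigma))


def max_identity_error(beta, k, couplings):
    """Largest deviation of exp(beta*J*s) from cosh(beta*J) * psi over all inputs."""
    worst = 0.0
    for coupling in couplings:
        for sigma in product((-1, 1), repeat=k):
            lhs = boltzmann(coupling, beta, sigma)
            rhs = math.cosh(beta * coupling) * psi(coupling, beta, sigma)
            worst = max(worst, abs(lhs - rhs))
    return worst


def mean_weight_pm1(beta, sigma):
    """Average of psi over a coupling that is uniform on {-1, +1}."""
    return 0.5 * (psi(1.0, beta, sigma) + psi(-1.0, beta, sigma))
```

With the coupling uniform on {−1, +1}, as in the k-XORSAT case, `mean_weight_pm1` equals 1 for every spin configuration because tanh is odd.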
Proof of Theorem 1.1.
Comparing (1.1) and (2.20), we see that
[TABLE]
Therefore, Theorem 1.1 follows from Theorem 2.2 and Lemma 2.15. ∎
Proof of Theorem 1.2.
Equations (1.1) and (2.20) ensure that the Gibbs measures and given are identically distributed. Hence, Theorem 1.2 follows from Theorem 2.5 and Lemma 2.15. ∎
2.4. Discussion and related work
The results in this section provide a map of the replica symmetric phase, its boundary and the evolution of the Gibbs measure within it, thereby vindicating for a universal class of models the predictions of the cavity method [43]. The results extend, complement or generalize prior work on the condensation phase transition from [23], which only dealt with the case that the support of is finite, and on the reconstruction problem [34, 55]. Additionally, in the example of the Potts antiferromagnet and the stochastic block model prior work based on combinatorial methods only gave approximate results [12, 22], whereas the present results are tight for all values of . Indeed, a merit of the present approach is that we perform fairly abstract arguments that do not require model-specific deliberations.
Beyond the examples treated explicitly in Section 2.3 there are several other important and well-studied models that also satisfy the assumptions of our main results. For instance, Bapst, Coja-Oghlan and Raßmann [16] obtained approximate results on the replica symmetry breaking phase transition in the random hypergraph -coloring problem. This model is easily seen to satisfy SYM, BAL, POS and MIN and thus the main results of the present paper clarify the structure of the entire replica symmetric phase. More generally, the hypergraph version of the Potts model satisfies our assumptions as well. So does the random -NAESAT model, a variant of Boolean satisfiability that resembles the hypergraph -coloring model.
Apart from proving an upper bound on the condensation threshold, the Kesten-Stigum bound plays an important role with respect to statistical inference aspects of random factor graph models. Specifically, by extension of the predictions from [26] for the stochastic block model, it seems natural to expect that there should be efficient algorithms for both the detection problem and for recovering a non-trivial approximation to the ground truth in the teacher-student model for . On the other hand, an intriguing question is whether for these two problems may be soluble in exponential time but not efficiently, i.e., in polynomial time [12, 26]. Indeed, while Theorem 2.2 shows that is always finite, there are models where , e.g., the -XORSAT model. Thus, for such models there might be an enormous computational gap. This question is intimately related to the -SAT refutation problem, an important question in computer science [30, 31].
There are a few models that fail to satisfy our assumptions. For instance, in the random -SAT model [9] and the hardcore model on the Erdős-Rényi random graph [11] condition SYM is violated. Indeed, in these two cases the Gibbs marginals are non-uniform in the replica symmetric phase. In effect, we do not expect that the free energy is as tightly concentrated as Theorem 2.4 shows it is in the case of “symmetric” models. Thus, it is not just that the present proof methods do not apply, but “asymmetric” models appear to be materially different. Moreover, ferromagnetic models generally violate SYM, BAL and POS.
A further class of models that we do not treat in this paper is models where the weight functions take values in , thus imposing hard constraints. An example of this is the “zero-temperature” version of the Potts antiferromagnet, better known as the random graph coloring problem [9]. Certain specific models with hard constraints have received considerable attention in combinatorics. For example, [17, 15, 62] established the precise condensation threshold, a contiguity result and the exact limiting distribution of the number of -colorings of the Erdős-Rényi random graph via combinatorial methods under the assumption that exceeds a large enough constant. (Subsequently the condensation threshold in the random graph coloring problem was determined for all [23].) Similar results, albeit not quite up to the precise condensation threshold, are known for the hypergraph -coloring and the -NAESAT problems [6, 7, 61], a version of the random -SAT problem with regular literal degrees [24] and the independent set problem in random regular graphs [18]. Additionally, in zero temperature models the ‘satisfiability threshold’ from where is typically equal to [math] plays a major role [5, 10, 27, 28, 36, 57].
3. Proof strategy
Throughout this section we keep the notation from Section 2.
The apex of the present work is Theorem 2.4 about the limiting distribution of the free energy; all the other results either lead up to it or derive from it relatively easily. The classical approach to proving such a result would be the second moment method, pioneered in this context by Achlioptas and Moore [6], in combination with the small subgraph conditioning technique of Robinson and Wormald [39, 64]. This strategy was applied to, e.g., the stochastic block model [12] and the -spin model [37]. But only in the stochastic block model with two colors and the diluted -spin model was it possible to obtain complete results [37, 58]. Indeed, as noticed by Guerra and Toninelli [37], a combinatorial second moment computation generally appears to be too crude a device to cover the entire replica symmetric phase.
Therefore, here we pursue a different strategy. We craft a proof around the teacher-student model . More specifically, the main achievement of the recent paper [23] was to verify the cavity formula for the leading order of the free energy in the teacher-student model (in the case that the set is finite). We will replace the second moment calculation by that free energy formula, generalized to infinite , and combine it with a suitably generalized small subgraph conditioning technique. The challenge is to integrate these two components seamlessly. We accomplish this by realizing that, remarkably, both arguments are inherently and rather elegantly tied together via the spectrum of the linear operator from (2.6). But to develop this novel approach we first need to recall the classical second moment argument and understand why it founders.
3.1. Two moments do not suffice
For any second moment calculation it is crucial to fix the number of constraint nodes because its fluctuations would otherwise boost the variance. Hence, we will work with a deterministic integer sequence . More precisely, we will fix and consider specific integer sequences such that for all . Let be the set of all such sequences.
The second moment method rests on showing that is of the same order of magnitude as the square of the first moment. If so, then standard concentration results can be used to show that . The second limit is easy to compute because the expectation sits inside the logarithm, and thus we obtain the leading order of the free energy.
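One standard way to turn such a moment comparison into a probability bound is the Paley-Zygmund inequality, P(Z > θ·E[Z]) ≥ (1 − θ)²·E[Z]²/E[Z²] for a non-negative random variable Z and 0 ≤ θ ≤ 1. The sketch below evaluates both sides exactly for a finite distribution. The inequality is named here as a textbook tool and is not claimed to be the specific concentration result used in the paper; the function names are illustrative.

```python
def moments(values, probabilities):
    """First and second moment of a finitely supported random variable."""
    first = sum(p * z for z, p in zip(values, probabilities))
    second = sum(p * z * z for z, p in zip(values, probabilities))
    return first, second


def paley_zygmund_bound(values, probabilities, theta):
    """Lower bound (1 - theta)^2 * E[Z]^2 / E[Z^2] on P(Z > theta * E[Z])."""
    first, second = moments(values, probabilities)
    return (1.0 - theta) ** 2 * first ** 2 / second


def tail_probability(values, probabilities, theta):
    """Exact value of P(Z > theta * E[Z])."""
    first, _ = moments(values, probabilities)
    return sum(p for z, p in zip(values, probabilities) if z > theta * first)
```

The bound is informative only when the second moment is within a constant factor of the squared first moment; once the ratio grows, the guarantee degrades proportionally.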
In fact, if we can calculate the second moment sufficiently accurately, then it may be possible to determine the limiting distribution of precisely. For suppose that there is a “simple” random variable such that
[TABLE]
Then the basic formula implies
[TABLE]
and typically it is not difficult to deduce from (3.2) that converges to [math] in probability. Hence, if is “reasonable enough” so that the law of is easy to express, then we have got the limiting distribution of . The basic insight behind the small subgraph conditioning technique is that (3.1) sometimes holds with a variable that is determined by the statistics of bounded-length cycles in [39, 64].
Anyhow, the crux of the entire argument is to calculate . Of course, by the linearity of expectation and the independence of the constraint nodes, the second moment can be written in terms of the overlap as
[TABLE]
Given a probability distribution on such that is integral for all , the number of assignments with equals . Therefore, Stirling’s formula yields the approximation
[TABLE]
where denotes the entropy of . In other words, computing the second moment comes down to identifying the overlap that renders the dominant contribution to (3.3). By comparison, under assumptions SYM and BAL it is not difficult to see (cf. Lemma 4.6 below) that the first moment satisfies
[TABLE]
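The Stirling step can be checked numerically. Reading the count of assignment pairs with a prescribed overlap as a multinomial coefficient whose entries are the integers n·ρ(s, t) (an assumption about the elided display), the normalized logarithm of that coefficient approaches the entropy of ρ as n grows. The following sketch compares the two quantities; the function names are illustrative.

```python
import math


def log_multinomial(counts):
    """Logarithm of n! / prod(c!) for non-negative integer counts summing to n."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def entropy(probabilities):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def entropy_gap(counts):
    """Entropy of the empirical distribution minus (1/n) * log multinomial."""
    n = sum(counts)
    rho = [c / n for c in counts]
    return entropy(rho) - log_multinomial(counts) / n
```

The gap is positive and of order (log n)/n, which is why the entropy term governs the exponential order of the second moment while polynomial corrections must be tracked separately in a sharp calculation.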
But there are two major issues with the second moment argument. First, actually solving the innocent-looking optimization problem (3.4) turns out to be daunting even in special cases. For example, in the Potts antiferromagnet the task remains wide open, despite very serious attempts [8, 22]. The source of the trouble is that the entropy is concave while the second summand in (3.4) is convex (cf. MIN), causing a proliferation of local maxima. Second, and even worse, comparing (3.4) and (3.5) we can verify easily that the desired second moment bound can hold only if the maximizer of (3.4) satisfies . However, this is not generally true for average degrees below but near the condensation threshold. For instance, in the Potts antiferromagnet the second moment exceeds the square of the first moment by an exponential factor for below the condensation threshold [22].
The problem was noticed and partly remedied in prior work by applying the second moment method to a suitably truncated random variable (e.g. [17, 22]). This method revealed, e.g., the condensation threshold in a few special cases such as the random graph -coloring problem [17], albeit only for exceeding some (astronomical) constant , and in the random regular -SAT model for large [14]. Yet apart from introducing such extraneous conditions, ad-hoc arguments of this kind tend to require a meticulous combinatorial study of the specific model.
3.2. The condensation phase transition and the overlap
The merit of the present approach is that we avoid combinatorial deliberations altogether. Rather than bothering with the second moment bound (3.4) we will employ an asymptotic formula for the free energy of the teacher-student model . To be precise, it will be convenient to work with a slightly tweaked version of this model: following [23, Section 3], we let be the random factor graph chosen from the distribution
[TABLE]
Recalling that is a random variable with distribution , we also introduce . As before we ease the notation by dropping where possible.
Loosely speaking is a reweighted version of where the probability that comes up is proportional to . Intuitively, the construction of the teacher-student model induces a similar reweighting as the probability that depends on the number of assignments that could plausibly be used to generate via (2.14). In fact, as we shall see in Section 4 it is not difficult to verify the following.
**Lemma 3.1.**
If satisfies conditions SYM and BAL, then and are mutually contiguous for all , .
The following theorem verifies the cavity formula for the free energy of and .
**Theorem 3.2.**
Assume that satisfies SYM, BAL and POS and let . Then with from (2.3) we have
[TABLE]
Theorem 3.2 was established in [23] for the case that the set of weight functions is finite. In Section 10 we extend that result via a limiting argument to prove Theorem 3.2 for infinite . Furthermore, in Section 6 we deduce the following result from Theorem 3.2.
**Proposition 3.3.**
Assume that BAL, SYM, POS and MIN hold and that . There exists a sequence , but as , such that for all we have
[TABLE]
Proposition 3.3 resolves our second moment troubles. Indeed, the proposition enables a completely generic way of setting up a truncated second moment argument: with from Proposition 3.3 we define
[TABLE]
Hence, if “most” pairs drawn from have overlap close to , and otherwise. Proposition 3.3 shows immediately that the truncation does not diminish the first moment.
**Corollary 3.4.**
If BAL, SYM, POS and MIN hold and , then uniformly for all .
Proof.
Equation (3.6) and Proposition 3.3 yield
[TABLE]
as claimed. ∎
The second moment calculation for is easy, too. Indeed, the very construction (3.8) of guarantees that the dominant contribution to the second moment of comes from pairs with an overlap close to . Hence, computing the second moment comes down to expanding the right hand side of (3.4) around via the Laplace method. Yet in order to apply the Laplace method we need to verify that is a local maximum of the function
[TABLE]
from (3.4). For the special case of the Potts antiferromagnet the overlap concentration (3.7) was established and the second moment argument for was carried out in [23, Section 4.3]. While the generalization to random factor graph models is anything but straightforward, an even more important difference lies in the application of the Laplace method. More specifically, in the case of the Potts antiferromagnet the fact that is a local maximum of (3.9) for all was derived extremely indirectly by resorting to the statistical inference algorithm of Abbe and Sandon for the stochastic block model [3]. But of course there ought to be a general, conceptual explanation. As we shall see momentarily, there is one indeed, namely the generalized Kesten-Stigum bound.
3.3. The Kesten-Stigum bound
To see the connection, we observe that the Hessian of (3.9) at the point is equal to (with the matrix from (2.6)). Hence, taking into account that the argument is a probability distribution on , we find that is a local maximum of (3.9) if and only if
[TABLE]
In order to get a handle on the spectrum of the operator from (2.6) we begin with the following observation about the matrices and from (2.5) and (2.9).
**Lemma 3.5.**
Assume that satisfies SYM. Then the matrix is stochastic and thus for every . Moreover, is symmetric and doubly-stochastic. If, additionally, satisfies BAL, then
Proceeding to the operator , we recall the definition of the space from (2.7) and we introduce
[TABLE]
**Lemma 3.6.**
Assume that satisfies SYM and BAL. The operator is self-adjoint, and for every we have , and
[TABLE]
Furthermore, and .
Lemma 3.6 shows that induces a self-adjoint operator on the space . The following proposition yields a bound on the spectral radius of this operator. Let
[TABLE]
**Proposition 3.7.**
If satisfies SYM and BAL, then
The proof of Proposition 3.7, which is based on highlighting an inherent connection between the spectrum of and the Bethe free energy functional from (2.3), is the main technical achievement of this paper. The details can be found in Section 5. Let us observe that Theorem 2.3 is immediate from Proposition 3.7.
Proof of Theorem 2.3.
We have because Lemma 3.6 shows that is self-adjoint. Therefore, Theorem 2.3 follows from Proposition 3.7. ∎
Lemma 3.6 and Proposition 3.7 show that (3.10) is satisfied, and thus that is a local maximum of (3.9), for all . Indeed, it is immediate from (3.12) that if is of the form or for some , and Theorem 2.3 shows that for all . Hence, Proposition 3.7 provides the link between the free energy calculation for the reweighted model and the second moment of .
3.4. Second moment redux
We begin by deriving the following asymptotic formula for the first moment in Section 7. Observe that by Lemma 3.5 the set of eigenvalues of contains precisely one non-negative element, namely . Therefore, the following formula makes sense.
**Proposition 3.8.**
Suppose that satisfies SYM and BAL and let . Then uniformly for all ,
[TABLE]
Proceeding to the second moment, we recall from Lemma 3.6 that induces an endomorphism on the subspace from (3.11) and we write
[TABLE]
for the spectrum of on . Lemma 3.6 and Proposition 3.7 imply that for all . Therefore, the following formula for the second moment, whose proof we defer to Section 7, makes sense as well.
**Proposition 3.9.**
Suppose that satisfies SYM and BAL and let . Then uniformly for all ,
[TABLE]
Combining Corollary 3.4 with Propositions 3.8 and 3.9 and applying Lemma 3.6, we obtain for ,
[TABLE]
In particular, the ratio of the second moment and the square of the first is bounded as .
3.5. Virtuous cycles
In order to determine the limiting distribution of we are going to “explain” the remaining variance of in terms of the statistics of the bounded-length cycles of . However, by comparison to prior applications of the small subgraph conditioning technique, here it does not suffice to merely record how many cycles of a given length occur. We also need to take into account the specific weight functions along the cycle. Yet this approach is complicated substantially by the fact that there may be infinitely many different weight functions. To deal with this issue we are going to discretize the set of weight functions and perform a somewhat delicate limiting argument.
We need a few definitions. A signature of order is a family
[TABLE]
such that are events, and for all and if . Let be the set of all signatures of order , let and let be the set of all signatures. If is a factor graph with variable nodes and constraint nodes , then we call a family a cycle of signature in if the following conditions are satisfied.
**CYC1:** are pairwise distinct and ,
**CYC2:** are pairwise distinct and if ,
**CYC3:** for all ,
**CYC4:** for all , for all and .
Conditions CYC1–CYC2 provide that the variable nodes that the cycle passes through are pairwise distinct. Moreover, to avoid over-counting CYC1 specifies that the cycle starts at the variable node with the smallest index and CYC2 that from there the cycle is oriented towards the constraint node with the smaller index if , respectively that if . Further, CYC3 states that the weight functions along the cycle belong to . Finally, CYC4 ensures that the cycle enters the th constraint node in position and leaves in position .
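The canonical-form idea behind CYC1 and CYC2 can be illustrated on an ordinary simple graph, which serves here as a simplified stand-in for a factor graph with k = 2. In the sketch below a cycle is counted only if it starts at its smallest vertex and its second vertex is smaller than its last vertex, so that each cycle is recorded exactly once rather than once per starting point and direction. Using vertex indices instead of constraint-node indices for the orientation, ignoring weight functions and positions, and restricting to lengths of at least three are simplifications; the function names are illustrative.

```python
def complete_graph(n):
    """Adjacency sets of the complete graph on vertices 0..n-1."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


def count_cycles(adjacency, length):
    """Count simple cycles with `length` vertices, each exactly once.

    A cycle is recorded only in canonical form: it starts at its smallest
    vertex and its second vertex is smaller than its last vertex.
    """
    if length < 3:
        raise ValueError("length must be at least 3 in a simple graph")
    total = 0

    def extend(path, visited):
        nonlocal total
        start, last = path[0], path[-1]
        if len(path) == length:
            # Close the cycle and enforce the canonical orientation.
            if start in adjacency[last] and path[1] < path[-1]:
                total += 1
            return
        for nxt in adjacency[last]:
            # Only vertices larger than the start keep the start minimal.
            if nxt > start and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in adjacency:
        extend([start], {start})
    return total
```

Without the two canonical-form conditions every cycle with ℓ vertices would be found 2ℓ times, once for each choice of starting vertex and direction.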
Let denote the number of cycles of signature . Moreover, for an event with and define the matrix by letting
[TABLE]
In addition, for a signature define
[TABLE]
Further, two signatures , are disjoint if either , or for some , or for some . Finally, a cycle of order is a family that is a cycle of signature for some sequence , and we let signify the number of such cycles. The following is a basic fact from the theory of random graphs.
**Fact 3.10 ([19]).**
Let be pairwise distinct integers and let be integers. Then for every uniformly for all we have
[TABLE]
and the expected number of pairs of cycles of order at most that share a common vertex is .
In Section 8 we establish the following enhancement that takes the weight functions along the cycles into account.
**Proposition 3.11.**
Suppose that satisfies SYM and BAL. Let be pairwise disjoint signatures and let be non-negative integers. Let . Then uniformly for all ,
[TABLE]
Moreover,
[TABLE]
Thus, for disjoint the cycle counts are asymptotically independent Poisson.
Equipped with Propositions 3.8, 3.9 and 3.11, in the case that the set of weight functions is finite we could determine the limiting distribution of and thus prove Theorem 2.4 by just applying Janson’s version of the small subgraph conditioning theorem [39]. However, to accommodate an infinite set of weight functions like in the -spin model a discretization of and a limiting argument are required. Specifically, recall that
[TABLE]
and for an integer let be the partition of induced by slicing the cube into pairwise disjoint sub-cubes of side length . Further, let denote the set of all signatures such that and such that for all , and define . Furthermore, if belongs to a sub-cube , then we let
[TABLE]
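The discretization step can be pictured as binning. In the sketch below a weight function is represented by the tuple of its values, each lying in (0, 2), and is mapped to the index of the half-open sub-cube that contains it. The representation, the parameter `side` for the side length of the sub-cubes and the function names are hypothetical choices made for illustration, since the precise side length is not reproduced here.

```python
import math
from collections import Counter


def subcube_index(weights, side):
    """Index of the sub-cube [j * side, (j + 1) * side) in each coordinate."""
    return tuple(math.floor(w / side) for w in weights)


def cell_frequencies(samples, side):
    """Empirical probability of each sub-cube among sampled weight functions."""
    counts = Counter(subcube_index(w, side) for w in samples)
    total = len(samples)
    return {cell: c / total for cell, c in counts.items()}
```

Replacing each weight function by its cell yields a finite family of events to which a finite-support argument can be applied, with finer partitions approximating the original distribution more closely.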
The following proposition, whose proof can be found in Section 9, establishes that the random variable from Theorem 2.4 is well-defined and that it can be approximated arbitrarily well via the discretizations .
**Proposition 3.12.**
Assume that satisfies SYM and BAL and let . Let be a family of independent Poisson variables with and let be a family of independent samples from . Furthermore, define
[TABLE]
and . Then all are uniformly bounded in the -norm, is -convergent to as and is -convergent to as . Furthermore,
[TABLE]
3.6. Small subgraph conditioning
We have all the ingredients in place to prove Theorem 2.4. Thus, fix and let . Let be the -algebra generated by the cycle counts . Following the small subgraph conditioning paradigm, we intend to show that for sufficiently large , with probability tending to as , is “close” to . Since Proposition 3.9 shows that is small and that the second moment of is under control, we are going to argue via the truncated random variable.
More specifically, to show that is “close” to with probability for sufficiently large , we are going to prove that is small. Clearly,
[TABLE]
Hence, to prove that is small it suffices to show that
[TABLE]
is nearly as big as . Given what we know at this point this is not particularly difficult. Nonetheless, let us put the details off for just a little while to Section 3.7, where we prove the following.
**Lemma 3.13.**
Suppose that satisfies SYM and BAL and let . For any there exists such that for every there exists such that for all , uniformly for all ,
[TABLE]
Proof of Theorem 2.4.
Because and by Corollary 3.4, we have Therefore, Lemma 3.13 implies that
[TABLE]
Thus, we are left to determine the law of . On this count, Proposition 3.11 shows that for any non-negative integer vector ,
[TABLE]
Hence, letting we conclude that, in distribution,
[TABLE]
Further, by (3.18)
[TABLE]
Thus, combining Propositions 3.11 and 3.12, we conclude that converges to in distribution as for every . Hence, due to (3.24) so does . Consequently, Proposition 3.12 and (3.23) show that for any bounded continuous function ,
[TABLE]
Combining these two statements and observing that the first and the last term are independent of , we obtain
[TABLE]
i.e., converges to in distribution. Plugging in the formula for the first moment from (3.14) yields (2.11). Finally, because Proposition 3.11 shows that
[TABLE]
the formula for the conditional free energy given follows from (2.11) and Lemma 3.13. ∎
Organization
The paper is organized as follows. After proving Lemma 3.13 in Section 3.7, in Section 4 we collect some preliminaries, introduce notation, supply the proofs of Lemmas 3.5 and 3.6 and show how Theorem 2.5, Theorem 2.6 and Corollary 2.7 follow from Theorem 2.4. Because we consider the proof of Proposition 3.7 the main technical achievement of this work, and because the argument is self-contained and rather interesting, that proof follows in Section 5. Further, Section 6 contains the proof of Proposition 3.3, which is by way of a (substantial) generalization of an argument from [23] for the Potts antiferromagnet. Subsequently, Section 7 contains the proofs of Proposition 3.8 and Proposition 3.9 about the moments of the truncated variable . Moreover, Section 8 deals with the proof of Proposition 3.11. The somewhat delicate proof of Proposition 3.12 can be found in Section 9. Section 10 contains the rather technical proofs of Theorem 2.2 and Theorem 3.2. Finally, the proof of Theorem 2.8 about the reconstruction problem can be found in Section 11.
3.7. Proof of Lemma 3.13
The proof is by generalization of the argument from [24, Section 2] for the random regular -SAT model to the current setting of random factor graph models. We begin with the following lower bound on the second moment of the conditional expectation. Let .
Lemma 3.14.
Suppose that satisfies SYM and BAL and let , . Then uniformly for all ,
[TABLE]
Proof.
Fix a number , choose sufficiently large and let be the set of all families of non-negative integers such that . Moreover, let be the event that . Then (3.6) and Proposition 3.11 yield
[TABLE]
Let . Since the matrices are stochastic, (3.18) shows that there is a number such that . Therefore, choosing sufficiently large, we can ensure that . Hence,
[TABLE]
Combining (3.25) and (3.26), we find
[TABLE]
Finally, we need to show that can be replaced by on the l.h.s. of (3.27). Since but , we have
[TABLE]
To bound we observe that for all ,
[TABLE]
Hence, and the assertion follows from (3.27) and (3.28) by taking sufficiently slowly as . ∎
Proof of Lemma 3.13.
We use a similar trick as in the proof of [24, Corollary 2.6]. Recall that we aim to show that
[TABLE]
Given choose small enough. Then by (3.20), (3.21), (3.22) and Lemma 3.14, for sufficiently large we have
[TABLE]
Now define
[TABLE]
Then
[TABLE]
Furthermore, by Chebyshev’s inequality
[TABLE]
Combining (3.30) and (3.32), we obtain
[TABLE]
Finally, (3.29) follows from (3.31), (3.33) and Markov’s inequality. ∎
4. Getting started
4.1. Basics
Throughout the paper we continue to use the notation introduced in Sections 2 and 3. In particular, we write for a set of variable nodes and for a set of constraint nodes. Further, is a random variable with distribution and we just write or if and/or are apparent. Moreover, for an integer we let .
For a finite set we denote the set of probability distributions on by . We identify with the standard simplex in and endow accordingly with the Borel -algebra. By we denote the set of probability measures on and by the set of all whose mean is the uniform distribution on . In addition, for a point in a measurable space we write for the Dirac measure on . The entropy of a probability distribution on a finite set is always denoted by . Thus, recalling that for and setting , we have
Further, if is a probability measure on the discrete cube , then denote mutually independent samples from . If is the Gibbs measure induced by a factor graph , we write etc. instead of . Where or are apparent from the context we omit the index and just write , etc. If is a random variable, then we use the notation
[TABLE]
Thus, is the mean of over independent samples from . If for a factor graph , then we simplify the notation by writing rather than . We use this notation to distinguish averages over from other sources of randomness (e.g., the choice of the random factor graph), for which we reserve the symbols and .
Finally, we need a few facts about probability distributions on sets of the form . For let denote the -wise overlap, defined by
[TABLE]
We use this notation also in the case and observe that is nothing but the empirical distribution of the spins under . Further, we let signify the uniform distribution on ; we usually omit the index to ease the notation. For two spin assignments we let
Lemma 4.1 ([13]).
For any finite set , any and any there exist and such that for all and all the following is true: if , then .
Call nearly balanced if .
Lemma 4.2 ([23, Lemma 4.7]).
For any there is such that for all sufficiently large the following is true. If satisfies , then for all nearly balanced we have .
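The overlap notation used in Section 4.1 and in the two lemmas above can be made concrete with a short sketch. It assumes the usual definition of the overlap of several spin assignments as the empirical joint distribution of their spins over the variable nodes; the function name `overlap` and the toy assignments are hypothetical. With a single assignment the same function returns the empirical distribution of its spins, which is the quantity the "nearly balanced" condition compares with the uniform distribution.

```python
from collections import Counter

def overlap(*assignments):
    """Empirical joint distribution of the spins of one or more assignments.

    Each assignment is a sequence of spins indexed by variable node; the
    result maps a tuple of spins to the fraction of nodes carrying it.
    """
    n = len(assignments[0])
    assert all(len(a) == n for a in assignments), "assignments must have equal length"
    counts = Counter(zip(*assignments))
    return {spins: c / n for spins, c in counts.items()}

# Toy example with spins {0, 1, 2} on six variable nodes.
sigma = [0, 0, 1, 2, 1, 2]
tau = [0, 1, 1, 2, 2, 0]
rho_pair = overlap(sigma, tau)   # pairwise overlap, a distribution on pairs of spins
rho_single = overlap(sigma)      # empirical spin distribution of sigma
```

Here `sigma` is perfectly balanced, since each of the three spins appears on exactly a third of the nodes.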
Finally, we need the following elementary observation.
Fact 4.3.
For any finite set and any there is such that the following holds. If satisfies
[TABLE]
then there exists such that and
4.2. The Nishimori identity
There exists an important distributional relationship between the teacher-student model and the reweighted random graph model from (3.6) (cf. [67] for a discussion from the physics viewpoint). To state this connection, we need to define an appropriately reweighted distribution on the set of spin assignments. Specifically, we let be a random assignment chosen from the distribution
[TABLE]
As before we skip the index where possible. We refer to the following statement as the Nishimori identity.
Lemma 4.4 ([23, Proposition 3.10]).
For every distribution on weight functions , for all integers , for every and for every event we have
[TABLE]
A useful consequence of this result is that for every -function .
4.3. Eigenvalues
The vector or matrix with all entries equal to one (in any dimension) is signified by . The transpose of a matrix we denote by . Additionally, denotes the identity matrix (in any dimension). Further, the standard basis vectors on are denoted by , . For the entries of a matrix we use the notation ; thus, for all . The spectrum of a linear operator is denoted by .
The following simple observation will be used several times. Recall from (2.9).
Lemma 4.5.
Assume that satisfies SYM. Then the function
[TABLE]
satisfies , and is bounded away from [math].
Proof.
Since for every , SYM immediately yields . Proceeding to the second derivatives, we find
[TABLE]
Consequently, SYM yields . Finally, the fact that follows from (2.1). ∎
As an immediate application we prove Lemmas 3.5 and 3.6.
Proof of Lemma 3.5.
Condition SYM readily implies that is stochastic for every . Hence, for all and consequently . To see that is symmetric let be the permutation on such that , and for all . Since SYM implies that and are identically distributed, we obtain
[TABLE]
To verify the last assertion, consider the function from (4.4). Condition BAL ensures that is concave on the set of probability measures on . Since by Lemma 4.5 the Hessian satisfies , we see that induces a negative semidefinite endomorphism of the subspace . Hence, . ∎
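As a sanity check of the three properties established in this proof, the following sketch uses the Potts antiferromagnet with two-variable weight 1 - c * [s == t] as an illustrative assumption. We further assume that, after the normalisation in (2.5), the associated matrix has entries (1 - c * [s == t]) / (q - c); the names `potts_matrix`, `q` and `c` are hypothetical. The sketch verifies that the rows sum to one, that the matrix is symmetric, and that a vector orthogonal to the all-ones vector is an eigenvector with the non-positive eigenvalue -c / (q - c).

```python
def potts_matrix(q, c):
    """q x q matrix with entries (1 - c*[s == t]) / (q - c) (assumed normalisation)."""
    return [[(1.0 - c * (s == t)) / (q - c) for t in range(q)] for s in range(q)]

def mat_vec(m, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

q, c = 3, 0.5
phi = potts_matrix(q, c)

row_sums = [sum(row) for row in phi]                      # stochasticity
is_symmetric = all(phi[s][t] == phi[t][s]
                   for s in range(q) for t in range(q))   # symmetry

v = [1.0, -1.0, 0.0]        # orthogonal to the all-ones vector
image = mat_vec(phi, v)
eigenvalue = -c / (q - c)   # non-positive in the antiferromagnetic case c > 0
```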
Proof of Lemma 3.6.
To see that is self-adjoint let be the canonical basis of and let be the permutation on such that , and for all . Then for all we have
[TABLE]
Since is a basis of , (4.5) shows that is self-adjoint.
Furthermore, since for all by Lemma 3.5, we see that Similarly, and thus (3.12) follows from Lemma 3.5. In particular, since by Lemma 3.5 we obtain . Because is self-adjoint, this implies that . Finally, assume that . Then for all we have and analogously . Hence, . ∎
4.4. Contiguity
Throughout the paper we apply contiguity between several probability spaces. Some of these contiguity results derive from the following first moment calculation, which also delivers the proof of (3.5).
Lemma 4.6.
Suppose that satisfies SYM and BAL. For any there exists such that for all ,
[TABLE]
Moreover, for any we have, uniformly for all ,
[TABLE]
Proof.
By the linearity of expectation and because the constraint nodes of are chosen independently,
[TABLE]
Since SYM and BAL provide that for every , the upper bound is immediate. With respect to the lower bound, recall that the number of such that is of order . Hence, applying Lemma 4.5, we see that for such ,
[TABLE]
Thus, , uniformly for all . Finally, (4.6) follows from because due to (2.1) and the independence of the constraint nodes, and similarly by Lemma 4.5 and (2.1). ∎
Corollary 4.7.
Assume that satisfies SYM and BAL and let . Then uniformly for all ,
[TABLE]
and the distribution of and that of are mutually contiguous. Additionally, for any there exists such that
[TABLE]
Proof.
The bound (4.8) and the mutual contiguity of and the uniformly random follow from [23, Corollary 3.27]. With respect to (4.9), BAL, SYM and Lemma 4.6 ensure that there is such that for every ,
[TABLE]
By Stirling we can choose large enough so that the last expression is smaller than . ∎
Corollary 4.8.
Assume that satisfies SYM and BAL, let and let be a sequence of events. Then the following two statements are true.
[TABLE]
Proof.
Fix . By Lemma 4.4, BAL and Lemma 4.6,
[TABLE]
which implies (4.10). To prove (4.11) pick large enough so that . Then Lemma 4.5 shows that there exists such that for all such that . Hence, by Lemmas 4.4 and 4.6,
[TABLE]
Thus, setting , we obtain (4.11). ∎
Proof of Lemma 3.1.
By construction, the mutual contiguity of and is immediate from the mutual contiguity of and furnished by Corollary 4.7. Moreover, and are identically distributed by the Nishimori identity. ∎
Finally, we derive Theorem 2.6, Corollary 2.7 and Theorem 2.5 from Theorem 2.4.
Proof of Theorem 2.6.
Suppose that and that is a sequence of events. We will prove the following two statements, from which the mutual contiguity of and is immediate.
[TABLE]
Since and are mutually contiguous by Lemma 3.1, mutual contiguity of and follows from (4.13) and (4.14). Moreover, the conditional mutual contiguity given follows by applying the unconditional result to , because Lemma 3.1 and Proposition 3.11 show that the probability of is bounded away from 0 in either model.
We proceed to prove (4.13). Because the random variable from Theorem 2.4 satisfies , there exists such that Hence,
[TABLE]
Thus, setting , we obtain (4.13).
Let us move on to the proof of (4.14). Proposition 3.9 shows that for every there is such that uniformly for all ,
[TABLE]
Hence, by Markov’s inequality for any there is such that Moreover, by Proposition 3.3. As a consequence,
[TABLE]
Thus, choosing , say, we obtain (4.14). ∎
Proof of Corollary 2.7.
The corollary is immediate from Theorem 2.6, Lemma 4.4 and Corollary 4.7. ∎
Proof of Theorem 2.5.
Theorem 2.6 and Proposition 3.3 imply that for all . To prove that this fails to hold for beyond but arbitrarily close to , we calculate the derivative (for the random graph coloring problem a similar argument was used in [21]). It is well known that
[TABLE]
Expanding the logarithm using Fubini and (2.1), we find
[TABLE]
Further with denoting the overlap of independent samples from as in (4.1), we can cast (4.16) as
[TABLE]
Hence, if , then due to (2.1), dominated convergence and Lemma 4.1
[TABLE]
Now, suppose that is such that for all . Then (2.1), dominated convergence and (4.17) yield
[TABLE]
Thus, Theorem 2.2 shows that . Consequently, for any there exists an average degree such that as claimed. The very same argument applies given . ∎
As a preparation for Section 11 we put the following on record.
Corollary 4.9.
Assume that satisfies SYM and BAL and that . Then for any sequence of events the following two statements hold.
[TABLE]
Proof.
To prove (4.18) pick a small enough and a smaller . Then Corollary 4.8 shows that implies . Hence,
[TABLE]
and thus (4.13) implies , which proves (4.18).
Similarly, to obtain (4.19) choose and sufficiently small. If , then (4.14) yields Hence, (4.10) implies . ∎
5. The Kesten-Stigum bound
Throughout this section we assume that satisfies SYM and BAL.
5.1. Outline
In this section we prove Proposition 3.7. The key insight is that the dominant eigenvector of restricted to the space gives rise to a natural family of probability distributions , . Up to an error term that decays as , the Bethe free energy of this distribution is given by a quadratic function of the corresponding eigenvalue. Ultimately, the desired bound on follows because the definition (2.4) of ensures that for all , . To implement this programme we need to show that the dominant eigenvector of has a particular form. More precisely, in Section 5.2 we prove
Lemma 5.1.
Let . Then and there exists an orthonormal basis of the space and such that
[TABLE]
is a unit vector and .
Throughout this section we denote the eigenvector promised by Lemma 5.1 by and the corresponding eigenvalue by . The particular structure of ensures that
[TABLE]
Further, because the coefficients in (5.1) are non-negative and , we obtain
[TABLE]
Recalling that is the canonical basis of , for each we define by letting
[TABLE]
Finally, let (with the Dirac measure on ).
Lemma 5.2.
There exists such that for all we have for all and .
Proof.
Clearly, for all for small enough . Moreover, since by (5.3),
[TABLE]
Hence, and . Similarly, once more because , for each we have
[TABLE]
whence . ∎
Our next goal is to calculate . More precisely, we aim to expand to the fourth order in the limit . The key tool for this expansion is the following elementary lemma, whose proof can be found in Section 5.3.
Lemma 5.3.
Suppose and that , has four continuous derivatives. Moreover, setting , assume that satisfies the following conditions.
T1: for all , all and all we have
[TABLE]
T2: there is such that the gradient of at satisfies .
Further, suppose that , let be mutually independent samples from and define
[TABLE]
Then
[TABLE]
Equipped with Lemma 5.3 we will derive the following asymptotic formula in Section 5.4.
Lemma 5.4.
We have as .
Finally, Proposition 3.7 is immediate from Lemma 5.4.
Proof of Proposition 3.7.
Due to SYM it is straightforward to verify that Hence, if , then for all small enough because by Lemma 5.2. Therefore, Lemma 5.4 implies that . As this bound holds for all , we conclude that , and thus the assertion follows from Lemma 5.1. ∎
Remark 5.5.
A local expansion of the Bethe functional around the atom on the uniform distribution was performed independently by Guilhem Semerjian (manuscript in preparation), albeit with a different objective and without the realization that the eigenvectors of can be used to construct an explicit family of perturbations, cf. (5.4).
5.2. Proof of Lemma 5.1
The canonical basis gives rise to the basis of the -dimensional space . Hence, we can identify with the space of -matrices via the linear map
[TABLE]
Since , is an isomorphism. Moreover, if we equip the space with the Frobenius inner product , then for all .
By Lemma 3.6 the linear operator is self-adjoint and . Therefore, admits an orthogonal decomposition into eigenspaces of . Suppose that and let be the corresponding eigenspace. Moreover, consider the linear map defined by , for . Due to the particular form (2.6) of we have for all . Consequently, . Therefore, for any we have . Because , this means that there exists a unit vector such that . Further, is a symmetric matrix as and satisfies and for all because . Thus, there exists an orthonormal basis of the space and such that
[TABLE]
Since is an isomorphism, (5.6) yields the representation
[TABLE]
Further, if we define , then because for all . Moreover, because is a unit vector and are orthonormal,
[TABLE]
Finally, once more due to the particular form (2.6) of , (5.6) yields
[TABLE]
Combining (5.8) and (5.9), we thus see that is a unit vector with , as desired.
5.3. Proof of Lemma 5.3
We recall the following well-known generalization of the chain rule.
Fact 5.6 (Faà di Bruno’s formula).
Suppose that has continuous derivatives. Let be the set of all partitions of , denote by the cardinality of a partition and similarly let denote the cardinality of a set in the partition . Then
[TABLE]
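For orientation, the set-partition form of Faà di Bruno’s formula can be checked numerically. The sketch below uses the standard univariate version, in which the n-th derivative of f(g(x)) is a sum over all partitions of {1, ..., n} of the derivative of f of order equal to the number of blocks, times the product over blocks of the derivative of g of order equal to the block size. The notation in Fact 5.6 may differ, and the helper names `set_partitions` and `faa_di_bruno` are hypothetical. As a test case we take f = exp and g(x) = x^2 and compare with the fourth derivative of exp(x^2) computed by hand.

```python
import math

def set_partitions(items):
    """Yield all partitions of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def faa_di_bruno(f_derivs, g_derivs, n):
    """n-th derivative of f(g(x)) at a point.

    f_derivs[j] is the j-th derivative of f at g(x),
    g_derivs[j] is the j-th derivative of g at x.
    """
    total = 0.0
    for partition in set_partitions(list(range(n))):
        term = f_derivs[len(partition)]
        for block in partition:
            term *= g_derivs[len(block)]
        total += term
    return total

x = 1.0
g_derivs = [x ** 2, 2 * x, 2.0, 0.0, 0.0]   # g(x) = x^2 and its derivatives
f_derivs = [math.exp(x ** 2)] * 5           # every derivative of exp is exp
fourth = faa_di_bruno(f_derivs, g_derivs, 4)
closed_form = (12 + 48 * x ** 2 + 16 * x ** 4) * math.exp(x ** 2)
```

The generator enumerates the 15 partitions of a four-element set, so the fourth-order expansion used in the proof involves at most 15 terms per mixed derivative.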
For and let
[TABLE]
Because are mutually independent with mean , we have unless for each there is such that . Hence, setting , we see that
[TABLE]
In particular, (5.11) implies
[TABLE]
Proceeding to , we apply Fact 5.6 to obtain
[TABLE]
Since , T1 and (5.13) entail that
[TABLE]
Moving on to , we observe that . Moreover, Fact 5.6 yields
[TABLE]
Hence, T2 yields
[TABLE]
Finally, we come to . Fact 5.6 yields
[TABLE]
Since , T1 implies that
[TABLE]
Moreover, similarly as before T2 implies
[TABLE]
Analogously, once more by T2
[TABLE]
Thus, combining (5.16)–(5.19), we obtain
[TABLE]
Since and , the assertion follows from (5.12), (5.14), (5.15) and (5.20).
5.4. Proof of Lemma 5.4
Recall that , that is an eigenvector of with eigenvalue , and that is the vector defined by (5.3). We tacitly assume that is small enough so that (cf. Lemma 5.2) and we denote by independent samples from . Hence, for any function the expectation can be viewed as a function of . Further, since is the uniform distribution on the distributions from (5.4), which are atoms, the function has the same continuity as .
Ultimately we are going to expand the function to the fourth order. But first we need a few preparations. First we observe that encodes the covariance matrix of the random vector .
Claim 5.7.
We have and for all .
Proof.
The first assertion follows from Lemma 5.2, which shows that . Moreover, because the vectors from (5.3) are orthonormal, (5.1) and (5.4) yield
[TABLE]
as claimed. ∎
Additionally, we need the following algebraic relation.
Claim 5.8.
For any we have .
Proof.
Since we have
[TABLE]
as claimed. ∎
We proceed to expand . For let
[TABLE]
Then with chosen independently from ,
[TABLE]
and we shall derive the approximations to both summands separately, using Lemma 5.3 in either case.
Claim 5.9.
We have
[TABLE]
Proof.
Fixing and for the moment, we consider the function
[TABLE]
Then with denoting the fourth Taylor polynomial of as in equation (5.5), we can write . We are going to show that, with chosen from and chosen from , all mutually independent,
[TABLE]
whence (5.22) is immediate because the Poisson distribution has sub-exponential tails.
To prove (5.23) we apply Lemma 5.3. Thus, we need the first and second partial derivatives of . To work out the first partial derivatives, let , and . Then
[TABLE]
In particular, SYM yields , and thus the assumptions T1–T2 of Lemma 5.3 are satisfied. With respect to the second derivatives, there are two cases. First, fix , distinct and . Let be the permutation such that and for all . Using SYM, we obtain
[TABLE]
Second, fix distinct and any , . Let be the permutations such that and for all and and for all . Then SYM yields
[TABLE]
Hence, Lemma 5.3 gives
[TABLE]
Further, Claim 5.8 yields
[TABLE]
Therefore, since SYM provides that the distribution is invariant under permutations,
[TABLE]
which completes the proof of (5.23).
Moving on to (5.24), we write the remainder for in the support of as
[TABLE]
where is a point on the line segment between the points and . In particular, Hence, Fact 5.6 shows that
[TABLE]
where ranges over the convex hull of the support of . Because all weight functions take values in the interval , we find . In addition,
[TABLE]
Thus, (2.1) shows that , which is (5.24). ∎
Claim 5.10.
We have
Proof.
To investigate we apply Lemma 5.3 to . The derivatives are
[TABLE]
where is such that and for all . Thus, SYM yields
[TABLE]
Once more we write , where is the fourth Taylor polynomial as in (5.5). Applying Lemma 5.3, we obtain
[TABLE]
Further, Claim 5.7 yields , whence by Claim 5.8,
[TABLE]
Furthermore, by Fact 5.6, for any in the support of there exist on the line segment between and such that
[TABLE]
Hence, (2.1) guarantees that and thus the assertion follows from (5.26). ∎
Proof of Lemma 5.4.
Combining (5.21) with Claims 5.9 and 5.10, we obtain
[TABLE]
Since , and , the assertion follows from (5.27). ∎
6. Overlap concentration in the teacher-student model
Throughout this section we assume that satisfies conditions BAL, SYM, MIN and POS.
6.1. Outline
In this section we prove Proposition 3.3. We will exhibit a connection between the overlap and the derivative of the free energy: if is bounded away from 0 for some , then the derivative of the free energy is so large that the formula cannot possibly hold, in contradiction to Theorem 3.2. We begin with the following “continuity statement”, which is a generalization of [23, Lemma 4.6] for the Potts model: if the overlap deviates from for some average degree , then the same holds for at least a small interval of average degrees.
Lemma 6.1.
For any , there is such that the following holds. Assume that is a sequence such that
[TABLE]
Then
[TABLE]
The proof of Lemma 6.1 can be found in Section 6.2. Further, in Section 6.3 we derive the following asymptotic formula for the derivative of the free energy.
Lemma 6.2.
Uniformly for all we have
[TABLE]
with chosen from independently of and chosen uniformly and independently.
Corollary 6.3.
Uniformly for all we have
[TABLE]
Moreover, for any there is , independent of or , such that uniformly for all ,
[TABLE]
For the special case of the Potts model a result like Corollary 6.3 was known [23, Lemma 4.10]. The proof was relatively straightforward because in the special case it is possible to write a fairly explicit formula for the expression . Remarkably, the following proof shows that we can do without an explicit formula thanks to a mildly tricky application of Jensen’s inequality in combination with condition MIN.
Proof of Corollary 6.3.
Since is convex, Jensen’s inequality gives
[TABLE]
Hence, using the Nishimori identity (4.3) and Corollary 4.7, we obtain
[TABLE]
Combining (6.2), (6.5) and (6.6) with Lemma 6.2 gives (6.3).
To prove the second assertion we expand to the second order around to obtain
[TABLE]
Since for all , (6.7) and (6.6) yield
[TABLE]
Further, with denoting two independent samples from the Gibbs measure of we obtain
[TABLE]
Since are chosen uniformly and independently of each other and of and , we can cast (6.9) in terms of the overlap as
[TABLE]
Further, Corollary 4.7 and the Nishimori identity (4.3) yield , whence
[TABLE]
Moreover, the function is uniformly continuous. Therefore, if , then Fact 4.3, (6.11) and conditions MIN and SYM yield such that
[TABLE]
Finally, (6.2), (6.8), (6.9), (6.10) and (6.12) yield (6.4). ∎
Corollary 6.4.
For all we have .
Proof.
This follows from (6.3) by integrating. ∎
Finally, to prove Proposition 3.3 we combine Lemma 6.1 and Corollary 6.3 to argue that if is bounded away from 0 for some , then in fact for all in a small interval the derivative strictly exceeds . Consequently, is strictly greater than for some , in contradiction to Theorem 3.2.
Proof of Proposition 3.3.
Assume that there exist and such that
[TABLE]
Then Lemma 6.1 shows that there is such that with for infinitely many we have
[TABLE]
But then Corollaries 6.3 and 6.4 imply that for infinitely many ,
[TABLE]
Consequently,
[TABLE]
Therefore, Theorem 3.2 yields in contradiction to . ∎
6.2. Proof of Lemma 6.1
The proof, which is a non-trivial generalization of the argument for [23, Lemma 4.6] for the Potts model, is based on a coupling of the random factor graphs and with different numbers of constraint nodes; to set up the coupling we use the Nishimori identity (4.3). Thus, as a first step we need a coupling of and .
Lemma 6.5.
For any , there is such that
[TABLE]
Proof.
Given pick a sufficiently small . Let be the function from (4.4). Because the constraint nodes of are chosen independently, for all , we have
[TABLE]
Furthermore, by Corollary 4.7 there exists such that
[TABLE]
which implies that
[TABLE]
Applying Lemma 4.5 to expand (6.14) to the second order, we obtain such that for all and all satisfying ,
[TABLE]
Hence, choosing small enough, we can ensure that for all such that and all satisfying the estimate
[TABLE]
holds. Further, combining (6.16) and (6.17), we obtain that
[TABLE]
provided that and was chosen small enough. Moreover, combining (6.17) and (6.18), we conclude that if and , then
[TABLE]
Finally, the assertion follows from (6.15) and (6.19). ∎
Proof of Lemma 6.1.
Assume that satisfies (6.1). Pick small enough, let be the number promised by Lemma 6.5 and assume that is a large enough number such that and
[TABLE]
Further, suppose that is such that . Then by Lemma 6.5 we can couple and such that the event satisfies
[TABLE]
We extend this to a coupling of a pair of factor graphs such that is distributed as and is distributed as as follows. First choose from the distribution . Then obtain from by deleting a uniformly chosen set of constraint nodes. On the event set . If does not occur, then choose the constraint nodes of independently of those of in such a way that is distributed as .
Now, (6.20) implies that with probability at least the random graph is such that a random sample from satisfies . By Corollary 4.7 and the Nishimori identity (4.3), with probability this random sample is nearly balanced. Consequently, there exists a map that provides a nearly balanced for every factor graph such that . Thus, Hence, assuming that was chosen small enough, we obtain from (6.13) and the Nishimori identity (4.3) that
[TABLE]
Finally, on the event the factor graph is obtained from by deleting a few random constraint nodes. Thus, for a graph let be a random assignment with distribution . Then (6.22) implies
[TABLE]
Hence, by the Nishimori identity (4.3) and (6.21),
[TABLE]
Since by construction is nearly balanced, the assertion follows from (6.23) and Lemma 4.2. ∎
6.3. Proof of Lemma 6.2
We shall see shortly that calculating the derivative basically comes down to calculating the difference . We are going to perform this calculation by way of a very accurate coupling of and . A similar argument was used in [23] for the case that the set of weight functions is finite. Once more the coupling is based on the Nishimori identity (4.3). Thus, we begin with a coupling of the random assignments and . The following is a generalization of [23, Corollary 3.29].
Lemma 6.6.
There exists a coupling of and such that the following holds uniformly for all .
(i) With probability we have .
(ii) With probability the set has size at most .
Proof.
By definition, for any
[TABLE]
Further, due to the independence of the constraint nodes, we obtain
[TABLE]
Let be the function from (4.4). Then Lemma 3.5 and Lemma 4.5 show that for ,
[TABLE]
Hence, expanding the r.h.s. of (6.25) to the second order, we obtain
[TABLE]
Moreover, let be the set of all such that is an integer for every . Then
[TABLE]
Further, let . Then (6.28), Stirling’s formula and Lemmas 3.5 and 4.5 yield
[TABLE]
Of course, the corresponding formula holds for . Hence, (6.25) and (6.26) yield
[TABLE]
Combining (6.24), (6.25), (6.27) and (6.29), we conclude that
[TABLE]
By Corollary 4.7 is bounded by with probability at least . Hence, (6.30) shows that have total variation distance , which yields the first assertion.
With respect to the second, we obtain from Corollary 4.7 that
[TABLE]
Hence, if we choose the empirical distributions independently, then with probability . Finally, we obtain the desired coupling of , for (ii): given , choose a collection of pairwise disjoint sets with randomly, set for all and let assign different spins to the nodes in so as to ensure that and . ∎
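The last step of this construction, turning two nearby empirical distributions into two assignments that differ on few nodes, can be sketched as follows. This is a deterministic variant written for illustration only: the proof chooses the sets of differing nodes at random, whereas the hypothetical helper `coupled_assignments` below simply fixes them by position. Given two vectors of spin counts with the same total, it returns assignments with exactly these counts that disagree on as many nodes as the surplus of the first over the second, which is half the L1 distance between the count vectors.

```python
from collections import Counter

def coupled_assignments(counts_a, counts_b):
    """Return (sigma, tau) with the given spin counts and few disagreements."""
    n = sum(counts_a.values())
    assert n == sum(counts_b.values()), "both assignments need the same length"
    sigma, tau = [], []
    surplus_positions = []   # nodes where sigma carries a spin tau has too many of
    deficit_spins = []       # spins tau still needs to place
    for s in sorted(set(counts_a) | set(counts_b)):
        a, b = counts_a.get(s, 0), counts_b.get(s, 0)
        common = min(a, b)
        start = len(sigma)
        sigma.extend([s] * a)
        # tau agrees with sigma on `common` nodes; the rest are filled in later
        tau.extend([s] * common + [None] * (a - common))
        surplus_positions.extend(range(start + common, start + a))
        deficit_spins.extend([s] * (b - common))
    for pos, s in zip(surplus_positions, deficit_spins):
        tau[pos] = s
    return sigma, tau

counts_sigma = {0: 4, 1: 3, 2: 3}
counts_tau = {0: 3, 1: 3, 2: 4}
sigma, tau = coupled_assignments(counts_sigma, counts_tau)
disagreements = sum(s != t for s, t in zip(sigma, tau))
```

In this toy instance the two count vectors differ by one unit in two coordinates, so the assignments disagree on a single node.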
Corollary 6.7.
Uniformly for all the following is true. Given the random assignment choose a constraint node from the distribution
[TABLE]
and choose independently. Then
[TABLE]
Proof.
By the Nishimori identity (4.3) we have
[TABLE]
To calculate the difference of the two terms on the r.h.s. we couple and via Lemma 6.6. Clearly, if , then we can couple and such that is obtained from by adding one additional independent constraint node and thus
[TABLE]
Hence, by (2.1) and the first part of Lemma 6.6,
[TABLE]
If and , then by (2.1) we have
[TABLE]
Further, let us write for a factor node chosen from (2.14) with respect to and for one chosen with respect to . Let be the event that a random factor node does not have a neighbor in . Since , (2.1) and (6.36) imply that
[TABLE]
and similarly . Moreover, given that , both factor nodes are identically distributed. Therefore, there is a coupling of such that with probability . Hence, can be coupled such that the set of constraint nodes in which both factor graphs differ has expected size . Indeed, is a binomial random variable because the constraint nodes are chosen independently. Thus, (2.1) implies
[TABLE]
and therefore
[TABLE]
Finally, if either or , then we couple by just choosing their constraint nodes independently. Then (2.1) implies
[TABLE]
Combining (6.33)–(6.38) and applying Corollary 4.7 and Lemma 6.6, we obtain
[TABLE]
as claimed. ∎
Proof of Lemma 6.2.
The proof is a generalization of the proof of [23, Lemma 3.32], which dealt with the Potts model. We begin with the well-known observation that
[TABLE]
To calculate the last term we apply Corollary 6.7. Let us write for brevity. Expanding the logarithm on the r.h.s. of (6.32), we obtain
[TABLE]
(where the expectation is over the choice of , and ). Due to (2.1) and Fubini’s theorem we can interchange the sum and the expectation. Hence, writing the expectation on chosen from (6.31) out, with chosen from independently of everything else, we obtain
[TABLE]
Further, because for all with probability at least by Corollary 4.7, we obtain from (2.1) and SYM that
[TABLE]
To evaluate the expectation on the r.h.s. of (6.40) we harness the Nishimori identity (4.3), which implies the following: if is an -function, then . Applying this fact to the function , we obtain
[TABLE]
Plugging (6.41) into (6.40) and writing for uniformly random indices chosen from we obtain
[TABLE]
Finally, since , (6.42) yields (6.2). ∎
7. Moment calculations
In this section we prove Propositions 3.8 and 3.9. We begin with a very general calculation in Section 7.1, from which we subsequently deduce Propositions 3.8 and 3.9.
7.1. An asymptotic formula
The following result paves the way for the proofs of Propositions 3.8 and 3.9.
Proposition 7.1.
Assume that satisfies SYM and that is such that the eigenvalues of satisfy
[TABLE]
Furthermore, assume that but as and let
[TABLE]
Then uniformly for all ,
[TABLE]
Proof.
Let be the set of all distributions such that is an integer vector and such that for all . Additionally, for each let Then
[TABLE]
Remembering from (4.4), we claim that uniformly for all and ,
[TABLE]
Indeed, because there are precisely assignments such that and since the constraint nodes of are chosen independently, we have the exact expression and thus (7.3) follows from Stirling’s formula. Combining (7.2) and (7.3), we obtain
[TABLE]
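The use of Stirling’s formula in this step rests on the fact that the number of assignments with a given empirical distribution is a multinomial coefficient whose exponential growth rate is the entropy of that distribution. A minimal numerical sketch, with hypothetical names `entropy` and `log_multinomial` and an arbitrarily chosen balanced example on three spins:

```python
import math

def entropy(rho):
    """Entropy (natural logarithm) of a probability vector."""
    return -sum(p * math.log(p) for p in rho if p > 0)

def log_multinomial(counts):
    """log of n! / prod(counts[i]!), computed via the log-gamma function."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

counts = [3000, 3000, 3000]           # spin counts of a balanced assignment
n = sum(counts)
rho = [c / n for c in counts]         # its empirical distribution
rate = log_multinomial(counts) / n    # (1/n) * log(number of such assignments)
gap = entropy(rho) - rate             # lower-order correction from Stirling
```

The gap is positive and of order log(n)/n; it reflects the polynomial prefactor that Stirling’s formula contributes in addition to the exponential term.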
In order to calculate the sum via the Laplace method, we compute the first two derivatives of . The first derivative works out to be
[TABLE]
Hence, using SYM we see that the gradient at the point equals
[TABLE]
Proceeding to the second derivatives, we find
[TABLE]
Consequently, using SYM we find that the Hessian at comes out as
[TABLE]
Additionally, the third derivatives of are uniformly bounded. Thus, combining (7.5) and (7.6) and observing that for all , we see that uniformly for all ,
[TABLE]
Since , plugging (7.7) into (7.2) we obtain uniformly for all ,
[TABLE]
Further, since Lemma 3.5 shows that is symmetric, there exists an orthogonal matrix such that , where is the diagonal matrix whose entries are the eigenvalues of . Since is stochastic (once more by Lemma 3.5), the top eigenvalue is and the corresponding eigenvector is . Moreover, because all are probability distributions on , we have for all . Therefore, the set is contained in the -dimensional subspace spanned by the eigenvectors of corresponding to . Hence, because the sum from (7.8) can be approximated by a -dimensional Gaussian integral and thus uniformly for all ,
[TABLE]
as claimed. ∎
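The two analytic tools used in the proof above, Stirling's formula for the number of assignments with a given empirical distribution and the Laplace method for the resulting lattice sum, can be checked numerically. The following is a minimal sketch under stated assumptions: we read the count of assignments as a multinomial coefficient, and the quadratic exponent in `lattice_sum` is a hypothetical one-dimensional stand-in for the function expanded in the proof, not the paper's actual exponent. All function names are illustrative.

```python
import math


def log_multinomial(counts):
    """Exact log of n! / prod(c_i!), computed via lgamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def stirling_log_multinomial(counts):
    """Stirling approximation of the same quantity:
    n*H(rho) - (q-1)/2 * ln(2*pi*n) - 1/2 * sum_i ln(rho_i),
    where rho_i = c_i / n and H is the entropy in nats."""
    n = sum(counts)
    q = len(counts)
    rho = [c / n for c in counts]
    entropy = -sum(r * math.log(r) for r in rho)
    return (n * entropy
            - 0.5 * (q - 1) * math.log(2 * math.pi * n)
            - 0.5 * sum(math.log(r) for r in rho))


def lattice_sum(n, c):
    """Sum of exp(-c*n*(k/n - 1/2)^2) over the lattice points k = 0..n.
    The quadratic exponent is a toy example (an assumption)."""
    return sum(math.exp(-c * n * (k / n - 0.5) ** 2) for k in range(n + 1))


def gaussian_integral(n, c):
    """n times the integral over R of exp(-c*n*(x - 1/2)^2) dx,
    which equals sqrt(pi*n/c)."""
    return math.sqrt(math.pi * n / c)


if __name__ == "__main__":
    counts = [1000, 1000, 1000]
    print(log_multinomial(counts), stirling_log_multinomial(counts))
    print(lattice_sum(10000, 2.0) / gaussian_integral(10000, 2.0))
```

The Stirling error is of order 1/n, and the lattice sum agrees with the Gaussian integral up to a negligible correction, which is the mechanism behind replacing the sum in (7.8) by an integral.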
Remark 7.2.
We observe that the proof of Proposition 7.1 did not use (2.1).
7.2. Proof of Proposition 3.8
In this section we assume that satisfies SYM and BAL. Then Lemma 3.5 readily shows that (7.1) holds for all and thus Proposition 7.1 applies. Hence, to prove Proposition 3.8 we merely need to show that for a suitable .
Lemma 7.3.
Assume that satisfies SYM and BAL, let and set . Then uniformly for all we have
Proof.
Let be the set of all distributions such that is an integer vector and let be the set of all such that for all . Let (cf. (4.4)). Then by the linearity of expectation and the independence of the constraint nodes of ,
[TABLE]
Hence, with denoting the uniform distribution, uniformly for all ,
[TABLE]
Finally, Proposition 7.1 implies that . ∎
Proposition 3.8 is immediate from Proposition 7.1 and Lemma 7.3.
7.3. Proof of Proposition 3.9
Assume that satisfies SYM and BAL and that . In order to calculate the second moment, we employ a known construction (e.g., [13]) of an auxiliary random factor graph model whose first moment equals the second moment of the original model. The spin set of this auxiliary model is the set and we denote the pairs by . Further, for functions we define
[TABLE]
Then the set of weight functions of the auxiliary model is . Moreover, the probability distribution on is simply the image of under the measurable map . Clearly, the fact that satisfies SYM implies that so does . (However, does not necessarily satisfy BAL, and need not satisfy the last two bounds in (2.1), but these are not needed to apply Proposition 7.1 due to Remark 7.2.)
For any the matrix as defined in (2.5) can be expressed in terms of the matrix induced by the original weight function as . Hence, recalling the definitions (2.6) and (2.9),
[TABLE]
Proof of Proposition 3.9.
For a factor graph let be the factor graph obtained by replacing the weight function by for every factor node of . Then
[TABLE]
Hence, if satisfies , then (7.9), Lemma 3.6, Proposition 3.7 and Proposition 7.1 yield
[TABLE]
as desired. ∎
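The second moment argument rests on expressing the matrix of the auxiliary pair model through the matrix of the original model. As a hedged illustration, assume that this relation is the Kronecker (tensor) product of the original matrix with itself; the displayed identity is not legible in this copy, so this reading is an assumption. Under that assumption the eigenvalues of the pair matrix are the pairwise products of the original eigenvalues, which the sketch below verifies for a hypothetical symmetric stochastic 2x2 matrix.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def kron_vec(v, w):
    """Tensor product of two vectors, ordered consistently with kron()."""
    return [x * y for x in v for y in w]


if __name__ == "__main__":
    # Hypothetical symmetric stochastic matrix with eigenvalues 1 and -0.6,
    # and eigenvectors (1, 1) and (1, -1).
    phi = [[0.2, 0.8], [0.8, 0.2]]
    pair = kron(phi, phi)
    v = kron_vec([1, -1], [1, -1])
    # v is an eigenvector of the pair matrix with eigenvalue (-0.6)^2.
    print(matvec(pair, v), [0.36 * x for x in v])
```

In this toy case the spectrum of the pair matrix is 1, -0.6, -0.6 and 0.36, so any eigenvalue condition of the type required in Proposition 7.1 for the pair model reduces to a condition on the original spectrum.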
8. Cycle census
Throughout this section we assume that satisfies SYM and BAL.
The aim is to prove Proposition 3.11. The proof of the first assertion is rather straightforward.
Lemma 8.1.
Let . For any we have , uniformly for all . Moreover, if are pairwise disjoint and , then uniformly for all ,
[TABLE]
Proof.
Let be such that takes the least possible value for every . Then (8.1) is immediate from Fact 3.10 and the fact that in the weight functions of the constraint nodes are chosen independently from . Furthermore, if is another sequence, then the random graph is obtained from by adding at most random edges and with probability none of these edges closes a cycle of bounded length. Hence, we obtain the desired uniform rate of convergence for all sequences in . ∎
Lemma 8.2.
Let . For any with we have , uniformly for all . Moreover, if are pairwise disjoint, and , then uniformly for all ,
[TABLE]
The proof is based on known arguments. We begin by calculating the expected number of dense small subgraphs of .
Claim 8.3.
Let be an integer and let be the number of subsets of size that span more than edges. Then uniformly for all and all .
Proof.
Fix numbers such that and let and be sets of size , . Moreover, let be a set of size and let be the event that for all pairs we have . Then
[TABLE]
Furthermore, (2.1) ensures that there is a number that does not depend on such that the lower bound holds. Therefore, (2.14) implies that for variable nodes , any constraint node and for any subset we have
[TABLE]
Since the constraint nodes are chosen independently, (8.3) implies that, uniformly for all and all ,
[TABLE]
Finally, given the number of possible sets is bounded by , the number of possible does not exceed and given and the number of possible sets is bounded. Thus, since the assertion follows from (8.2) and (8.4). ∎
Proof of Lemma 8.2.
Due to the Nishimori identity (4.3) we may prove the claim for the random factor graph model . Moreover, by Corollary 4.7 we may condition on the event that for all , in which case SYM yields
[TABLE]
We begin by showing that for any uniformly for all ,
[TABLE]
Indeed, let be a family of pairwise distinct indices such that (cf. CYC1) and let be pairwise distinct indices such that if (cf. CYC2). Let be the event that form a cycle with signature . Set . Then by (2.14), (3.17) and (8.5) we have
[TABLE]
Summing on , we get
[TABLE]
as claimed.
For integers let Then due to the inclusion/exclusion argument for the joint convergence to independent Poisson variables [19, Theorem 1.23], in order to complete the proof it suffices to show that for any , uniformly for all ,
[TABLE]
Combinatorially, is nothing but the total number of -tuples of cycles in such that the first cycles have signature , the next cycles have signature , etc. Hence, if we define as the number of such families of pairwise vertex disjoint cycles, then Claim 8.3 yields
[TABLE]
Furthermore, we claim that uniformly for all ,
[TABLE]
Indeed, the argument that we used to prove (8.6) easily extends to a proof of (8.10); for if we fix index families that suit the signatures such that no index from resp. occurs more than once, then similar steps as above reveal that
[TABLE]
Hence, (8.10) follows by summing on all . Finally, (8.8) and (8.10) show the desired convergence for a single sequence and the uniformity of the rate of convergence follows from a similar argument as in the proof of Lemma 8.1. ∎
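The proof above establishes Poisson convergence through factorial moments. The criterion relies on the fact that the k-th factorial moment of a Poisson variable with mean λ equals λ^k. The following sketch checks this identity numerically for a single variable; the function names and the truncation point `cutoff` are illustrative assumptions, and the joint statement for several independent Poisson variables is the product of such terms.

```python
import math


def poisson_pmf(lam, j):
    """P(Po(lam) = j), computed in log space for numerical stability."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))


def falling_factorial(j, k):
    """j * (j - 1) * ... * (j - k + 1)."""
    out = 1
    for i in range(k):
        out *= (j - i)
    return out


def poisson_factorial_moment(lam, k, cutoff=200):
    """k-th factorial moment of Po(lam), with the series truncated at `cutoff`."""
    return sum(falling_factorial(j, k) * poisson_pmf(lam, j)
               for j in range(cutoff))


if __name__ == "__main__":
    lam = 1.5
    for k in range(1, 5):
        print(k, poisson_factorial_moment(lam, k), lam ** k)
```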
Proof of Proposition 3.11.
The claim (3.19) about the cycle counts is immediate from Lemmas 8.1 and 8.2. To prove the assertion about the probability of , let us first assume that . Then the event occurs iff and thus the assertion about is immediate from Fact 3.10. Moreover, the assertion about follows from Lemma 8.2 applied to all signatures of the form and . For we express the event as In particular, occurs only if and therefore, by the same token as in the case , the expressions stated in Proposition 3.11 are asymptotic upper bounds on . Finally, we notice that for the expected number of pairs such that is . ∎
9. The limiting distribution
Throughout this section we assume that satisfies SYM and BAL.
In this section we prove Proposition 3.12. Let be chosen independently from and for set . The following lemma is the main step toward the proof of (3.20).
Lemma 9.1.
If , then
Proof.
Let . Then
[TABLE]
Hence, remembering (2.6) and (2.9), we find Furthermore, Lemmas 3.5 and 3.6 yield
[TABLE]
and thus
[TABLE]
As Proposition 3.7 yields , whence summing (9.1) on completes the proof. ∎
To prove (3.20) we need to get a handle on the discretization of the set induced by the partition for . Hence, we introduce
Corollary 9.2.
If , then .
Proof.
By Jensen’s inequality and thus the assertion follows from Lemma 9.1. ∎
We are ready to prove (3.20).
Proof of Proposition 3.12, part 1.
Given let
[TABLE]
The construction of ensures that for every fixed , converges to almost surely as . Hence, by Lemma 9.1, Corollary 9.2 and dominated convergence,
[TABLE]
which proves (3.20). ∎
In order to establish the convergence of to we use similar arguments. We begin with the following bound.
Lemma 9.3.
For every there exists such that
Proof.
Pick sufficiently small and let Because by Lemma 3.5 the matrices are stochastic, we have
[TABLE]
In fact, since the trace is invariant under cyclic permutations, we obtain
[TABLE]
Since are chosen independently, (2.1) and (9.2) imply that we can choose small enough so that for all , in which case the sum converges. ∎
Corollary 9.4.
For every and every we have .
Proof.
Because all weight functions take values in , it is obvious that for every . Moreover, similar steps as in the previous proof show for some small . Finally, since is convex, the assertion about follows from Jensen’s inequality. ∎
We are going to prove that are well-defined by showing that they come out as the limit of the as . However, a priori it may not be entirely clear that the are well-defined because they involve sums on random numbers of terms. This is not actually a problem, because Corollary 9.4 implies the following. We continue to let signify a family of independent samples from .
Corollary 9.5.
For every the following -limits exist:
[TABLE]
Lemma 9.6.
For every there exists such that for all , ,
[TABLE]
Proof.
Let , . Then and for every ,
[TABLE]
because and due to Cauchy-Schwarz. Further, because the are i.i.d., for any given integer we find
[TABLE]
As , (9.4) implies
[TABLE]
Moving on to the second summand in (9.3), we recall that the function is convex and that for any (small) there exists such that for all . Hence, introducing the convex function , we have
[TABLE]
Lemmas 9.1 and 9.6 show that summing the right hand sides of (9.5) and (9.6) on gives a finite number. Thus, the first assertion follows from (9.3). With respect to the second bound, analogous steps yield
[TABLE]
and thus the desired bound follows from Jensen’s inequality. ∎
Proof of Proposition 3.12, part 2.
Lemma 9.6 shows that the random variables are uniformly -bounded. Furthermore, the construction of guarantees that almost surely for every fixed . Hence, converges to in the -norm and a second application of Lemma 9.6 shows that tends to in the -norm. ∎
10. The condensation threshold
Throughout this section we assume that satisfies SYM, BAL and POS.
In this section we prove Theorems 2.2 and 3.2. As a technical preparation we need a concentration inequality for the free energy of our random factor graph models.
10.1. Concentration
We begin with the following elementary observation.
Lemma 10.1.
Suppose that satisfies SYM and BAL. For a factor graph define
[TABLE]
Then for every there exists such that uniformly for all , and we have
[TABLE]
Proof.
The bound (2.1) guarantees that As a consequence, the probability that either or contains a constraint node such that is bounded by . Therefore, it suffices to prove (10.1) given . Due to (2.1) the conditional expectation \mathbb{E}[\max_{\tau}|\ln\boldsymbol{\psi}(\tau)|\,\big{|}\,\max_{\tau}|\ln\boldsymbol{\psi}(\tau)|<(tn)^{3/8}] is bounded. Thus, the definition of the random factor graph models guarantees that uniformly for all ,
[TABLE]
Further, because the constraint nodes are chosen independently, Azuma’s inequality implies that for any ,
[TABLE]
Thus, (10.1) follows from (10.4) and (10.5) applied to with chosen large enough. Finally, let either or . Since by Cauchy-Schwarz, (10.1) yields
[TABLE]
whence (10.2) and (10.3) are immediate. ∎
Lemma 10.2.
Suppose that satisfies SYM and BAL and let . There exists such that for any and there exists such that for all , we have
[TABLE]
Proof.
Let either or and choose big enough so that the following is true: if , then
[TABLE]
Let be the factor graph obtained from by deleting all constraint nodes such that . Then (10.6) ensures that . Furthermore, if is obtained from by changing the neighborhood of some constraint node and/or its weight function, subject merely to the condition that the new weight function satisfies , then . Therefore, Azuma’s inequality implies that for any ,
[TABLE]
Combining (10.6) and (10.7) with (10.2) and (10.3) completes the proof. ∎
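Both concentration lemmas above apply Azuma's inequality after truncating the weight functions so that changing a single constraint node moves the free energy by a bounded amount. As a minimal sketch of the inequality itself, the code below compares the two-sided Azuma-Hoeffding bound with the exact tail of a simple ±1 random walk. The walk is a toy martingale chosen for illustration (an assumption), not the free energy of the model.

```python
import math


def azuma_bound(t, increments):
    """Two-sided Azuma-Hoeffding bound 2*exp(-t^2 / (2 * sum c_i^2)) for a
    martingale whose i-th increment is bounded by c_i in absolute value."""
    return 2.0 * math.exp(-t * t / (2.0 * sum(c * c for c in increments)))


def exact_walk_tail(m, t):
    """Exact P(|S_m| >= t) for a sum S_m of m independent fair +/-1 steps."""
    total = 0
    for heads in range(m + 1):
        if abs(2 * heads - m) >= t:
            total += math.comb(m, heads)
    return total / 2 ** m


if __name__ == "__main__":
    m, t = 100, 20
    print(exact_walk_tail(m, t), azuma_bound(t, [1.0] * m))
```

The exact tail is well below the bound here, as expected: Azuma's inequality only uses the bounded-difference property and no further structure of the process.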
10.2. Proof of Theorem 3.2
We recall from Section 3.5 that is the partition of obtained by chopping into sub-cubes with side lengths . Since is finite the distribution of is supported on a finite set of weight functions .
Lemma 10.3.
For any , there is such that for all and all we have
[TABLE]
Proof.
Let
[TABLE]
Analogously, for a fixed let
[TABLE]
That is, we approximate by the average over the weight functions in the sub-cube that belongs to. Since is continuous on and therefore uniformly continuous on any compact subset of , uniformly as on the entire space for every . Since the Poisson distribution has sub-exponential tails, this implies the desired convergence for the first term on the right hand side of (2.3). A similar argument applies to the second term. ∎
Lemma 10.4.
The distribution satisfies SYM and BAL. Moreover, for any , there is such that the following is true for all . With chosen from , chosen from and chosen from , all mutually independent, we have
[TABLE]
Proof.
The fact that SYM and BAL are satisfied is immediate from the fact that is a conditional expectation of . To prove (10.8) we observe that by the uniform continuity of on compact subsets of , we can choose large enough so that for all , ,
[TABLE]
Thus, (10.8) follows from the triangle inequality and the fact that satisfies POS. ∎
Lemma 10.5.
For any , there is such that uniformly for all we have
[TABLE]
Proof.
By Lemma 3.1 the models and are mutually contiguous. Hence, Lemma 10.2 implies that . Similarly, since satisfies SYM and BAL by Lemma 10.4, another application of Lemmas 3.1 and 10.2 yields . Therefore, it suffices to prove that for any for all sufficiently large we have
[TABLE]
In fact, since the Poisson variable has sub-exponential tails, (4.6) shows that (10.9) would follow if we could show that
[TABLE]
To prove (10.10) pick small enough and then large enough. Fix any and . We couple two factor graphs such that has distribution and is distributed as as follows. First choose . Let us write for the weight functions of . Then let be the factor graph where each constraint node is adjacent to the same variable nodes as in but where the corresponding weight function is . It is immediate from (2.14) that is distributed as .
To bound we observe that
[TABLE]
Since the function is strictly convex on for small and large we obtain from (2.14), the tail bound (2.1) and Jensen’s inequality that
[TABLE]
On the other hand, since the map is uniformly continuous, we can choose a sufficiently large such that whenever . Thus, (10.10) follows from (10.11) and (10.12). ∎
Proof of Theorem 3.2.
Fix . Since Lemma 10.4 shows that satisfies SYM and BAL, [23, Proposition 3.6] implies that
[TABLE]
Furthermore, [23, Proposition 3.7] implies together with equation (10.8) from Lemma 10.4 that for any there is such that
[TABLE]
Combining (10.13) and (10.14) with Lemma 10.3, we conclude that for any for all large enough we have
[TABLE]
Applying Lemma 10.5 therefore yields
[TABLE]
Moreover, since and are mutually contiguous by Lemma 3.1, Lemma 10.2 implies that , too. Finally, since the probability of the event is bounded away from 0 by Proposition 3.11, Lemma 10.2 shows that
[TABLE]
as well. ∎
10.3. Proof of Theorem 2.2
We begin with the observation that is bounded and bounded away from 0.
Lemma 10.6.
We have .
Proof.
Fix any . Then for any nearly balanced the expected degree of every variable node of is . Therefore, the well-known result on the ‘giant component’ threshold of a random hypergraph (e.g., [65]) shows that with probability the random factor graph consists of connected components of order , all but a bounded number of which are trees. But assumption SYM guarantees that for every tree factor graph with variable nodes and constraint nodes the free energy is precisely equal to , as is easily verified by induction on the size of the tree. Hence, by Lemma 10.2. Since this formula holds for every nearly balanced assignment , we obtain . Hence, Theorem 3.2 shows that and thus .
We move on to the upper bound. Recalling that has distribution and that the constraint nodes in the teacher-student model are chosen independently, we obtain
[TABLE]
Further, plugging in the definition (2.14) of the teacher-student model, we can write the last term out as
[TABLE]
Since the uniformly random is nearly balanced with probability as , due to SYM and (2.1) the last expression simplifies to
[TABLE]
Further, due to the third part of (2.1) and because is strictly convex, Jensen’s inequality shows that there exists an -independent number such that
[TABLE]
Combining (10.16)–(10.18), we find . Hence, for we obtain
[TABLE]
Hence, applying Theorem 3.2 and recalling (2.4), we conclude that . ∎
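The lower bound in the proof above uses that, under SYM, the partition function of a tree factor graph has a closed form that is verified by induction on the size of the tree. The following sketch checks such a closed form by brute force for the Potts antiferromagnet with binary constraints. The specific formula, n ln q plus the sum over constraints of ln(q^{-2} Σ_τ ψ(τ)), is our reading of the statement and should be treated as an assumption, since the displayed expression is not legible in this copy. All names are illustrative.

```python
import itertools
import math


def potts_weight(beta):
    """psi(s, t) = exp(-beta) if s == t, else 1 (Potts antiferromagnet)."""
    return lambda s, t: math.exp(-beta) if s == t else 1.0


def brute_force_log_partition(n, edges, psi, q):
    """ln Z by enumerating all q^n assignments (only feasible for tiny n)."""
    total = 0.0
    for sigma in itertools.product(range(q), repeat=n):
        w = 1.0
        for u, v in edges:
            w *= psi(sigma[u], sigma[v])
        total += w
    return math.log(total)


def tree_formula(n, edges, psi, q):
    """Assumed closed form for trees: n*ln(q) + m*ln(xi), where
    xi = q^{-2} * sum over all spin pairs of psi."""
    xi = sum(psi(s, t) for s in range(q) for t in range(q)) / q ** 2
    return n * math.log(q) + len(edges) * math.log(xi)


if __name__ == "__main__":
    psi = potts_weight(1.0)
    tree = [(0, 1), (1, 2), (1, 3), (3, 4)]
    print(brute_force_log_partition(5, tree, psi, 3), tree_formula(5, tree, psi, 3))
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(brute_force_log_partition(3, triangle, psi, 3), tree_formula(3, triangle, psi, 3))
```

On the tree the two values coincide, while on the triangle they differ, which illustrates why the argument needs all but a bounded number of components to be trees.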
We derive Theorem 2.2 from Theorem 3.2 in two steps. First, generalizing the argument from [23, Section 3.5] to the setting of infinite , we prove the free energy formula for .
Proof of Theorem 2.2, part 1.
First assume that is such that for some ,
[TABLE]
Then there exists a sequence such that
[TABLE]
Hence, Lemma 10.2 shows that for a suitably large and a sufficiently small ,
[TABLE]
Now, with chosen small enough, we define
[TABLE]
Theorem 3.2 and Lemma 10.2 yield because . Therefore, (3.5) and (3.6) yield
[TABLE]
Moreover, the definition (10.20) of guarantees that
[TABLE]
But combining (10.21) and (10.22) with the Paley-Zygmund inequality, we obtain
[TABLE]
which contradicts (10.19) if is chosen sufficiently small. Finally, since the probability of the event is bounded away from 0 by Proposition 3.11, the assertion about follows from Lemma 10.2. ∎
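The contradiction in the proof above comes from the Paley-Zygmund inequality, which converts a second moment bound into a lower bound on the probability that a non-negative random variable is not much smaller than its mean. Here is a minimal numerical sketch of the inequality P(Z > θ E[Z]) ≥ (1 − θ)² E[Z]² / E[Z²] for a finite distribution; the example distribution is hypothetical.

```python
def paley_zygmund_check(values, probs, theta):
    """Return (P(Z > theta * E[Z]), Paley-Zygmund lower bound) for a
    non-negative random variable Z taking values[i] with probability probs[i]."""
    mean = sum(v * p for v, p in zip(values, probs))
    second = sum(v * v * p for v, p in zip(values, probs))
    tail = sum(p for v, p in zip(values, probs) if v > theta * mean)
    bound = (1 - theta) ** 2 * mean ** 2 / second
    return tail, bound


if __name__ == "__main__":
    # Hypothetical heavy-ish tailed example with E[Z] = 1.4 and E[Z^2] = 10.4.
    tail, bound = paley_zygmund_check([0.0, 1.0, 10.0], [0.5, 0.4, 0.1], 0.5)
    print(tail, bound)
```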
We proceed to show that if by generalizing the argument from [23, Section 3.5] to infinite sets .
Lemma 10.7.
Assume that is such that for some . Then for every large enough there exists such that for large enough ,
[TABLE]
Proof.
If , then Theorem 3.2 shows that
[TABLE]
Fix a small enough and an even smaller and let . Since is chosen uniformly and thus while for large enough we have by Lemma 10.2, it suffices to prove that for all ,
[TABLE]
To establish (10.25) we set up a coupling of , for any . Let us write for the constraint nodes of and for those of . Relabeling the variable node as necessary, we may assume without loss that . Therefore, (2.14) shows that we can couple the distribution of the neighborhoods , such that, with chosen small enough,
[TABLE]
Furthermore, if indeed and , then by (2.14) the weight functions are identically distributed and we couple such that . If, on the other hand, or , then we choose , independently according to (2.14).
Since the constraint nodes are chosen independently, (10.26) shows that the number of such that either or is binomially distributed with mean at most . Hence, . Furthermore, (2.1) shows that the expected impact on the free energy of the constraint nodes where differ is bounded by for some number that does not depend on or . Therefore, choosing small enough we can ensure that
[TABLE]
Combining (10.24) and (10.27), we obtain
[TABLE]
Thus, (10.25) follows from (10.28) and Lemma 10.2. ∎
Lemma 10.8.
Assume that satisfies SYM and BAL. For any the following is true uniformly for . If is an event such that , then .
Proof.
This is immediate from the Nishimori identity Lemma 4.4 and (4.12). ∎
Proof of Theorem 2.2, part 2.
Suppose that . Then there exist and such that
[TABLE]
Let be a -variable and consider the event . Then Markov’s inequality and Lemma 4.6 yield
[TABLE]
On the other hand, Lemma 10.7 shows that for large enough ,
[TABLE]
Now, for a factor graph obtain by removing each constraint node with probability independently. Moreover, let be the set of all factor graphs such that , where, of course, the probability is over the removal process only. Since the distribution of is identical to that of , (10.29) yields
[TABLE]
Similarly, and are identically distributed. Thus, (10.30) and Lemma 10.1 imply that
[TABLE]
Furthermore, (10.32) and Lemma 10.8 yield such that
[TABLE]
To complete the proof, assume for contradiction that . Then for arbitrarily large . Thus, we can apply Lemma 10.2 to conclude that for infinitely many ,
[TABLE]
Combining (10.34) with Lemma 10.1, we see that the event satisfies for arbitrarily large . But then
[TABLE]
a contradiction that refutes the assumption . ∎
11. Reconstruction
Throughout this section, when there is no danger of confusion we abbreviate to and to . For a rooted factor tree and any vertex in that tree, let denote the children of . Also, for any factor graph , any variable node in this graph and any integer , we let denote the set of variable nodes at distance from .
Given some graph , any and an assignment let , or denote the assignment that specifies for the set . Furthermore, let be two distributions on the configuration space . For any we let
[TABLE]
denote the total variation distance between the projections of and on . Also, for some we let denote the distribution conditional on the event that has assignment .
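The distance just defined compares two distributions only through their marginals on a subset of nodes. A minimal sketch of this computation for distributions stored as dictionaries from assignment tuples to probabilities follows. The representation, the function names and the normalising factor 1/2 are assumptions made for illustration.

```python
def project(mu, coords):
    """Marginal of mu on the coordinates listed in `coords`.
    mu maps assignment tuples to probabilities."""
    out = {}
    for sigma, p in mu.items():
        key = tuple(sigma[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out


def tv_on(mu, nu, coords):
    """Total variation distance between the projections of mu and nu on
    `coords`, using the convention TV = (1/2) * sum |mu - nu|."""
    a, b = project(mu, coords), project(nu, coords)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


if __name__ == "__main__":
    # Two distributions on {0,1}^2 with identical single-site marginals
    # but disjoint supports.
    mu = {(0, 0): 0.5, (1, 1): 0.5}
    nu = {(0, 1): 0.5, (1, 0): 0.5}
    print(tv_on(mu, nu, [0]), tv_on(mu, nu, [0, 1]))
```

The example shows that the distance depends on the chosen subset: the two measures are indistinguishable on one coordinate and perfectly distinguishable on both.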
For the factor tree we define the broadcasting process which generates an assignment as follows: There is some initial distribution . We set according to the distribution . Then, inductively, assume that we have for some variable node . For each , independently, the variable nodes in are assigned with probability proportional to
[TABLE]
where is the weight function that corresponds to and is the position of inside the constraint .
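The broadcasting process can be sketched as a top-down sampler. In the sketch below the tree encoding (`constraints_below`, `constraints`), the function name and the use of Python's `random.Random` are all illustrative assumptions; the sampling rule follows the description above: for each child constraint of a variable whose spin is already fixed, the spins of the remaining variables are drawn with probability proportional to the weight function evaluated with the parent's spin at the parent's position.

```python
import itertools
import random


def broadcast(root, constraints_below, constraints, q, root_spin, rng):
    """Sample an assignment on a factor tree by broadcasting from the root.

    constraints_below[x]: ids of the child constraint nodes of variable x.
    constraints[a] = (psi, variables): psi maps a spin tuple to a weight >= 0,
        variables lists the variable nodes of a in positional order,
        including the parent variable.
    """
    sigma = {root: root_spin}
    stack = [root]
    while stack:
        x = stack.pop()
        for a in constraints_below.get(x, []):
            psi, variables = constraints[a]
            pos = variables.index(x)  # position of the parent inside a
            # All spin tuples that agree with the parent's spin at `pos`.
            options = [tau for tau in itertools.product(range(q), repeat=len(variables))
                       if tau[pos] == sigma[x]]
            weights = [psi(tau) for tau in options]
            tau = rng.choices(options, weights=weights, k=1)[0]
            for y, s in zip(variables, tau):
                if y != x:
                    sigma[y] = s
                    stack.append(y)
    return sigma


if __name__ == "__main__":
    # Zero-temperature antiferromagnet on a path 0 - a - 1 - b - 2 with q = 2:
    # every child must disagree with its parent.
    flip = lambda tau: 0.0 if tau[0] == tau[1] else 1.0
    cons = {"a": (flip, [0, 1]), "b": (flip, [1, 2])}
    below = {0: ["a"], 1: ["b"]}
    print(broadcast(0, below, cons, 2, 0, random.Random(1)))
```

With the hard constraint used in the example the outcome is deterministic, which makes the sampler easy to sanity-check; for soft weight functions the output is random and depends on the seed.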
Lemma 11.1.
Consider some factor tree of height , rooted at (variable) node . Let be the assignment generated by the broadcasting process such that the initial distribution is the uniform distribution over .
For any , it holds that
[TABLE]
where is the Gibbs distribution specified by .
Proof.
Let be distributed as in . Then, we have that is distributed uniformly at random in .
Furthermore, let be a variable node. Given for each the assignment is independent of the other vertices in . Furthermore, for each assignment we have with probability proportional to
[TABLE]
The lemma follows by using the definition of the broadcasting process. ∎
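The correspondence between the Gibbs measure on a tree and the broadcasting process with a uniform root can be verified exhaustively on a small example. The sketch below does this on a path with pairwise Potts weights and also shows that the two laws differ for a weight function that violates the symmetry assumption. It assumes the lemma is read as an identity between the two laws; the path topology, the weights and all names are illustrative choices.

```python
import itertools
import math


def potts(beta):
    """Symmetric pairwise weight: exp(-beta) if the spins agree, else 1."""
    return lambda s, t: math.exp(-beta) if s == t else 1.0


def gibbs_on_path(psi, q, length):
    """Gibbs measure on a path with `length` variable nodes and weights psi."""
    weights = {}
    for sigma in itertools.product(range(q), repeat=length):
        w = 1.0
        for i in range(length - 1):
            w *= psi(sigma[i], sigma[i + 1])
        weights[sigma] = w
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def broadcast_law_on_path(psi, q, length):
    """Exact law of the broadcasting process started from a uniform root."""
    law = {}
    for sigma in itertools.product(range(q), repeat=length):
        p = 1.0 / q
        for i in range(length - 1):
            norm = sum(psi(sigma[i], t) for t in range(q))
            p *= psi(sigma[i], sigma[i + 1]) / norm
        law[sigma] = p
    return law


def max_gap(a, b):
    """Largest pointwise difference between two laws on the same support."""
    return max(abs(a[s] - b[s]) for s in a)


if __name__ == "__main__":
    print(max_gap(gibbs_on_path(potts(0.7), 3, 4),
                  broadcast_law_on_path(potts(0.7), 3, 4)))
    skew = lambda s, t: 1.0 + s  # violates the symmetry assumption
    print(max_gap(gibbs_on_path(skew, 2, 3), broadcast_law_on_path(skew, 2, 3)))
```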
Consider a sequence of factor trees , where contains levels of variable nodes. Let
[TABLE]
recall that is the set of variable nodes at distance from the root . Similarly, we define
[TABLE]
We study the reconstruction problem on the sequence of factor trees by means of the broadcasting processes and the quantity . To be more specific, for each , rooted at , consider two broadcasting processes with some initial distribution and let and be the assignments that are generated, respectively. Then, the quantity expresses the -distance between the distributions of the configurations and , as , conditional on , , for the worst-case pair . The following result implies that for studying reconstruction on we can either consider , or .
Lemma 11.2.
Let be a sequence of factor trees, where contains levels of variable nodes. Then we have that if and only if .
Proof.
For some integer , we have that
[TABLE]
Clearly, the above implies that . In turn, we get that if , then , as well.
We work in a similar way for the other direction. That is,
[TABLE]
Clearly, the above implies that . In turn, we get that if , then . ∎
Next we show that non-reconstruction is monotone in the expected degree of .
Lemma 11.3.
For any such that , the following is true: If , then .
The proof of Lemma 11.3 appears in Section 11.1.
We proceed by introducing some further notions. For a rooted factor graph , let be the isomorphism class of rooted factor graphs to which belongs. Let be the induced subgraph of which includes and all variable nodes which are within graph distance from . For , is a tree with probability . In particular, there is a coupling of the distribution induced by and such that the following is true:
[TABLE]
For what follows, we let the event .
Lemma 11.4.
Let . Consider generated according to the teacher-student model and some vertex . Also, consider the pair such that is generated by a broadcasting process for which we assign the root the configuration with probability 1.
There is a coupling between and such that the following is true:
[TABLE]
where is an isomorphism between and . The same result holds for .
The proof of Lemma 11.4 appears in Section 11.2.
In light of Lemma 11.4 and (11.3) Theorem 2.9 is immediate.
The above result implies that in the teacher-student model, the distribution of the configuration of that is specified by is asymptotically the same as the distribution of the configuration that is induced by the broadcasting process on . We use the above result with Corollary 2.7 to relate reconstruction on the random factor graph and the random tree .
Now we proceed with the proof of Theorem 2.8. In the following lemma we provide the upper bound for and .
Lemma 11.5.
For any there exists such that . Furthermore, for any we have .
Proof.
We consider . For any graph and two vertices such that and any , it is easy to see that
[TABLE]
Furthermore, working as in Lemma 11.2, we can substitute the r.h.s. of the above inequality and get
[TABLE]
For any two fixed vertices in , we denote by the event that . Then, for we get that
[TABLE]
Note that for any two fixed vertices it holds that . To see this, let be the number of vertices within distance from . Furthermore, given each vertex belongs to the neighborhood of with probability at most . Then, noting that , we get that
[TABLE]
where we use Markov’s inequality to bound . Combining all the above, we get that for any it holds that
[TABLE]
We conclude the part for by combining the above with (2.12). Recall that the latter states that for any there exists such that the l.h.s. is strictly positive.
Repeating the same arguments as above we get that for it holds that
[TABLE]
where is defined in (2.16).
Note that the l.h.s. is bounded away from zero. To see this, note that if it were zero, then this would imply that for we get that . Clearly, this is not true, e.g. see Corollary 6.3 and Theorem 3.2. Then we conclude that for .
Using Lemma 11.4 we get that for any as well. To be more specific, note that Lemma 11.4 implies the following: Let . Also consider the pair and . For any , there is a coupling between and such that with probability we have , with some isomorphism . Furthermore, for every we have that . This coupling implies that . That is, for we have . Then, using the monotonicity result from Lemma 11.3 we get that for any we have . The lemma follows. ∎
In light of Lemma 11.5, we get the first part of Theorem 2.8 by using the following result.
Lemma 11.6.
For any we have that . Furthermore, for we have that .
The proof of Lemma 11.6 appears in Section 11.3.
The second part of Theorem 2.8 follows essentially as a corollary of the previous results in this section. It is elementary to verify that
[TABLE]
Using Lemma 8.1 we get that . Then, using Lemma 11.6 we get that for any the l.h.s. of (11.8) is equal to zero. We proceed by showing that for any there exists such that
[TABLE]
Using Theorem 2.5 and standard arguments (e.g., [13, Section 2]) there is such that
[TABLE]
Then (11.9) follows by working as in the proof of Lemma 11.5. Finally, we show that for we have
[TABLE]
To show the above, we work as in the second case of Lemma 11.6, i.e. we use Lemma 11.4 and the contiguity result in Corollary 2.7. More specifically, if there is such that the l.h.s. of (11.10) is zero, then Corollary 2.7 would imply that
[TABLE]
recall that is the distribution over configurations in that is induced by conditional on . If the above were true, then Lemma 11.4 would imply that . Clearly this is a contradiction due to Lemma 11.6.
The theorem follows.
11.1. Proof of Lemma 11.3
Consider two factor trees and with roots , respectively. We say that satisfy the relation if there is an injective mapping such that the following is true: for every we have , while every such that is assigned the same weight function in both trees and , occupy the same position within . Furthermore, for every function node we have and every occupies in the same position as in .
Lemma 11.7.
Consider two sequences of factor trees and such that the following is true: For and we have , for . Then, we have that
[TABLE]
Proof.
For some , consider and . Since we assumed that , let be the mapping that verifies that property.
For any two consider two configurations generated by the broadcasting process on such that and . Similarly, let be two configurations generated by the broadcasting process on such that and . Then it suffices to show the following: For any , if there is a coupling for such that the probability that is equal to , then there exists a coupling for such that the probability that is at most .
From the definition of the broadcasting process, we get the following: Let and be two configurations generated by the broadcasting process on and , respectively, such that , for some . Then there is a coupling for , such that for every , we have that .
Assume that we have the coupling for and . We combine couplings and to get . In particular we use the couplings as follows: First, we couple and by using . Then, we use to couple and . Finally, we use to couple and .
In the above “chain of couplings”, note that we have only if . This implies that if in the probability of the event is equal to , then in the probability of having is at most . The lemma follows. ∎
In light of Lemmas 11.2, 11.7 we get the following corollary.
Corollary 11.8.
Consider two sequences of factor trees and such that for and we have , for , then the following is true: If , then .
The lemma follows by using the above corollary and noting that for any such that there is a standard coupling such that .
11.2. Proof of Lemma 11.4
The case where is almost identical to the case where we don’t restrict . For this reason we omit the proof of the case where .
Let the pairs and . Then, we define the relation “" such that if the following holds: and belong to the same isomorphism class of rooted trees, where is rooted at and is rooted at . Furthermore, if is an isomorphism between the two trees, then for every we have that . We are going to show a coupling that has the property that
[TABLE]
For what follows, we denote the isomorphism and , if such exists.
Before proceeding let us state some easy-to-prove results. Recall that for an assignment on vertices we denote by its empirical marginal distribution. Furthermore, it is elementary to show that
[TABLE]
Let be the number of edges in . Recall that is a Poisson random variable with parameter . Applying standard Chernoff bounds for we get that
[TABLE]
We let denote the number of vertices in . Note that for every variable node , the cardinality of is dominated by the Poisson distribution with parameter . With this observation we get that
[TABLE]
The coupling is as follows: If is such that or we don’t couple and at all. Otherwise, the coupling is defined inductively.
First consider the coupling between and . Note that . Due to our assumption about , we can have such that
[TABLE]
The above follows by using a maximal coupling for choosing .
The induction step is as follows: Assume that we have exposed partly and and the corresponding parts agree. That is, let and be the two parts of and , respectively. Our assumption is that . W.l.o.g. assume that the leaves of the trees are variable nodes.
Let be a leaf in whose descendants have not been revealed so far. The same holds for in . Let be the number of hyper-edges of that have been revealed so far. Recall that the number of all hyper-edges in is . Then, it is an easy calculation to get that for any we have
[TABLE]
If is the number of edges of the tree we have revealed up to vertex , then we have the crude upper bound that . We have that
[TABLE]
where the third inequality follows from Markov’s inequality and the last inequality follows from our assumption that . Combining the two above relations we get the following: For any it holds that
[TABLE]
Similarly we get that .
Recall that for a vertex we have that is Poisson distributed with parameter . Using this observation we can have such that
[TABLE]
We extend by defining a bijection between and . From the definition of we get that each chooses a weight function from a distribution which is within total variation distance from . Note that the term comes from the fact that is not perfectly balanced, i.e. we allow some fluctuations on the sizes of the color classes. For we have that it chooses its weight function with probability . The above observations imply that we can have such that
[TABLE]
By choosing the same weight function for both and we imply that the position of and is the same in the two functions.
Finally, for every pair of constraint nodes and for which we have chosen the weight function we decide on and , where and . For each configuration we have with probability proportional to
[TABLE]
where is the position of inside the constraint . Also, we have with probability proportional to
[TABLE]
From the above, it is clear that we can have such that
[TABLE]
Let and be the new parts of and , after the revelation of and , for every and for every .
Then, using all the above and a simple union bound gives that
[TABLE]
The law of total probability implies that
[TABLE]
Lemma 11.4 follows by bounding appropriately the number of steps required for the coupling. Let be the event that the number of steps in the coupling is more than . Since the number of steps of the coupling is upper bounded by the number of vertices of , using (11.13) and Markov’s inequality we get that
[TABLE]
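For reference, the generic form of Markov's inequality used in this step can be written as follows, where $N$ and $t$ are placeholder symbols introduced here (for the number of coupling steps and the threshold, respectively) rather than the paper's own notation:

```latex
% Markov's inequality for a non-negative random variable N and any t > 0.
% With N the number of coupling steps, bounded above by the number of
% vertices of the tree, a bound on the expected tree size controls
% the probability that the coupling runs for more than t steps.
\[
  \Pr\left[ N \geq t \right] \;\leq\; \frac{\mathbb{E}[N]}{t}.
\]
```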
We have that
[TABLE]
The above implies that (11.11) is indeed true. The lemma follows.
11.3. Proof of Lemma 11.6
Clearly, Lemma 11.3 implies that we have if and only if . To see this note the following: Assume that there is such that . Then Lemma 11.3 implies that since and , then we also have , which is false.
For proving Lemma 11.6, it remains to show that if and only if . First we focus on showing that for we have
[TABLE]
For an even integer consider the factor tree which contains levels of variable nodes and is rooted at . The configuration is called “-mixing”, for some , if it holds that
[TABLE]
Let be the set of all configurations which are -mixing for . The above quantity expresses the correlation between the configuration of the vertices at distance and the root (set to vertex correlation).
Eq. (11.21) follows by showing the following result.
Lemma 11.9.
For and every there exists such that for any even we have
[TABLE]
Proof.
We shift our attention to considering the teacher-student pair . In light of Corollary 4.9, it suffices to show the following: For and every there exists such that for any we have
[TABLE]
In light of Lemma 11.4, for (11.23) it suffices to show the following result: For any and any there exists such that
[TABLE]
Clearly the above follows from the definition of . ∎
From Lemma 11.9 we get (11.21) by working as follows: Let
[TABLE]
Furthermore, for any , integer , for , for any vertex and distributed as in the Gibbs measure, let be the event that . Lemma 11.9 implies that for , for every there exists such that for any the following holds:
[TABLE]
Noting that , we get that (11.21) is indeed true.
We conclude the proof of Lemma 11.6 by showing that for we have
[TABLE]
The proof of (11.24) is by contradiction. Assume that there exists such that . This would entail that (11.22) is true. Then, reversing the arguments from the proof of Lemma 11.9 and combining them with Corollary 4.9, we get that for any there exists such that for any we have
[TABLE]
The above implies that . Clearly we get a contradiction since we have shown in Lemma 11.3 that for every we have .
Acknowledgment. We thank Will Perkins, Guilhem Semerjian and Nick Wormald for helpful discussions.
References
- [1] E. Abbe: Community detection and stochastic block models: recent developments. arXiv:1703.10146 (2017).
- [2] E. Abbe, A. Montanari: Conditional random fields, planted constraint satisfaction and entropy concentration. Theory of Computing 11 (2015) 413–443.
- [3] E. Abbe, C. Sandon: Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap. arXiv:1512.09080 (2015).
- [4] D. Achlioptas, A. Coja-Oghlan: Algorithmic barriers from phase transitions. Proc. 49th FOCS (2008) 793–802.
- [5] D. Achlioptas, H. Hassani, N. Macris, R. Urbanke: Bounds for random constraint satisfaction problems via spatial coupling. Proc. 27th SODA (2016) 469–479.
- [6] D. Achlioptas, C. Moore: Random k-SAT: two moments suffice to cross a sharp threshold. SIAM Journal on Computing 36 (2006) 740–762.
- [7] D. Achlioptas, C. Moore: On the 2-colorability of random hypergraphs. Proc. 6th RANDOM (2002) 78–90.
- [8] D. Achlioptas, A. Naor: The two possible values of the chromatic number of a random graph. Annals of Mathematics 162 (2005) 1333–1349.
