Overlap synchronisation in multipartite random energy models

Giuseppe Genovese; Daniele Tantari

arXiv:1705.03939·cond-mat.dis-nn·November 22, 2017


TL;DR

This paper investigates a multipartite random energy model composed of coupled GREMs, establishing how the overlaps between different parts synchronize based on the overlaps of individual GREMs, illustrating a fundamental phenomenon called overlap synchronisation.

Contribution

It provides the first explicit characterization of overlap synchronisation in multipartite random energy models with coupled GREMs.

Findings

01. Derived the joint law of overlaps in multipartite models

02. Established the phenomenon of overlap synchronisation

03. Simplified understanding of coupled GREM interactions

Abstract

In a multipartite random energy model, made of a number of coupled GREMs, we determine the joint law of the overlaps in terms of the ones of the single GREMs. This provides the simplest example of the so-called overlap synchronisation.

Equations

$$\ell_{\kappa,j}:=\mu_{(\kappa,1)}\dots\mu_{(\kappa,j)}\,\quad\mbox{(note $\ell_{\kappa,1}=\mu_{(\kappa,1)}$ and $\ell_{\kappa,K_{\kappa}}=\mu_{(\kappa)}$)}$$

$$\mathbb{E}\left[J^{(\kappa,j)}_{\ell_{\kappa,j}}J^{(\kappa,j)}_{\ell^{\prime}_{\kappa,j}}\right]$$

$$\mathbb{E}\left[J^{(\kappa_{1},j_{1})(\kappa_{2},j_{2})}_{\ell_{\kappa_{1},j_{1}}\ell_{\kappa_{2},j_{2}}}J^{(\kappa_{1},j_{1})(\kappa_{2},j_{2})}_{\ell^{\prime}_{\kappa_{1},j_{1}}\ell^{\prime}_{\kappa_{2},j_{2}}}\right]$$

$$H_{N}(\sigma):=-\sqrt{\frac{N}{2}}\left(\sum_{\kappa=1}^{M}\alpha^{(\kappa)}\sum_{j=1}^{K_{\kappa}}a_{j}^{(\kappa)}J^{(\kappa,j)}_{\ell_{\kappa,j}}+\sqrt{2\alpha^{(1)}\dots\alpha^{(M)}}\sum_{(\kappa_{1},\kappa_{2})}\sum_{j_{1}=1}^{K_{\kappa_{1}}}\sum_{j_{2}=1}^{K_{\kappa_{2}}}c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}J^{(\kappa_{1},j_{1})(\kappa_{2},j_{2})}_{\ell_{\kappa_{1},j_{1}}\ell_{\kappa_{2},j_{2}}}\right),$$

$$\sum_{j=1}^{K_{\kappa}}a_{j}^{(\kappa)}=\sum_{j_{1}=1}^{K_{\kappa_{1}}}\sum_{j_{2}=1}^{K_{\kappa_{2}}}c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}=1,\qquad\forall\,\kappa,\kappa_{1},\kappa_{2}\in\{1,\dots,M\}.$$

$$\tau_{\kappa}=\tau_{\kappa}(\sigma,\sigma^{\prime}):=\inf\{j\geq 0\,:\,\mu_{(\kappa,j+1)}\neq\mu^{\prime}_{(\kappa,j+1)}\},\qquad\kappa=1,\dots,M,$$

$$\mathbb{E}\left[H_{N}(\sigma)H_{N}(\sigma^{\prime})\right]=\frac{N}{2}\left(\sum_{\kappa=1}^{M}\left(\alpha^{(\kappa)}\right)^{2}\sum_{j=1}^{\tau_{\kappa}}\left(a_{j}^{(\kappa)}\right)^{2}+2\alpha^{(1)}\dots\alpha^{(M)}\sum_{(\kappa_{1},\kappa_{2})}\sum_{j_{1},j_{2}=1}^{\min(\tau_{\kappa_{1}},\tau_{\kappa_{2}})}\left(c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}\right)^{2}\right).$$

$$0=q_{0}^{(\kappa)}<q_{1}^{(\kappa)}<\dots<q_{K_{\kappa}}^{(\kappa)}=1,\qquad\kappa\in\{1,\dots,M\}$$

$$A_{N}(\beta):=\frac{1}{N}\log\sum_{\sigma}e^{-\beta H_{N}(\sigma)},\qquad A(\beta):=\lim_{N}A_{N}(\beta).$$

$$(q_{1},\dots,q_{M})\overset{d}{=}\left(x_{1}^{-1}(\upsilon),\dots,x_{M}^{-1}(\upsilon)\right).$$

$$(q_{1},\dots,q_{M})\overset{d}{=}\left(x_{1}^{-1}\circ x_{tot}(q_{tot}),\dots,x_{M}^{-1}\circ x_{tot}(q_{tot})\right).$$

$$A_{REM}(\beta):=1_{\{\beta\leq\beta_{c}\}}\log 2\left(1+\frac{\beta^{2}}{\beta_{c}^{2}}\right)+1_{\{\beta>\beta_{c}\}}2\log 2\,\frac{\beta}{\beta_{c}}.$$

$$H_{N}(\sigma):=-\sqrt{\frac{N}{2}}\left[\alpha aJ^{(1,1)}_{\mu_{(1)}}+(1-\alpha)bJ^{(2,1)}_{\mu_{(2)}}+\sqrt{2\alpha(1-\alpha)}\,cJ^{(1,1)(2,1)}_{\mu_{(1)}\mu_{(2)}}\right]$$

$$P^{(1)}_{N,\beta}(\mu_{(1)};\beta):=Z^{-1}\sum_{\mu_{(2)}}e^{-\beta H_{N}(\sigma)}\rightharpoonup PD(0,\beta_{1}/\beta),$$

$$P_{N,\beta}(\sigma;\beta)\rightharpoonup PD(0,\beta_{2}/\beta),$$

$$P_{\beta}(q_{1}\geq q,q_{2}\geq p)=\min\left(P_{\beta}(q_{1}\geq q),P_{\beta}(q_{2}\geq p)\right)$$

$$\wp:=\wp\{\ell_{1,1},\dots,\ell_{1,K_{1}},\dots,\ell_{M,1},\dots,\ell_{M,K_{M}}\}.$$

$$\Gamma_{n}\in\wp,\quad\Gamma_{n}\subset\Gamma_{n+1},\quad\Gamma_{0}=\emptyset,\quad\Gamma_{K}=\{\ell_{1,1},\dots,\ell_{1,K_{1}},\dots,\ell_{M,1},\dots,\ell_{M,K_{M}}\}.$$

$$\alpha_{n}:=\frac{1}{N}\log_{2}\Big|\bigcup_{\kappa,j\,:\,\ell_{\kappa,j}\in\Gamma_{n}\setminus\Gamma_{n-1}}\ell_{\kappa,j}\Big|,$$

$$\gamma_{n}^{2}:=\sum_{\kappa=1}^{M}\left(\alpha^{(\kappa)}\right)^{2}\sum_{j\,:\,\ell_{\kappa,j}\in\Gamma_{n}\setminus\Gamma_{n-1}}\left(a_{j}^{(\kappa)}\right)^{2}+\sum_{\substack{(\kappa_{1},\kappa_{2}),(j_{1},j_{2})\,:\\ \{\ell_{\kappa_{1},j_{1}},\ell_{\kappa_{2},j_{2}}\}\in\Gamma_{n}\setminus\Gamma_{n-1}}}2\alpha_{\kappa_{1}}\alpha_{\kappa_{2}}\left(c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}\right)^{2}.$$

$$H_{N}(\sigma)=-\sqrt{\frac{N}{2}}\left[\alpha\sum_{j=1}^{K_{1}}a^{(1)}_{j}J^{(1,j)}_{\ell_{1,j}}+(1-\alpha)\sum_{j=1}^{K_{2}}a^{(2)}_{j}J^{(2,j)}_{\ell_{2,j}}+\sqrt{2\alpha(1-\alpha)}\sum_{j_{1}=1}^{K_{1}}\sum_{j_{2}=1}^{K_{2}}c_{j_{1},j_{2}}J^{(1,j_{1})(2,j_{2})}_{\ell_{1,j_{1}}\ell_{2,j_{2}}}\right]$$

$$H_{n}$$

$$H_{N}(\sigma)=\sum_{n=1}^{K}H_{n},$$

$$Z_{N}(\beta)=\sum_{\Gamma_{1}}e^{-\beta H_{1}}\sum_{\Gamma_{2}/\Gamma_{1}}e^{-\beta H_{2}}\dots\sum_{\Gamma_{K}/\Gamma_{K-1}}e^{-\beta H_{K}}.$$

$$Z_{N}(\beta)\simeq\sum_{\Gamma_{1}}e^{-\beta H_{1}}\sum_{\Sigma/\Gamma_{1}}e^{-\beta(H-H_{1})}$$

$$P^{(1)}_{N,\beta}(\Gamma_{1};\beta):=Z_{1}^{-1}\sum_{\Sigma/\Gamma_{1}}e^{-\beta H(\sigma)}\rightharpoonup PD(0,\beta_{1}/\beta),$$

$$Z_{N}(\beta)\simeq\sum_{\Gamma_{1}}e^{-\beta H_{1}}\sum_{\Gamma_{2}/\Gamma_{1}}e^{-\beta H_{2}}\sum_{\Sigma/\Gamma_{2}}e^{-\beta(H-H_{1}-H_{2})}.$$

$$P^{(2)}_{N,\beta}(\Gamma_{2};\beta):=Z_{2}^{-1}\sum_{\Sigma/\Gamma_{2}}e^{-\beta H(\sigma)}\rightharpoonup PD(0,\beta_{2}/\beta),$$

$$A_{GREM}(\Gamma;\beta):=\sum_{n=1}^{K}\left[1_{\{\beta\leq\beta_{n}\}}\alpha_{n}\log 2\left(1+\frac{\beta^{2}}{\beta_{n}^{2}}\right)+1_{\{\beta>\beta_{n}\}}2\alpha_{n}\log 2\,\frac{\beta}{\beta_{n}}\right].$$

$$\lim_{N}P_{N,\beta}\left(\mathrm{dist}(\sigma,\sigma^{\prime})\leq\max\{\mathrm{dist}(\sigma,\sigma^{\prime\prime}),\mathrm{dist}(\sigma^{\prime},\sigma^{\prime\prime})\}\right)=1,\qquad\forall\beta>\beta^{*}.$$

$$P_{N,\beta}\left(q_{1}\geq q_{j}^{(1)}\right)$$


Full text

Overlap synchronisation in multipartite random energy models

Giuseppe Genovese

Giuseppe Genovese: Institut für Mathematik, Universität Zürich, CH-8057 Zürich, Switzerland.


and

Daniele Tantari

Daniele Tantari: Centro Ennio de Giorgi, Scuola Normale Superiore, Piazza dei Cavalieri 3, I-56100 Pisa (Italy).


Abstract.

In a multipartite random energy model, made of a number of coupled GREMs, we determine the joint law of the overlaps in terms of the ones of the single GREMs. This provides the simplest example of the so-called overlap synchronisation.

MSC: 82B44, 60G55, 60K35.

1. Introduction

The overlap synchronisation phenomenon was recently introduced by Panchenko in [1] for multipartite spin glasses [2]. The study of such systems is of primary interest because of their applications in neural network theory and statistical inference [3, 4]: e.g. the Hopfield model, the restricted Boltzmann machine and the perceptron are examples of bipartite spin glasses. The lack of convexity prevents a direct application to multipartite spin glasses of some useful techniques developed for the Sherrington-Kirkpatrick model (*i.e.* interpolation bounds [5]), calling for new ideas.

In this note we investigate a multipartite random energy model, originally studied for the bipartite case in [6], obtained by coupling each level of $M$ distinct generalised random energy models (GREMs). We show that the joint law of the overlaps has a direct expression in terms of the overlap laws of the single GREMs. This provides a simple example of overlap synchronisation.

The model is defined as follows. Let $N,M\in\mathbb{N}$, $\kappa\in\{1,\dots,M\}$ and $N_{\kappa}\in\mathbb{N}$ with $\sum_{\kappa}N_{\kappa}=N$, $\alpha^{(\kappa)}:=N_{\kappa}/N$. For each configuration $\sigma\in\Sigma_{N}:=\{1,\dots,2^{N}\}$ we can identify $\sigma=(\mu_{(1)},\dots,\mu_{(M)})$, $\mu_{(\kappa)}\in\{1,\ldots,2^{N_{\kappa}}\}$. We divide the parts into $K_{1},\dots,K_{M}$ hierarchical levels respectively. For each level $j$ of the hierarchy, each group of configurations is divided into $2^{N_{\kappa,j}}$ further subgroups indexed by $\mu_{(\kappa,j)}$, with of course $\sum_{j}N_{\kappa,j}=N_{\kappa}$ and $\varsigma_{\kappa,j}:=N_{\kappa,j}/N_{\kappa}$, $j\in\{1,\dots,K_{\kappa}\}$. Each configuration can be thought of as an $M$-tuple $\sigma=(\mu_{(1)},\dots,\mu_{(M)})$ or as a $\left(\sum_{\kappa}K_{\kappa}\right)$-tuple $\sigma=(\mu_{(1,1)},\dots,\mu_{(1,K_{1})},\dots,\mu_{(M,1)},\dots,\mu_{(M,K_{M})})$. This multipartite setting brings a somewhat heavy notation. To lighten it a little we let

$$\ell_{\kappa,j}:=\mu_{(\kappa,1)}\dots\mu_{(\kappa,j)}\,\quad\mbox{(note $\ell_{\kappa,1}=\mu_{(\kappa,1)}$ and $\ell_{\kappa,K_{\kappa}}=\mu_{(\kappa)}$)}$$

label the configurations in the $j$-th level of the $\kappa$-th tree. With a slight abuse of notation we will denote by the same symbol also the set of such configurations (the correct meaning will always be clear from the context). We attach to each level and to each couple of levels centred Gaussian random variables $J^{(\kappa,j)}_{\ell_{\kappa,j}}$ and $J^{(\kappa_{1},j_{1})(\kappa_{2},j_{2})}_{\ell_{\kappa_{1},j_{1}}\ell_{\kappa_{2},j_{2}}}$ with

[TABLE]

The levels interact via the following Hamiltonian

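$$H_{N}(\sigma):=-\sqrt{\frac{N}{2}}\left(\sum_{\kappa=1}^{M}\alpha^{(\kappa)}\sum_{j=1}^{K_{\kappa}}a_{j}^{(\kappa)}J^{(\kappa,j)}_{\ell_{\kappa,j}}+\sqrt{2\alpha^{(1)}\dots\alpha^{(M)}}\sum_{(\kappa_{1},\kappa_{2})}\sum_{j_{1}=1}^{K_{\kappa_{1}}}\sum_{j_{2}=1}^{K_{\kappa_{2}}}c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}J^{(\kappa_{1},j_{1})(\kappa_{2},j_{2})}_{\ell_{\kappa_{1},j_{1}}\ell_{\kappa_{2},j_{2}}}\right),\tag{1.1}$$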

with

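$$\sum_{j=1}^{K_{\kappa}}a_{j}^{(\kappa)}=\sum_{j_{1}=1}^{K_{\kappa_{1}}}\sum_{j_{2}=1}^{K_{\kappa_{2}}}c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}=1,\qquad\forall\,\kappa,\kappa_{1},\kappa_{2}\in\{1,\dots,M\}.$$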

We can introduce $M$ different partial overlaps between two different configurations $\sigma\neq\sigma^{\prime}$ as

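$$\tau_{\kappa}=\tau_{\kappa}(\sigma,\sigma^{\prime}):=\inf\{j\geq 0\,:\,\mu_{(\kappa,j+1)}\neq\mu^{\prime}_{(\kappa,j+1)}\},\qquad\kappa=1,\dots,M,$$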

and $\tau_{\kappa}=K_{\kappa}$ if $\sigma=\sigma^{\prime}$ . Then a direct computation gives

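$$\mathbb{E}\left[H_{N}(\sigma)H_{N}(\sigma^{\prime})\right]=\frac{N}{2}\left(\sum_{\kappa=1}^{M}\left(\alpha^{(\kappa)}\right)^{2}\sum_{j=1}^{\tau_{\kappa}}\left(a_{j}^{(\kappa)}\right)^{2}+2\alpha^{(1)}\dots\alpha^{(M)}\sum_{(\kappa_{1},\kappa_{2})}\sum_{j_{1},j_{2}=1}^{\min(\tau_{\kappa_{1}},\tau_{\kappa_{2}})}\left(c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}\right)^{2}\right).$$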

It is convenient to let the overlaps take values in $[0,1]$: we introduce $M$ sequences of numbers in $[0,1]$

$$0=q_{0}^{(\kappa)}<q_{1}^{(\kappa)}<\dots<q_{K_{\kappa}}^{(\kappa)}=1,\qquad\kappa\in\{1,\dots,M\}$$

and put $q_{\kappa}=q_{\kappa}(\sigma,\sigma^{\prime}):=q^{(\kappa)}_{\tau_{\kappa}}$. We also define the total overlap to be $q_{tot}:=\sum_{\kappa=1}^{M}\alpha_{\kappa}q_{\kappa}$. As customary, for $\beta>0$ the free energy (up to the factor $-\frac{1}{\beta}$) is given by

$$A_{N}(\beta):=\frac{1}{N}\log\sum_{\sigma}e^{-\beta H_{N}(\sigma)},\qquad A(\beta):=\lim_{N}A_{N}(\beta).$$

Of course, as a consequence of Talagrand's inequality, $A_{N}(\beta)$ is self-averaging as $N\to\infty$, so we can always take the expectation w.r.t. the disorder when needed. Here and in what follows we denote by $P_{N,\beta}$ the Gibbs distribution associated to the model and by $\left\langle\cdot\right\rangle_{N,\beta}$ the quenched average of observables (we drop the subscript $N$ in the thermodynamic limit), and we set $x_{\kappa}(q):=P_{\beta}\left(q_{\kappa}\leq q\right)$, $x_{tot}(q):=P_{\beta}\left(q_{tot}\leq q\right)$.
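The definitions of $\tau_{\kappa}$, $q_{\kappa}$ and $q_{tot}$ above can be illustrated with a minimal Python sketch. The representation of a configuration as a list of per-part label tuples, the helper names and the numerical values are assumptions made only for illustration; the sketch also returns $\tau_{\kappa}=K_{\kappa}$ whenever the two labels of part $\kappa$ agree on every level.

```python
from typing import Sequence

def tau(mu: Sequence[int], mu_prime: Sequence[int]) -> int:
    """Number of leading hierarchical levels on which two labels agree.

    This is inf{j >= 0 : mu_(j+1) != mu'_(j+1)} with 1-based levels, and
    K (the number of levels) when the labels agree on every level.
    """
    for j, (a, b) in enumerate(zip(mu, mu_prime)):
        if a != b:
            return j
    return len(mu)

def partial_overlaps(sigma, sigma_prime, q_levels):
    """q_kappa = q^{(kappa)}_{tau_kappa} for every part kappa."""
    return [q[tau(m, mp)] for m, mp, q in zip(sigma, sigma_prime, q_levels)]

def total_overlap(overlaps, alphas):
    """q_tot = sum over kappa of alpha_kappa * q_kappa."""
    return sum(a * q for a, q in zip(alphas, overlaps))

# Hypothetical example: M = 2 parts with K_1 = 2 and K_2 = 3 levels.
sigma = [(1, 4), (2, 2, 7)]
sigma_prime = [(1, 5), (2, 2, 7)]
q_levels = [[0.0, 0.5, 1.0], [0.0, 0.25, 0.5, 1.0]]  # q_0 < ... < q_K per part
alphas = [0.5, 0.5]

q = partial_overlaps(sigma, sigma_prime, q_levels)
q_tot = total_overlap(q, alphas)
print(q, q_tot)
```

In this example the first part agrees only on its first level while the second part agrees on all levels, so the two partial overlaps differ.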

Our main result is

Theorem.

Let $\upsilon$ be a random variable uniformly distributed in $[0,1]$ . Then

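$$(q_{1},\dots,q_{M})\overset{d}{=}\left(x_{1}^{-1}(\upsilon),\dots,x_{M}^{-1}(\upsilon)\right).\tag{1.4}$$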

This result can be given also in terms of the total overlap (as in [1]):

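$$(q_{1},\dots,q_{M})\overset{d}{=}\left(x_{1}^{-1}\circ x_{tot}(q_{tot}),\dots,x_{M}^{-1}\circ x_{tot}(q_{tot})\right).\tag{1.5}$$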
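To make the statement of the Theorem concrete, here is a hedged Python sketch that draws all the overlaps from one uniform variable through the generalised inverses $x_{\kappa}^{-1}(u)=\inf\{q:x_{\kappa}(q)\geq u\}$. The marginal laws below are hypothetical step functions chosen for illustration, not values derived from the model, and the function names are not from the paper.

```python
import bisect
import random

def quantile(atoms, cdf, u):
    """Generalised inverse x^{-1}(u) = inf{q : x(q) >= u} of a step CDF.

    atoms: increasing overlap values; cdf[i] = P(overlap <= atoms[i]).
    """
    i = bisect.bisect_left(cdf, u)  # first index with cdf[i] >= u
    return atoms[min(i, len(atoms) - 1)]

def sample_synchronised(marginals, rng):
    """Draw (q_1, ..., q_M) from a single uniform variable."""
    u = rng.random()
    return tuple(quantile(atoms, cdf, u) for atoms, cdf in marginals)

# Hypothetical marginals for M = 2 (illustrative numbers only).
marginals = [
    ([0.0, 0.5, 1.0], [0.3, 0.6, 1.0]),  # x_1
    ([0.0, 1.0], [0.5, 1.0]),            # x_2
]

rng = random.Random(0)
samples = [sample_synchronised(marginals, rng) for _ in range(10_000)]

def tail(event):
    """Empirical probability of an event over the samples."""
    return sum(1 for s in samples if event(s)) / len(samples)

p1 = tail(lambda s: s[0] >= 0.5)
p2 = tail(lambda s: s[1] >= 1.0)
p12 = tail(lambda s: s[0] >= 0.5 and s[1] >= 1.0)
print(p1, p2, p12)
```

Because both overlaps are monotone functions of the same uniform variable, the tail events are nested, so the empirical joint tail coincides with the smaller of the two marginal tails.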

A larger class of non-hierarchical random energy models including the one under consideration was studied by Bolthausen and Kistler in [7, 8]. We shall make use of some crucial ideas from those two papers, in which the so-called Parisi picture is proved. A more precise formulation of the results in [7, 8] will be given below.

2. More on the Model

Before giving the proof, it is convenient to discuss the model a little further. What follows is largely heuristic; rigorous proofs can be found in [7, 8].

For $M=1$ and $K_{1}=1$ we simply recover the usual REM. We briefly summarise some basic features of this well-known model. The model has a phase transition at $\beta_{c}:=2\sqrt{\log 2}$, so that $x(q)=1$ for $\beta\leq\beta_{c}$ and $x(q)=\beta_{c}/\beta$ otherwise. The free energy reads

$$A_{REM}(\beta):=1_{\{\beta\leq\beta_{c}\}}\log 2\left(1+\frac{\beta^{2}}{\beta_{c}^{2}}\right)+1_{\{\beta>\beta_{c}\}}2\log 2\,\frac{\beta}{\beta_{c}}.$$
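As a quick numerical companion to the REM formulas above, the following Python sketch evaluates the free energy and $x(q)$ on both sides of $\beta_{c}$. The function names are assumptions introduced for illustration.

```python
import math

BETA_C = 2.0 * math.sqrt(math.log(2.0))  # critical inverse temperature beta_c

def a_rem(beta, beta_c=BETA_C):
    """REM free energy: annealed branch up to beta_c, frozen branch above."""
    if beta <= beta_c:
        return math.log(2.0) * (1.0 + beta**2 / beta_c**2)
    return 2.0 * math.log(2.0) * beta / beta_c

def x_rem(beta, beta_c=BETA_C):
    """x(q) for q in [0, 1): probability that the overlap is below 1."""
    return 1.0 if beta <= beta_c else beta_c / beta

for beta in (0.5 * BETA_C, BETA_C, 2.0 * BETA_C):
    print(beta, a_rem(beta), x_rem(beta))
```

Both branches take the value $2\log 2$ at $\beta=\beta_{c}$, so the free energy is continuous across the transition.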

Next consider for simplicity the bipartite model ( $M=2$ ) with $K_{1}=K_{2}=1$ , defined by the Hamiltonian (we set $a_{1}^{(1)}=a$ , $a_{1}^{(2)}=b$ and $\alpha^{(1)}=\alpha$ )

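$$H_{N}(\sigma):=-\sqrt{\frac{N}{2}}\left[\alpha aJ^{(1,1)}_{\mu_{(1)}}+(1-\alpha)bJ^{(2,1)}_{\mu_{(2)}}+\sqrt{2\alpha(1-\alpha)}\,cJ^{(1,1)(2,1)}_{\mu_{(1)}\mu_{(2)}}\right]$$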

(this was also analysed in [9] from a slightly different perspective). If we assume for definiteness $\alpha a^{2}>(1-\alpha)b^{2}$, there are two possibilities: either $\alpha a^{2}\leq(1-\alpha)b^{2}+2\alpha c^{2}$ or $\alpha a^{2}>(1-\alpha)b^{2}+2\alpha c^{2}$. The first case is less interesting and we focus on the second one. At very high temperature everything is ergodic and the free energy coincides with the annealed one. For $\beta>\beta_{1}:=2\sqrt{\log 2}/(a\sqrt{\alpha})$, the $\mu_{(1)}$-subset freezes, *i.e.* its relative entropy goes to zero (as in the first transition in a GREM [10]), and one can show that

$$P^{(1)}_{N,\beta}(\mu_{(1)};\beta):=Z^{-1}\sum_{\mu_{(2)}}e^{-\beta H_{N}(\sigma)}\rightharpoonup PD(0,\beta_{1}/\beta),$$

where $Z$ is a normalisation factor and $PD(0,x)$ denotes the law of a normalised Poisson point process with intensity $\rho(t)=xt^{-x-1}$, or Poisson-Dirichlet distribution. In this regime, for any $q,p>0$, $x_{1}(q)=\beta_{1}/\beta$, while $P_{\beta}(q_{2}\geq p)=P_{\beta}(q_{1}\geq q,q_{2}\geq p)=0$. The free energy is a convex combination (with weight $\alpha$) of two REMs, one on the $\mu_{(1)}$-subset at low temperature and the other on the rest of the system at high temperature. As $\beta$ increases further, the total entropy vanishes for $\beta>\beta_{2}:=2\sqrt{\log 2}/\sqrt{(1-\alpha)b^{2}+2\alpha c^{2}}$ and the whole Gibbs measure converges toward a Poisson-Dirichlet process

$$P_{N,\beta}(\sigma;\beta)\rightharpoonup PD(0,\beta_{2}/\beta),$$

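The free energy is the convex combination of two REMs at low temperature. Here $x_{1}(q)$ is unchanged, but $1-x_{2}(p)=P_{\beta}(q_{1}\geq q,q_{2}\geq p)=1-\beta_{2}/\beta$. Note that

$$P_{\beta}(q_{1}\geq q,q_{2}\geq p)=\min\left(P_{\beta}(q_{1}\geq q),P_{\beta}(q_{2}\geq p)\right)$$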

for any $\beta$ . We remark that, since the overlaps take value in $\{0,1\}$ in this simple case, $P_{\beta}(q_{1}\geq q)=P_{\beta}(q_{1}=1)$ and $P_{\beta}(q_{2}\geq p)=P_{\beta}(q_{2}=1)$ (as $q,p>0$ ). Therefore the first system starts freezing at higher temperature: if the second system is frozen, then also the first one is so (as in a two-level GREM [10]). The whole picture is summarised as follows

[TABLE]
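The two transitions of the bipartite example can be checked numerically with a short Python sketch based on the expressions for $\beta_{1}$ and $\beta_{2}$ given above. The parameter values are hypothetical, chosen so that $\alpha a^{2}>(1-\alpha)b^{2}+2\alpha c^{2}$ (hence $\beta_{1}<\beta_{2}$), and the function names are introduced only for illustration.

```python
import math

def bipartite_thresholds(alpha, a, b, c):
    """beta_1 and beta_2 for the bipartite model with K_1 = K_2 = 1."""
    root = math.sqrt(math.log(2.0))
    beta_1 = 2.0 * root / (a * math.sqrt(alpha))
    beta_2 = 2.0 * root / math.sqrt((1.0 - alpha) * b**2 + 2.0 * alpha * c**2)
    return beta_1, beta_2

def overlap_tails(beta, beta_1, beta_2):
    """P(q_1 = 1), P(q_2 = 1) and their joint value, assuming beta_1 < beta_2."""
    p1 = max(0.0, 1.0 - beta_1 / beta)  # first part frozen above beta_1
    p2 = max(0.0, 1.0 - beta_2 / beta)  # whole system frozen above beta_2
    return p1, p2, min(p1, p2)          # joint law: the smaller tail

# Hypothetical parameters: alpha * a^2 = 2 > 0.75 = (1 - alpha) * b^2 + 2 * alpha * c^2.
beta_1, beta_2 = bipartite_thresholds(alpha=0.5, a=2.0, b=1.0, c=0.5)

for beta in (1.0, 1.5, 3.0):
    print(beta, overlap_tails(beta, beta_1, beta_2))
```

Below $\beta_{1}$ all three probabilities vanish, between $\beta_{1}$ and $\beta_{2}$ only the first one is positive, and above $\beta_{2}$ the joint probability equals that of the second overlap.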

Therefore, albeit not inbuilt in the model, a GREM-like hierarchical structure naturally emerges. A way to visualise that in the general model defined by the Hamiltonian (1.1) is as follows. Recall that the $\ell_{\kappa,j}$ , $\kappa\in\{1,\ldots,M\}$ , $j\in\{0,\dots,K_{\kappa}\}$ , denote the configurations up to the $j$ -th level of the $\kappa$ -th GREM. Then the phase space is naturally coarse-grained by the class of sets $\{\ell_{\kappa_{1},j_{1}},\ell_{\kappa_{2},j_{2}}\}^{j_{1}=1,\dots,K_{\kappa_{1}}}_{j_{2}=1,\dots,K_{\kappa_{2}}}$ . We think of each level now as an atom and we can consider the power set

$$\wp:=\wp\{\ell_{1,1},\dots,\ell_{1,K_{1}},\dots,\ell_{M,1},\dots,\ell_{M,K_{M}}\}.$$

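According to [7, 8], a chain $\Gamma$ is defined to be an increasing (finite) sequence of $K\leq\sum_{\kappa}K_{\kappa}$ sets in $\wp$, $\Gamma=\{\Gamma_{n}\}_{n=0,\dots,K}$, so that

$$\Gamma_{n}\in\wp,\quad\Gamma_{n}\subset\Gamma_{n+1},\quad\Gamma_{0}=\emptyset,\quad\Gamma_{K}=\{\ell_{1,1},\dots,\ell_{1,K_{1}},\dots,\ell_{M,1},\dots,\ell_{M,K_{M}}\}.$$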

To each $\Gamma$ we associate two sequences $\{\alpha_{n}\}_{n=1,\dots,K}$ and $\gamma:=\{\gamma_{n}\}_{n=1,\dots,K}$ . The $\alpha_{n}$ represent the relative sizes of the $\Gamma_{n}$

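$$\alpha_{n}:=\frac{1}{N}\log_{2}\Big|\bigcup_{\kappa,j\,:\,\ell_{\kappa,j}\in\Gamma_{n}\setminus\Gamma_{n-1}}\ell_{\kappa,j}\Big|,$$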

easily computed from the numbers $\alpha^{(\kappa)}$ and $\varsigma_{\kappa,j}$ ; the $\gamma_{n}$ are variances defined by

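$$\gamma_{n}^{2}:=\sum_{\kappa=1}^{M}\left(\alpha^{(\kappa)}\right)^{2}\sum_{j\,:\,\ell_{\kappa,j}\in\Gamma_{n}\setminus\Gamma_{n-1}}\left(a_{j}^{(\kappa)}\right)^{2}+\sum_{\substack{(\kappa_{1},\kappa_{2}),(j_{1},j_{2})\,:\\ \{\ell_{\kappa_{1},j_{1}},\ell_{\kappa_{2},j_{2}}\}\in\Gamma_{n}\setminus\Gamma_{n-1}}}2\alpha_{\kappa_{1}}\alpha_{\kappa_{2}}\left(c^{(\kappa_{1},\kappa_{2})}_{j_{1},j_{2}}\right)^{2}.$$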

From $\alpha_{n}$ and $\gamma_{n}$ we can build another sequence of critical inverse temperatures $\{\beta_{n}\}_{n=1,\dots,K}$, $\beta_{n}:=2\sqrt{\alpha_{n}\log 2}\,\gamma_{n}^{-1}$. In general $\{\beta_{n}\}_{n=1,\dots,K}$ is not monotone, but we can conveniently confine our attention to those chains for which $\beta_{1}\leq\beta_{2}\leq\ldots\leq\beta_{K}$. We denote by ${\mathcal{T}}$ the set of such chains.

To fix the ideas, let us consider again a bipartite REM with $K_{1},K_{2}$ levels. The Hamiltonian reads as

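$$H_{N}(\sigma)=-\sqrt{\frac{N}{2}}\left[\alpha\sum_{j=1}^{K_{1}}a^{(1)}_{j}J^{(1,j)}_{\ell_{1,j}}+(1-\alpha)\sum_{j=1}^{K_{2}}a^{(2)}_{j}J^{(2,j)}_{\ell_{2,j}}+\sqrt{2\alpha(1-\alpha)}\sum_{j_{1}=1}^{K_{1}}\sum_{j_{2}=1}^{K_{2}}c_{j_{1},j_{2}}J^{(1,j_{1})(2,j_{2})}_{\ell_{1,j_{1}}\ell_{2,j_{2}}}\right]\tag{2.3}$$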

For a given $\Gamma\in\mathcal{T}$ of length $K$ , we set for $n=1,\ldots,K$

[TABLE]

so that we can decompose the Hamiltonian (2.3) according to

$$H_{N}(\sigma)=\sum_{n=1}^{K}H_{n},\tag{2.4}$$

and the partition function can be written as

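$$Z_{N}(\beta)=\sum_{\Gamma_{1}}e^{-\beta H_{1}}\sum_{\Gamma_{2}/\Gamma_{1}}e^{-\beta H_{2}}\dots\sum_{\Gamma_{K}/\Gamma_{K-1}}e^{-\beta H_{K}}.$$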

Now we see the following scenario. At $\beta$ small enough the annealed approximation holds and the overlaps are set to zero. Then $\beta$ increases, $\beta>\beta_{1}$ , and the configurations in $\Gamma_{1}$ freeze. In fact $H_{2}$ depends on configurations in $\Gamma_{2}/\Gamma_{1}$ , i.e. $H_{1}$ and all the other addenda in the r.h.s. of (2.4) become independent as $N\to\infty$ . Thus the partition function asymptotically factorises

$$Z_{N}(\beta)\simeq\sum_{\Gamma_{1}}e^{-\beta H_{1}}\sum_{\Sigma/\Gamma_{1}}e^{-\beta(H-H_{1})}$$

as two independent REMs: the first one, on the space of configurations $\Gamma_{1}$, is at low temperature; the second one, on the remaining configuration space, is at high temperature (with the right variance $\sqrt{\sum_{n\geq 2}\gamma^{2}_{n}}$). The free energy is a convex combination w.r.t. $\alpha_{1}$ (*i.e.* the relative size of $\Gamma_{1}$) of these two REMs. As in the previous example, we have convergence of the marginalised Gibbs measure to a Poisson-Dirichlet distribution

$$P^{(1)}_{N,\beta}(\Gamma_{1};\beta):=Z_{1}^{-1}\sum_{\Sigma/\Gamma_{1}}e^{-\beta H(\sigma)}\rightharpoonup PD(0,\beta_{1}/\beta),$$

with $Z_{1}$ a suitable normalisation. Since $H_{1}$ and $H_{2}$ remain independent for all $\beta>\beta_{1}$ we can iterate this procedure: for instance, for $\beta>\beta_{2}$ also $\Gamma_{2}$ freezes and $H_{2}$ becomes asymptotically independent of $H_{3}$; thus the partition function factorises as

$$Z_{N}(\beta)\simeq\sum_{\Gamma_{1}}e^{-\beta H_{1}}\sum_{\Gamma_{2}/\Gamma_{1}}e^{-\beta H_{2}}\sum_{\Sigma/\Gamma_{2}}e^{-\beta(H-H_{1}-H_{2})}.$$

These are three independent REMs on configurations $\Gamma_{1}$ , $\Gamma_{2}/\Gamma_{1}$ and $\Sigma/\Gamma_{2}$ , the associated free energy is given by a convex combination of the low-temperature free energies of the first two REMs and the high-temperature free energy of the third one and

$$P^{(2)}_{N,\beta}(\Gamma_{2};\beta):=Z_{2}^{-1}\sum_{\Sigma/\Gamma_{2}}e^{-\beta H(\sigma)}\rightharpoonup PD(0,\beta_{2}/\beta),$$

where $Z_{2}$ is a normalisation factor. Proceeding in this way we recover the free energy and the Gibbs measure as a GREM-like structure along the chain. At zero temperature the free energy of the model is just the convex combination of those of REMs at low temperature, each defined on an element of the chain. This construction can be made for every chain in $\mathcal{T}$. Of course for fixed $\beta$, the more REMs are at low temperature, the higher is the free energy. According to this criterion one can select the chain along which the free energy is maximal. By the above construction it should be clear that such a chain, here denoted by $\Gamma^{*}$, is unique.

The results of [7, 8] (for the case of our interest) can be precisely formulated as follows. Here $\Gamma\in\mathcal{T}$ and $A_{GREM}(\Gamma;\beta)$ denotes the GREM pressure computed along $\Gamma$ :

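$$A_{GREM}(\Gamma;\beta):=\sum_{n=1}^{K}\left[1_{\{\beta\leq\beta_{n}\}}\alpha_{n}\log 2\left(1+\frac{\beta^{2}}{\beta_{n}^{2}}\right)+1_{\{\beta>\beta_{n}\}}2\alpha_{n}\log 2\,\frac{\beta}{\beta_{n}}\right].$$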
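The GREM pressure along a chain is a sum of REM-like terms, one per element of the chain, each switching from its annealed to its frozen branch at its own $\beta_{n}$. A minimal Python sketch follows; the function name and the two-element chain are hypothetical, and the sequences $\{\alpha_{n}\}$ and $\{\beta_{n}\}$ are taken as given inputs rather than computed from the model.

```python
import math

def a_grem(beta, alphas, betas):
    """GREM pressure along a chain with relative sizes alphas and thresholds betas."""
    total = 0.0
    for alpha_n, beta_n in zip(alphas, betas):
        if beta <= beta_n:
            # High-temperature (annealed) contribution of the n-th element.
            total += alpha_n * math.log(2.0) * (1.0 + beta**2 / beta_n**2)
        else:
            # Low-temperature (frozen) contribution of the n-th element.
            total += 2.0 * alpha_n * math.log(2.0) * beta / beta_n
    return total

# Hypothetical chain of length K = 2 with beta_1 <= beta_2.
alphas = [0.5, 0.5]
betas = [1.2, 1.9]

for beta in (0.5, 1.5, 2.5):
    print(beta, a_grem(beta, alphas, betas))
```

With a single element of relative size one and threshold $2\sqrt{\log 2}$ the function reduces to the REM free energy.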

Theorem (Bolthausen and Kistler).

The following holds:

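i) $\lim_{N}A_{N}(\beta)=\lim_{N}\mathbb{E}[A_{N}(\beta)]=A(\beta)=\min_{\Gamma\in\mathcal{T}}\left(A_{GREM}(\Gamma;\beta)\right)$.

ii) Ultrametricity: there is a $\beta^{*}$ such that for each triad of configurations $(\sigma,\sigma^{\prime},\sigma^{\prime\prime})\in\Sigma^{3}$

$$\lim_{N}P_{N,\beta}\left(\mathrm{dist}(\sigma,\sigma^{\prime})\leq\max\{\mathrm{dist}(\sigma,\sigma^{\prime\prime}),\mathrm{dist}(\sigma^{\prime},\sigma^{\prime\prime})\}\right)=1,\qquad\forall\beta>\beta^{*}.$$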

For a thorough discussion of point $ii)$ we refer to the original work, but it is worth mentioning that it comes from Theorem 3 in [8], which we roughly summarise: for $\beta>\beta_{n}$, $P_{N,\beta}^{(n)}\rightharpoonup PD(0,\beta_{n}/\beta)$ and the overlap converges in distribution to the Bolthausen-Sznitman coalescent along the optimal chain $\Gamma^{*}$ (the $\beta_{n}$'s being associated to $\Gamma^{*}$); moreover the overlap and the Gibbs measure are asymptotically independent.
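The Poisson-Dirichlet weights $PD(0,x)$ used throughout this section can be simulated approximately. The sketch below is a truncated Monte Carlo illustration, not part of the argument: it generates the points of a Poisson process with intensity $xt^{-x-1}$ as $t_{i}=G_{i}^{-1/x}$, where $G_{i}$ are the arrival times of a rate-one Poisson process, keeps only the largest ones and normalises. The function names, the truncation level and the sample sizes are assumptions. For $0<x<1$ the expected sum of squared weights is $1-x$, which matches the probability $1-\beta_{c}/\beta$ that two replicas of a frozen REM coincide.

```python
import random

def pd_weights(x, n_points, rng):
    """Normalised points of a Poisson process with intensity x * t**(-x - 1).

    Points come out in decreasing order as t_i = G_i ** (-1 / x), with G_i
    the arrival times of a rate-one Poisson process. Keeping only n_points
    of them is a truncation, reasonable for 0 < x < 1.
    """
    arrival, points = 0.0, []
    for _ in range(n_points):
        arrival += rng.expovariate(1.0)
        points.append(arrival ** (-1.0 / x))
    total = sum(points)
    return [t / total for t in points]

def coincidence_probability(x, n_points=400, n_samples=400, seed=0):
    """Monte Carlo estimate of E[sum_i w_i^2] under the truncated weights."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        acc += sum(w * w for w in pd_weights(x, n_points, rng))
    return acc / n_samples

estimate = coincidence_probability(0.5)
print(estimate)  # should be close to 1 - x = 0.5, up to Monte Carlo error
```

The estimate fluctuates with the seed and the sample size, so it should be read as a rough consistency check only.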

3. Proof

Now we are ready to prove our statement. For simplicity we keep working mostly in the bipartite case. We fix the optimal chain $\Gamma^{*}$ once and for all. The sequences $\{\alpha_{n}\}$, $\{\gamma_{n}\}$ and $\{\beta_{n}\}$ will always refer to $\Gamma^{*}$.

A direct computation from (1.1) and (1.3) yields

[TABLE]

On the other hand the limiting free energy is a convex combination of REM ones along $\Gamma^{*}$ , thus its derivatives can be explicitly computed. We set

[TABLE]

Then

[TABLE]

As $N\to\infty$ the two expressions for the derivatives must be equal (the exchange of the limit and the derivative can be justified for instance using Theorem 3 in [8] and the concavity of the free energy). Hence if $n_{1}(j_{1})=n_{2}(j_{2})=\bar{n}$

[TABLE]

Otherwise we have

[TABLE]

and

[TABLE]

As $1-\beta_{n}/\beta$ is decreasing in $n$ , formulas (3.4) and (3.5), (3.6), (3.7) establish directly

[TABLE]

Let now $\upsilon\sim U(0,1)$ . We have

[TABLE]

from which (1.4) is readily deduced for $M=2$ .

The generalisation to the multipartite case is immediate. We have for every $(\kappa_{1},\kappa_{2})\in\{1,\ldots,M\}^{2}$

[TABLE]

whence, proceeding as above, $(q_{\kappa_{1}},q_{\kappa_{2}})\overset{d}{=}(x_{\kappa_{1}}^{-1}(\upsilon),x_{\kappa_{2}}^{-1}(\upsilon))$, $\upsilon\sim U(0,1)$. Therefore we have mutual synchronisation for every couple $(\kappa_{1},\kappa_{2})$, which is enough to obtain (1.4) for any $M$. Moreover a simple computation from (3.9) also gives

[TABLE]

for any $\kappa=1,\ldots,M$ , thus for $\upsilon\sim U(0,1)$

[TABLE]

where we note $x^{-1}_{tot}(z)=\sum_{\kappa=1}^{M}\alpha_{\kappa}x_{\kappa}^{-1}(z)$ . Then (1.5) easily follows.

Acknowledgements. This research was supported through the programme Research in Pairs by the Mathematisches Forschungsinstitut Oberwolfach in November 2016. G.G. is supported by the NCCR SwissMAP, D.T. is supported by GNFM-Indam. We thank E. Bolthausen for some valuable discussions and S. Franz for a useful correspondence on the paper [6].

Bibliography

[1] D. Panchenko, The Free Energy in a Multi-Species Sherrington-Kirkpatrick Model, Ann. Prob. 43, 3494-3513 (2015).
[2] A. Barra, P. Contucci, E. Mingione, D. Tantari, Multi-Species Mean Field Spin Glasses. Rigorous Results, Ann. H. Poincaré 16, 691-708 (2015).
[3] M. Mezard, Mean-field message-passing equations in the Hopfield model and its generalizations, Phys. Rev. E 95, 022117 (2017).
[4] A. Barra, G. Genovese, P. Sollich, D. Tantari, Phase transitions in Restricted Boltzmann Machines with generic priors, preprint arXiv:1612.03132 (2016).
[5] F. Guerra, An Introduction to Mean Field Spin Glass Theory: Methods and Results, A. Bovier et al. eds, Les Houches, Session LXXXIII (2005).
[6] S. Franz, G. Parisi, M. A. Virasoro, Ultrametricity in an Inhomogeneous Simplest Spin Glass Model, Europhys. Lett. 17, 5-9 (1992).
[7] E. Bolthausen, N. Kistler, On a nonhierarchical version of the Generalized Random Energy Model, Ann. Appl. Prob. 16, 1-14 (2006).
[8] E. Bolthausen, N. Kistler, On a nonhierarchical version of the Generalized Random Energy Model. II. Ultrametricity, Stoch. Proc. Appl. 119, 2357-2386 (2009).
