Average Gromov hyperbolicity and the Parisi ansatz

Sourav Chatterjee; Leila Sloman

arXiv:1907.03203·math.PR·September 29, 2020

TL;DR

This paper introduces an average-case version of Gromov hyperbolicity to determine when a space resembles a tree, and applies this to construct hierarchically organized states in spin glass models following the Parisi ultrametricity ansatz.

Contribution

It defines an average hyperbolicity measure, proves that small average hyperbolicity implies approximate tree embedding, and applies this to spin glass models.

Findings

01. Average hyperbolicity is bounded above by Gromov hyperbolicity.

02. Small average hyperbolicity implies approximate tree embedding.

03. Constructs hierarchically organized pure states in spin glasses.

Abstract

Gromov hyperbolicity of a metric space measures the distance of the space from a perfect tree-like structure. The measure has a "worst-case" aspect to it, in the sense that it detects a region in the space which sees the maximum deviation from tree-like structure. In this article we introduce an "average-case" version of Gromov hyperbolicity, which detects whether "most of the space", with respect to a given probability measure, looks like a tree. The main result of the paper is that if this average hyperbolicity is small, then the space can be approximately embedded in a tree. The proof uses a weighted version of Szemerédi's regularity lemma from graph theory. The result applies to Gromov hyperbolic spaces as well, since average hyperbolicity is bounded above by Gromov hyperbolicity. As an application, we give a construction of hierarchically organized pure states in any model of a…

Equations

$$(x,y)_{z}:=\frac{1}{2}\bigl(d(x,z)+d(y,z)-d(x,y)\bigr).$$

$$(x,y)_{w}\geq\min\{(x,z)_{w},(y,z)_{w}\}-\delta.$$

$$\text{Hyp}(S,\mathcal{F},\mathbb{P},s):=\mathbb{E}\bigl(\min\{s(X,Z),s(Y,Z)\}-s(X,Y)\bigr)_{+}\leq\delta,$$

$$\text{Tree}(S,\mathcal{F},\mathbb{P},s):=\inf\mathbb{E}\bigl|s(X,Y)-\alpha(X,Y)_{r}\bigr|\leq\delta,$$

$$\iint\bigl|(x,y)_{w}-\alpha(x,y)_{r}\bigr|\,d\mathbb{P}(x)\,d\mathbb{P}(y)\leq\epsilon(\delta,D),$$

$$R_{1,2}:=\frac{1}{n}\sum_{i=1}^{n}\sigma_{i}^{1}\sigma_{i}^{2}\in[-1,1].$$

$$\lim_{n\to\infty}\mathbb{E}\bigl\langle\mathbbm{1}_{\{R_{1,2}\geq\min\{R_{1,3},R_{2,3}\}-\epsilon\}}\bigr\rangle=1,$$

$$\lim_{n\to\infty}\mathbb{E}\bigl\langle\mathbbm{1}_{\{f(R_{1,2})\geq\min\{f(R_{1,3}),f(R_{2,3})\}-\epsilon\}}\bigr\rangle=1$$

$$\bigl\langle|f(R_{1,2})-q_{\alpha}|\bigr\rangle\leq\delta_{n},$$

$$d(U,V):=\frac{\mu^{\otimes 2}\bigl((x,y)\in E:x\in U,\,y\in V\bigr)}{\mu(U)\mu(V)}.$$

$$|d(A,B)-d(U,V)|\leq\epsilon.$$

$$\mu^{*}:=\max_{x\in S}\mu(x).$$

$$\mathbb{P}(A):=\frac{\mu(A)}{\mu(S)},\quad A\subset S.$$

$$P^{*}:=\max_{x\in S}\mathbb{P}(x)=\frac{\mu^{*}}{\mu(S)}.$$

$$H=\sum_{i=1}^{N}\lambda_{i}u_{i}u_{i}^{T},$$

$$|\lambda_{1}|\geq|\lambda_{2}|\geq\dots\geq|\lambda_{N}|.$$

$$z_{k}=\underbrace{F\circ F\circ\dots\circ F}_{k\text{ times}}(1).$$

$$\operatorname{tr}(H^{2})=\sum_{i=1}^{N}\lambda_{i}^{2}=2|E_{N}|\leq N^{2},$$

$$\sum_{z_{k}\leq i<z_{k+1}}\lambda_{i}^{2}\leq\frac{\epsilon^{5}N^{2}}{128}.$$

$$\sum_{J\leq i<F(J)}\lambda_{i}^{2}\leq\frac{\epsilon^{5}N^{2}}{128}.$$

$$H_{1}=\sum_{i<J}\lambda_{i}u_{i}u_{i}^{T},\quad H_{2}=\sum_{J\leq i<F(J)}\lambda_{i}u_{i}u_{i}^{T},\quad H_{3}=\sum_{i\geq F(J)}\lambda_{i}u_{i}u_{i}^{T}.$$

$$E_{N}(A,B)=\mathbbm{1}_{A}^{T}H_{1}\mathbbm{1}_{B}+\mathbbm{1}_{A}^{T}H_{2}\mathbbm{1}_{B}+\mathbbm{1}_{A}^{T}H_{3}\mathbbm{1}_{B}$$

$$W_{0}^{(i)}=\biggl\{y\in[N]:|u_{i}(y)|>\sqrt{\frac{2J}{\epsilon N}}\biggr\},$$

$$1\geq\sum_{y\in W_{0}^{(i)}}u_{i}(y)^{2}\geq\frac{2J}{\epsilon N}|W_{0}^{(i)}|,$$

$$W_{0}:=\bigcup_{i<J}W_{0}^{(i)},$$

$$W_{k}^{(i)}=\biggl\{y\in[N]\setminus W_{0}^{(i)}:u_{i}(y)\in\frac{\epsilon^{3/2}}{16\sqrt{2J^{3}N}}(k-1,k]\biggr\}.$$

$$W_{k_{1},\dots,k_{J-1}}=\bigcap_{i<J}W_{k_{i}}^{(i)}.$$

$$r\leq\biggl(\frac{64J^{2}}{\epsilon^{2}}+3\biggr)^{J}.$$

$$\begin{aligned}
\bigl|\mathbbm{1}_{w}^{T}H_{1}\mathbbm{1}_{x}-\mathbbm{1}_{z}^{T}H_{1}\mathbbm{1}_{y}\bigr|
&=\biggl|\sum_{i<J}\lambda_{i}\bigl(u_{i}(w)u_{i}(x)-u_{i}(z)u_{i}(y)\bigr)\biggr|\\
&\leq|\lambda_{1}|\sum_{i<J}\bigl(|(u_{i}(w)-u_{i}(z))u_{i}(x)|+|u_{i}(z)(u_{i}(x)-u_{i}(y))|\bigr)\\
&\leq 2N\sum_{i<J}\sqrt{\frac{2J}{\epsilon N}}\biggl(\frac{\epsilon^{3/2}}{16\sqrt{2J^{3}N}}\biggr)\leq\frac{\epsilon}{8}.
\end{aligned}$$

Full text

Average Gromov hyperbolicity and the Parisi ansatz

Sourav Chatterjee

Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, CA 94305

and

Leila Sloman

Department of Mathematics, Stanford University, 450 Jane Stanford Way, Building 380, Stanford, CA 94305

Abstract.

Gromov hyperbolicity of a metric space measures the distance of the space from a perfect tree-like structure. The measure has a “worst-case” aspect to it, in the sense that it detects a region in the space which sees the maximum deviation from tree-like structure. In this article we introduce an “average-case” version of Gromov hyperbolicity, which detects whether “most of the space”, with respect to a given probability measure, looks like a tree. The main result of the paper is that if this average hyperbolicity is small, then the space can be approximately embedded in a tree. The proof uses a weighted version of Szemerédi’s regularity lemma from graph theory. The result applies to Gromov hyperbolic spaces as well, since average hyperbolicity is bounded above by Gromov hyperbolicity. As an application, we give a construction of hierarchically organized pure states in any model of a spin glass that satisfies the Parisi ultrametricity ansatz.

Key words and phrases:

Hyperbolic metric space, Gromov hyperbolicity, ultrametricity, spin glass, negative curvature

2010 Mathematics Subject Classification:

51M10, 53C23, 60K35, 82B44

Sourav Chatterjee’s research was partially supported by NSF grant DMS-1855484.

Leila Sloman’s research was partially supported by NSF grant DGE-1656518.

1. Gromov hyperbolicity

Let $(S,d)$ be a metric space. The Gromov product of two points $x,y\in S$ with respect to a third point $z\in S$ is defined as

$$(x,y)_{z}:=\frac{1}{2}\bigl(d(x,z)+d(y,z)-d(x,y)\bigr).$$

Note that by the triangle inequality, the Gromov product is always nonnegative. The space is called $\delta$-hyperbolic (as defined by Gromov [16]) if for any four points $x,y,z,w\in S$,

$$(x,y)_{w}\geq\min\{(x,z)_{w},(y,z)_{w}\}-\delta.\tag{1.1}$$

The smallest $\delta$ for which this is satisfied is known as the Gromov hyperbolicity of $(S,d)$. The condition (1.1) is known as Gromov’s four point condition. It is not hard to show that if (1.1) is satisfied for all $x,y,z$ for a given $w_{0}$, then it is satisfied for all $w$ with $2\delta$ in place of $\delta$. Thus, we may equivalently define hyperbolicity using a three point condition, by fixing $w$. If (1.1) is satisfied for all $x,y,z$ for some fixed $w$, then we say that the space is $\delta$-hyperbolic with base point $w$.
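For a finite metric space, the three point condition can be checked by brute force. The following Python sketch is illustrative only: the function names `gromov_product` and `hyperbolicity_at`, and the two example spaces (a path and a 4-cycle on the vertices 0, 1, 2, 3 with their graph distances), are assumptions made for this example and are not taken from the paper.

```python
from itertools import product


def gromov_product(d, x, y, z):
    """Gromov product (x, y)_z = (d(x, z) + d(y, z) - d(x, y)) / 2."""
    return 0.5 * (d(x, z) + d(y, z) - d(x, y))


def hyperbolicity_at(points, d, w):
    """Smallest delta >= 0 for which (1.1) holds for all x, y, z at base point w."""
    delta = 0.0
    for x, y, z in product(points, repeat=3):
        gap = min(gromov_product(d, x, z, w), gromov_product(d, y, z, w))
        gap -= gromov_product(d, x, y, w)
        delta = max(delta, gap)
    return delta


def path_distance(a, b):
    """Graph distance on the path 0 - 1 - 2 - 3 (a tree)."""
    return abs(a - b)


def cycle_distance(a, b):
    """Graph distance on the 4-cycle 0 - 1 - 2 - 3 - 0 (not a tree)."""
    return min((a - b) % 4, (b - a) % 4)


path_delta = hyperbolicity_at(range(4), path_distance, 0)
cycle_delta = hyperbolicity_at(range(4), cycle_distance, 0)
print(path_delta, cycle_delta)
```

With base point 0 the path gives the value 0, as expected for a tree, while the 4-cycle gives 1; the triple $x=1$, $y=3$, $z=2$ is one place where the condition fails by that amount.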

The notion of hyperbolic metric spaces is closely related to the notion of real trees. If $(T,\rho)$ is a metric space and $x,y\in T$, an arc from $x$ to $y$ is the image of a topological embedding $\gamma:[a,b]\to T$ with $\gamma(a)=x$ and $\gamma(b)=y$, where $[a,b]$ is a closed interval in $\mathbb{R}$ (allowing the possibility that $a=b$). A geodesic segment from $x$ to $y$ is the image of an isometric embedding $\gamma:[a,b]\to T$ with $\gamma(a)=x$ and $\gamma(b)=y$. A metric space $(T,\rho)$ is called a real tree if for any $x,y\in T$, there exists a unique arc from $x$ to $y$, and this arc is a geodesic segment. A real tree with a distinguished point $r\in T$ is called a rooted real tree with root $r$.

The most elementary connection between Gromov hyperbolicity and real trees is that a metric space is $0$-hyperbolic if and only if it is isometric to a subset of a real tree. Now suppose that a metric space $(S,d)$ is $\delta$-hyperbolic for some small but nonzero $\delta$. Is it approximately isometric to a subset of a real tree, in some sense? The following result shows that this is true when $S$ has finite cardinality, with an error proportional to $\delta\log|S|$.

Theorem 1.1 (Ghys and de la Harpe [14]).

Let $(S,d)$ be a $\delta$ -hyperbolic metric space with base point $w$ and finite cardinality. Let $k$ be a positive integer such that $|S|\leq 2^{k}+2$ . Then there exists a real tree $(T,\rho)$ with root $r$ and a map $\Phi:S\to T$ such that for all $x\in S$ , $d(x,w)=\rho(\Phi(x),r)$ , and for all $x,y\in S$ , $d(x,y)-2k\delta\leq\rho(\Phi(x),\Phi(y))\leq d(x,y)$ .

It is known that the error of order $\delta\log|S|$ in the above theorem cannot be improved [8]. In particular, it is not possible to have a quasi-isometry where the discrepancy depends solely on $\delta$ .

The notion of Gromov hyperbolicity, introduced by Gromov in a group-theoretic context, has found great success in many areas of mathematics and even in science and engineering. There are many examples of metric spaces, both in theory and practice, that are almost tree-like but not exactly so. Gromov hyperbolicity is a great way to understand and study such examples.

Still, there is one aspect of Gromov hyperbolicity that is sometimes problematic when one ventures outside the domain of very regular objects coming from pure mathematics. It is the fact that the four point condition (1.1) is a worst-case condition: The space is not $\delta$ -hyperbolic if there is even a single four-tuple $(x,y,z,w)$ for which (1.1) fails. There are examples from statistical physics and probability theory where (1.1) holds for most, but not all four-tuples [21]. Here “most” is in terms of a probability measure on the space. Similar examples arise in the applied sciences, such as in the analysis of social networks [2] and phylogeny reconstruction [9].

For these reasons, one may naturally wonder whether the condition (1.1) may be replaced by some kind of an averaged version. This has, indeed, been proposed recently in some physics papers (such as [2]), but these proposals have not been mathematically analyzed. The goal of this manuscript is to fill this gap: We define a natural notion of average Gromov hyperbolicity, and prove an analog of Theorem 1.1 for this measure. Interestingly, unlike Theorem 1.1, this result has no dependence on the size of $S$ . The proof is more involved than the proof of Theorem 1.1, using a weighted version of Szemerédi’s regularity lemma from graph theory. We apply this theorem to show that hierarchically organized pure states can be constructed in any model of a spin glass that satisfies the Parisi ultrametricity ansatz.

2. Main result

We will go beyond metric spaces in our definition of average hyperbolicity. Let $S$ be a set equipped with a countably generated $\sigma$ -algebra $\mathcal{F}$ and a probability measure $\mathbb{P}$ defined on $\mathcal{F}$ . Let $b$ be a positive real number and $s:S\times S\to[0,b]$ be a measurable function satisfying $s(x,y)=s(y,x)$ for all $x,y\in S$ . We will say that $s$ is a “similarity function”. Intuitively, $s(x,y)$ measures the similarity between two points $x$ and $y$ . Similarity functions generalize the notion of Gromov product: If $S$ has finite diameter with respect to a separable metric and is endowed with the Borel $\sigma$ -algebra generated by this metric, the Gromov product $(x,y)_{w}$ is a similarity function for any base point $w\in S$ .

Definition 2.1.

We will say that $(S,\mathcal{F},\mathbb{P},s)$ is $\delta$-hyperbolic if

$$\text{Hyp}(S,\mathcal{F},\mathbb{P},s):=\mathbb{E}\bigl(\min\{s(X,Z),s(Y,Z)\}-s(X,Y)\bigr)_{+}\leq\delta,$$

where $x_{+}$ denotes the positive part of a real number $x$, and $X,Y,Z$ are i.i.d. $S$-valued random variables with law $\mathbb{P}$.
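On a finite set the expectation in Definition 2.1 is a finite sum over triples, so the average hyperbolicity can be computed exactly. The sketch below is only an illustration: the name `average_hyperbolicity` and the example (the 4-cycle on 0, 1, 2, 3 with the uniform measure, using the Gromov product at base point 0 as the similarity function) are assumptions chosen for this example.

```python
from itertools import product


def average_hyperbolicity(points, prob, s):
    """E (min{s(X,Z), s(Y,Z)} - s(X,Y))_+ for i.i.d. X, Y, Z with law prob."""
    total = 0.0
    for x, y, z in product(points, repeat=3):
        gap = min(s(x, z), s(y, z)) - s(x, y)
        if gap > 0:  # positive part
            total += prob[x] * prob[y] * prob[z] * gap
    return total


def cycle_distance(a, b):
    """Graph distance on the 4-cycle 0 - 1 - 2 - 3 - 0."""
    return min((a - b) % 4, (b - a) % 4)


def similarity(x, y):
    """Gromov product of x and y with respect to the base point 0."""
    return 0.5 * (cycle_distance(x, 0) + cycle_distance(y, 0) - cycle_distance(x, y))


points = range(4)
uniform = {x: 0.25 for x in points}
hyp = average_hyperbolicity(points, uniform, similarity)
print(hyp)
```

Only the two ordered triples $(1,3,2)$ and $(3,1,2)$ contribute, each with gap 1 and probability $1/64$, so the average value is $1/32$, well below the worst-case gap of 1 over the same triples.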

It is not hard to show that $(S,\mathcal{F},\mathbb{P},s)$ is $0$-hyperbolic in the above sense if and only if there is a real tree $(T,\rho)$ with root $r$ and set of leaves $S$, such that for all $x,y$ in the support of $\mathbb{P}$, we have $s(x,y)=(x,y)_{r}$, where $(x,y)_{r}$ is the Gromov product of $x$ and $y$ under the metric $\rho$, with respect to the base point $r$. We will now generalize this result when $(S,\mathcal{F},\mathbb{P},s)$ is $\delta$-hyperbolic for some small $\delta$. First, recall that a graph-theoretic tree, henceforth simply called a tree, is a connected undirected graph without self-loops or closed paths. A rooted tree is a tree where one distinguished node is called the root. A node of a rooted tree is called a leaf if it is not the root and it has degree one.

Definition 2.2.

We will say that a tree $T$ with root $r$ is compatible with $(S,\mathcal{F})$ if the following three conditions are satisfied:

(i) $S$ is the set of leaves of $T$,

(ii) $T\setminus S$ is a finite set, and

(iii) for any node $v\in T\setminus S$, the set of leaves that are the descendants of $v$ is a measurable subset of $S$.

Clearly, any tree that is compatible with $(S,\mathcal{F})$ gives a hierarchical clustering of $S$ , such that the number of clusters is finite and each cluster is measurable. Conversely, any such clustering defines a compatible tree. An example is shown in Figure 1.

If $T$ is a compatible tree with root $r$ , and $x,y\in S$ , we denote by $(x,y)_{r}$ the Gromov product of $x$ and $y$ under the graph distance on $T$ , with respect to the base point $r$ . From the definition of the Gromov product, it is easy to see that $(x,y)_{r}$ is the number of edges in the intersection of the paths leading from $x$ and $y$ to $r$ (see Figure 1).
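The description of $(x,y)_{r}$ as a count of shared edges can be made concrete for a small rooted tree. In this sketch the tree is stored as a parent map; that representation, the function names, and the six-node example tree are all assumptions made for illustration.

```python
def path_from_root(parent, v):
    """Nodes on the path from the root down to v (root first)."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def tree_gromov_product(parent, x, y):
    """(x, y)_r: number of edges shared by the paths from x and y to the root."""
    shared_nodes = 0
    for a, b in zip(path_from_root(parent, x), path_from_root(parent, y)):
        if a != b:
            break
        shared_nodes += 1
    return shared_nodes - 1  # k shared nodes span k - 1 shared edges


def tree_distance(parent, x, y):
    """Graph distance between x and y in the tree."""
    depth_x = len(path_from_root(parent, x)) - 1
    depth_y = len(path_from_root(parent, y)) - 1
    return depth_x + depth_y - 2 * tree_gromov_product(parent, x, y)


# Root r with children a and b; leaves u, v below a and leaf w below b.
parent = {"r": None, "a": "r", "b": "r", "u": "a", "v": "a", "w": "b"}
print(tree_gromov_product(parent, "u", "v"), tree_gromov_product(parent, "u", "w"))
```

Here the leaves $u$ and $v$ share the single edge between $a$ and $r$, while $u$ and $w$ share none, which agrees with the value $\frac{1}{2}(d(x,r)+d(y,r)-d(x,y))$ computed from the graph distance.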

Definition 2.3.

We will say that $(S,\mathcal{F},\mathbb{P},s)$ is $\delta$-tree-like if

$$\text{Tree}(S,\mathcal{F},\mathbb{P},s):=\inf\mathbb{E}\bigl|s(X,Y)-\alpha(X,Y)_{r}\bigr|\leq\delta,$$

where $X$ and $Y$ are independent $S$-valued random variables with law $\mathbb{P}$, and the infimum is taken over all $\alpha\geq 0$ and all rooted trees $T$ that are compatible with $(S,\mathcal{F})$. Here $r$ is the root of $T$ and $(X,Y)_{r}$ is the Gromov product of $X$ and $Y$ under the graph distance on $T$, with respect to the base point $r$.

Note that in the above definition, it follows easily by the definition of compatibility that $(X,Y)_{r}$ is a bounded and measurable random variable, and therefore the expectation is well-defined.

The following theorem is the main result of this paper. It shows that $\text{Hyp}(S,\mathcal{F},\mathbb{P},s)$ is small if and only if $\text{Tree}(S,\mathcal{F},\mathbb{P},s)$ is small.

Theorem 2.4.

Let $S$ , $\mathcal{F}$ , $\mathbb{P}$ , $s$ and $b$ be as above. Then given any $\epsilon>0$ , there is some $\delta>0$ depending only on $\epsilon$ and $b$ , such that if $\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)<\delta$ , then $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\epsilon$ . Conversely, given any $\epsilon>0$ there is some $\delta>0$ depending only on $\epsilon$ and $b$ , such that if $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\delta$ , then $\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)<\epsilon$ .

The above theorem is a generalization of Theorem 1.1 to the setting of average hyperbolicity. The statement is more satisfactory than that of Theorem 1.1 in that the error has no dependence on the size of $S$ . In particular, it remains meaningful even if $S$ has infinite cardinality. Moreover, since Gromov hyperbolicity is obviously greater than or equal to the average hyperbolicity with respect to any probability measure (where the similarity function is the Gromov product with respect to a base point), Theorem 2.4 immediately implies the following corollary about Gromov hyperbolic metric spaces.

Corollary 2.5.

Let $(S,d)$ be a separable metric space with finite diameter $D$, which is $\delta$-hyperbolic with respect to a base point $w$ in Gromov’s sense. Then for any probability measure $\mathbb{P}$ defined on the Borel $\sigma$-algebra of $S$, there is a rooted tree $T$ with root $r$ that is compatible with $S$ in the sense of Definition 2.2, and a number $\alpha\geq 0$, such that

$$\iint\bigl|(x,y)_{w}-\alpha(x,y)_{r}\bigr|\,d\mathbb{P}(x)\,d\mathbb{P}(y)\leq\epsilon(\delta,D),$$

where $\epsilon(\delta,D)$ is a number depending only on $\delta$ and $D$ which tends to $0$ as $\delta\to 0$. Here $(x,y)_{w}$ is the Gromov product of $x$ and $y$ under the metric $d$, with respect to the base point $w$, and $(x,y)_{r}$ is the Gromov product of $x$ and $y$ under the graph distance on $T$, with respect to the base point $r$.
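The comparison between Gromov hyperbolicity and average hyperbolicity that underlies the corollary can be written out in one step. The display below is a sketch of that step, assuming $s(x,y)=(x,y)_{w}$ and that $(S,d)$ is $\delta$-hyperbolic with base point $w$; it spells out a routine argument and is not quoted from the paper.

```latex
% Three point condition at base point w, applied to every triple (x, y, z):
\[
  \min\{s(x,z),\,s(y,z)\}-s(x,y)\;\le\;\delta
  \qquad\text{for all } x,y,z\in S .
\]
% Since \delta \ge 0, the positive part obeys the same bound pointwise,
% and taking expectations over i.i.d. X, Y, Z with law P preserves it:
\[
  \mathrm{Hyp}(S,\mathcal{F},\mathbb{P},s)
  \;=\;\mathbb{E}\bigl(\min\{s(X,Z),\,s(Y,Z)\}-s(X,Y)\bigr)_{+}
  \;\le\;\delta .
\]
```

The bound holds for every probability measure $\mathbb{P}$, which is why the corollary needs no assumption on $\mathbb{P}$ beyond being defined on the Borel $\sigma$-algebra.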

The dependence of $\delta$ on $\epsilon$ in Theorem 2.4 is an important question. The proof given in this paper uses Szemerédi’s regularity lemma [28], and therefore cannot be expected to yield useful bounds. It would be very interesting to figure out whether Szemerédi’s lemma can be bypassed in the proof of Theorem 2.4. If that is possible, then one can at least hope to get reasonable bounds on $\delta$ in terms of $\epsilon$ .

To see why something like the regularity lemma may be needed, recall the triangle removal lemma of Ruzsa and Szemerédi [25]: If a simple graph on $n$ vertices has $o(n^{3})$ triangles, then it is possible to delete $o(n^{2})$ edges and make it triangle-free. The original proof of this result used Szemerédi’s regularity lemma, and although we now have other approaches [11], there is still no simple proof of this seemingly simple-sounding claim. Theorem 2.4 is a result of a similar spirit, since it asserts that a space which is nearly tree-like in most places may be slightly modified to yield a space that is exactly embeddable in a tree.

3. Hyperbolicity and the Parisi ansatz

In this section we study a well-known class of systems that arise in statistical physics and probability theory that are hyperbolic in the average sense but not in Gromov’s sense.

A spin glass model assigns a random probability measure $\mu_{n}$ on a set $\Sigma_{n}$ , where $\Sigma_{n}$ is usually the hypercube $\{-1,1\}^{n}$ or the sphere of radius $\sqrt{n}$ centered at the origin in $\mathbb{R}^{n}$ . Throughout the rest of this section, we will assume that $\Sigma_{n}$ is either of these two. The specific definitions of these measures are not particularly relevant for this discussion, so we will not bother to introduce them here. The interested reader may consult [19, 33, 34, 22]. The measure $\mu_{n}$ is called the Gibbs measure, and the set $\Sigma_{n}$ is called the configuration space.

An important quantity in spin glass theory is the overlap between two configurations $\sigma^{1},\sigma^{2}\in\Sigma_{n}$ , defined as

$$R_{1,2}:=\frac{1}{n}\sum_{i=1}^{n}\sigma_{i}^{1}\sigma_{i}^{2}\in[-1,1].$$

The usual convention in the literature is to denote by $R_{i,j}$ the overlap between $\sigma^{i}$ and $\sigma^{j}$ , where $\sigma^{1},\sigma^{2},\ldots$ is an i.i.d. sequence of configurations drawn from the Gibbs measure $\mu_{n}$ . It was famously conjectured by Parisi [23, 24] that certain spin glass models have the property that in the “ $n=\infty$ limit”, $R_{1,2}$ is greater than or equal to the minimum of $R_{1,3}$ and $R_{2,3}$ with probability one. This is known as the Parisi ultrametricity ansatz. Following a long line of deep contributions by various authors [1, 13, 4, 30], the Parisi conjecture was finally proved by Panchenko [21] for spin glass models that satisfy a certain set of equations known as the generalized Ghirlanda–Guerra identities [13, 20, 29]. The precise statement of Panchenko’s theorem is that in such models, for any $\epsilon>0$ ,

$$\lim_{n\to\infty}\mathbb{E}\bigl\langle\mathbbm{1}_{\{R_{1,2}\geq\min\{R_{1,3},R_{2,3}\}-\epsilon\}}\bigr\rangle=1,\tag{3.1}$$

where $\langle\cdot\rangle$ denotes expectation with respect to the Gibbs measure $\mu_{n}$, $\mathbb{E}$ denotes expectation with respect to the randomness in $\mu_{n}$, and $\mathbbm{1}_{A}$ denotes the function that is $1$ on the set $A$ and $0$ elsewhere.

It was predicted in a seminal paper of Mézard, Parisi, Sourlas, Toulouse and Virasoro [18] that ultrametricity happens because the infinite volume limit of the Gibbs measure can be decomposed into “hierarchically organized pure states”. Roughly speaking, this means that the configuration space admits a hierarchical clustering, with a number $q_{\alpha}\in[-1,1]$ attached to each cluster $\alpha$ , so that if $\sigma^{1}$ and $\sigma^{2}$ are drawn independently from the Gibbs measure, then with high probability, $R_{1,2}\approx q_{\alpha}$ , where $\alpha$ is the smallest cluster containing both $\sigma^{1}$ and $\sigma^{2}$ (see Figure 2). Here “smallest” means “lowest down in the hierarchy”.

It is not difficult to prove that ultrametricity implies the hierarchical organization of pure states if $R_{1,2}$ can take only finitely many values in the infinite volume limit; this, in fact, is the basis of the heuristic sketched in [18]. However, if this condition does not hold — in which case the system is said to exhibit “full replica symmetry breaking” — then it is not obvious how to establish the hierarchical organization of pure states starting from the Parisi ansatz (3.1).

There are two kinds of systems where the pure state picture has been rigorously established. The first is a class of spin glass models known as pure $p$ -spin spherical models, where the pure state construction was given recently by Subag [26], building on the earlier contributions of [5, 6, 27, 7]. The second is the class of models that have been shown to satisfy the generalized Ghirlanda–Guerra identities. For these models, the construction of pure states was given by Panchenko [21] in the infinite volume limit, and recently by Jagannath [17] in the setting of large but finite $n$ . (See also the earlier works of Talagrand [31, 32].)

Incidentally, the generalized Ghirlanda–Guerra identities are believed to hold in all physically interesting models that satisfy the Parisi ansatz (3.1). Therefore, in principle, the results of [21, 17] should give the pure state construction in all such models, provided that the identities can be established. However, there are other important models, such as the Sherrington–Kirkpatrick (S-K) model, where it is known that the generalized Ghirlanda–Guerra identities do not hold [17, Remark 2.4]. In the S-K model, it is believed that the absolute value of the overlap, rather than the overlap itself, should satisfy the ultrametric property. To account for such cases, we formulate a generalized version of (3.1). We will say that a sequence of spin glass models satisfies the generalized Parisi ansatz if for some bounded measurable $f:[-1,1]\to\mathbb{R}$,

$$\lim_{n\to\infty}\mathbb{E}\bigl\langle\mathbbm{1}_{\{f(R_{1,2})\geq\min\{f(R_{1,3}),f(R_{2,3})\}-\epsilon\}}\bigr\rangle=1\tag{3.2}$$

for all $\epsilon>0$ . Theorem 2.4 allows us to prove that hierarchically organized pure states can be constructed for any system that satisfies this generalized ansatz. Since the only systems where ultrametricity has been rigorously established are systems where the pure state construction has also been proved, the result gives no immediate gain. But it is intellectually satisfying and potentially useful for the future. For example, if the generalized Parisi ansatz (3.2) can be proved for the S-K model with $f(x)=|x|$ , our theorem will instantly give the construction of pure states. The precise statement is as follows.

Theorem 3.1.

Consider any sequence of spin glass models that satisfy the generalized Parisi ultrametricity ansatz (3.2) for some bounded measurable function $f$. Then there are sequences $\epsilon_{n}$ and $\delta_{n}$ tending to zero, such that with probability at least $1-\epsilon_{n}$, the following happens. There is a hierarchical clustering of the configuration space $\Sigma_{n}$, such that the number of clusters is finite, each cluster is measurable, and for each cluster $\alpha$ there is a number $q_{\alpha}$ that is a function of its depth in the hierarchy, with the property that

$$\bigl\langle|f(R_{1,2})-q_{\alpha}|\bigr\rangle\leq\delta_{n},$$

where $\alpha=\alpha(\sigma^{1},\sigma^{2})$ is the smallest cluster containing two configurations $\sigma^{1}$ and $\sigma^{2}$ drawn independently from the Gibbs measure and $R_{1,2}$ is their overlap.

Just for clarity, we note that in Theorem 3.1 the sequences $\epsilon_{n}$ and $\delta_{n}$ are deterministic, but the hierarchical clustering is a function of the Gibbs measure (and hence random). We also note that even though the number of clusters is finite, the number may grow with $n$ . Theorem 3.1 is proved as a simple consequence of Theorem 2.4 in Section 10.
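The overlap and the ultrametricity conditions (3.1) and (3.2) can be illustrated on a handful of configurations. The sketch below is a toy calculation under stated assumptions: the three configurations are hand-picked rather than drawn from any Gibbs measure, and counting ordered triples of distinct samples is only an empirical stand-in for $\mathbb{E}\langle\cdot\rangle$. The function names and the parameter `f` are hypothetical.

```python
from itertools import permutations


def overlap(sigma1, sigma2):
    """R_{1,2} = (1/n) * sum_i sigma1_i * sigma2_i."""
    return sum(a * b for a, b in zip(sigma1, sigma2)) / len(sigma1)


def ultrametric_fraction(samples, eps, f=lambda r: r):
    """Fraction of ordered triples of distinct samples satisfying
    f(R_12) >= min(f(R_13), f(R_23)) - eps."""
    hits = total = 0
    for s1, s2, s3 in permutations(samples, 3):
        total += 1
        lhs = f(overlap(s1, s2))
        rhs = min(f(overlap(s1, s3)), f(overlap(s2, s3))) - eps
        hits += lhs >= rhs
    return hits / total


samples = [(1, 1, 1, 1), (1, 1, 1, -1), (-1, -1, -1, -1)]
plain = ultrametric_fraction(samples, eps=0.0)          # condition on R itself
folded = ultrametric_fraction(samples, eps=0.0, f=abs)  # condition on |R|
print(plain, folded)
```

In this toy example the condition on the overlap itself fails for two of the six ordered triples, because the first and third configurations are antipodal, while the condition on $|R_{1,2}|$ holds for all six. This mirrors the reason for allowing a function $f$ such as $f(x)=|x|$ in the generalized ansatz.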

4. A vertex-weighted regularity lemma

The key to proving Theorem 2.4 is a weighted version of Szemerédi’s regularity lemma [28]. Although there are a number of weighted regularity lemmas in the literature (such as in [3, 10] and the very recent preprint [15]), we could not find the exact version stated below, which is what we needed for proving Theorem 2.4. Therefore a complete proof is given.

Let $G=(S,E)$ be a finite simple graph. In the following, we will adopt the convention that the set of edges $E$ is the subset of $S^{2}$ consisting of all $(x,y)$ such that there is an edge between $x$ and $y$ . In particular, if there is an edge between $x$ and $y$ , then both $(x,y)$ and $(y,x)$ belong to $E$ .

Let $\mu$ be a nonnegative measure on $S$. If $U$ and $V$ are disjoint subsets of $S$, we define the $\mu$-weighted edge-density between $U$ and $V$ as

$$d(U,V):=\frac{\mu^{\otimes 2}\bigl((x,y)\in E:x\in U,\,y\in V\bigr)}{\mu(U)\mu(V)}.$$

If the denominator is zero, $d(U,V)$ is undefined. Given $\epsilon>0$, a pair of disjoint sets $U,V\subset S$ will be called a $\mu$-weighted $\epsilon$-regular pair if for any $A\subset U$ and $B\subset V$ with $\mu(A)\geq\epsilon\mu(U)$ and $\mu(B)\geq\epsilon\mu(V)$, we have

$$|d(A,B)-d(U,V)|\leq\epsilon.$$
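For very small graphs, the weighted density and the regularity condition can be checked directly from the definitions. The following sketch enumerates all subsets, so it is exponential in the size of the sets and meant purely as an illustration; the function names, the four-vertex graph and the weights are assumptions made for this example. Exact rational arithmetic is used to avoid rounding in the comparisons.

```python
from fractions import Fraction
from itertools import chain, combinations


def mass(mu, A):
    return sum(mu[x] for x in A)


def weighted_density(edges, mu, U, V):
    """mu-weighted edge density d(U, V); None when the denominator is zero."""
    denom = mass(mu, U) * mass(mu, V)
    if denom == 0:
        return None
    hit = sum(mu[x] * mu[y] for x in U for y in V if (x, y) in edges)
    return hit / denom


def nonempty_subsets(A):
    A = list(A)
    return chain.from_iterable(combinations(A, k) for k in range(1, len(A) + 1))


def is_regular_pair(edges, mu, U, V, eps):
    """Brute-force check of mu-weighted eps-regularity (assumes mu(U), mu(V) > 0)."""
    d_uv = weighted_density(edges, mu, U, V)
    for A in nonempty_subsets(U):
        if mass(mu, A) < eps * mass(mu, U):
            continue
        for B in nonempty_subsets(V):
            if mass(mu, B) < eps * mass(mu, V):
                continue
            if abs(weighted_density(edges, mu, A, B) - d_uv) > eps:
                return False
    return True


mu = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(3), "d": Fraction(1)}
undirected = [("a", "c"), ("a", "d"), ("b", "c")]
# Both orientations of every edge are stored, matching the convention for E.
edges = {(x, y) for x, y in undirected} | {(y, x) for x, y in undirected}
U, V = ["a", "b"], ["c", "d"]
print(weighted_density(edges, mu, U, V))
```

In this example $d(U,V)=10/12=5/6$. The pair is regular for $\epsilon=1/2$, where only heavy subsets are tested, but not for $\epsilon=1/10$, where the sub-pair $(\{a\},\{c\})$ already has density 1.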

The following theorem is a $\mu$ -weighted version of Szemerédi’s regularity lemma.

Theorem 4.1 (Vertex-weighted regularity lemma).

Let $G=(S,E)$ be a finite simple graph and let $\mu$ be a finite nonnegative measure on $S$. Let

$$\mu^{*}:=\max_{x\in S}\mu(x).$$

Take any $\epsilon>0$ and any positive integer $m$. Then there is a positive real number $p(\epsilon,m)$ and a positive integer $M(\epsilon,m)$, both depending only on $\epsilon$ and $m$, such that if $\mu^{*}\leq p(\epsilon,m)\mu(S)$, then there is a partition $S=V_{0}\cup\dots\cup V_{q}$ with $m\leq q\leq M(\epsilon,m)$, such that

(i) $\mu(V_{0})\leq\epsilon\mu(S)$,

(ii) $\mu(V_{i})>0$ and $|\mu(V_{i})-\mu(V_{j})|\leq\mu^{*}$ for all $1\leq i,j\leq q$, and

(iii) all but at most $\epsilon q^{2}$ pairs $(V_{i},V_{j})$, $1\leq i\neq j\leq q$, are $\mu$-weighted $\epsilon$-regular, as defined above.

The rest of this section is devoted to the proof of this theorem. We follow the spectral approach to proving Szemerédi’s lemma, pioneered by Frieze and Kannan [12] and lucidly explained in a blog entry of Tao [35]. If $\mu(S)=0$ , there is nothing to prove. So let us assume that $\mu(S)>0$ , and normalize $\mu$ to define a probability measure:

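$$\mathbb{P}(A):=\frac{\mu(A)}{\mu(S)},\quad A\subset S.$$

Also let

$$P^{*}:=\max_{x\in S}\mathbb{P}(x)=\frac{\mu^{*}}{\mu(S)}.$$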

If we prove the theorem for $\mathbb{P}$ instead of $\mu$ (with $P^{*}$ instead of $\mu^{*}$ ), it is easy to see that it proves the theorem for $\mu$ . So we will henceforth work with $\mathbb{P}$ instead of $\mu$ . We will first prove Theorem 4.1 in the case that $\mathbb{P}(x)$ is rational for all $x\in S$ .

Lemma 4.2.

The vertex-weighted regularity lemma holds if $\mathbb{P}(x)$ is rational for each $x$ .

Proof.

Note that if $\epsilon<\epsilon^{\prime}$ , then an $\epsilon$ -regular partition is also an $\epsilon^{\prime}$ -regular partition. So let us assume without loss of generality that $\epsilon<1/4$ .

Since $\mathbb{P}(x)$ is rational for every $x$, we can find an integer $N$ such that $K(x):=N\mathbb{P}(x)$ is an integer for every $x$. Let $[N]:=\{1,\ldots,N\}$. Choose a map $f\colon[N]\to S$ such that $|f^{-1}(x)|=K(x)$ for every $x$, and these inverse images are disjoint. (This is possible since $\mathbb{P}(S)=1$.) Let $G_{N}=([N],E_{N})$ be a graph with vertices $[N]$, and $(x,y)\in E_{N}$ if and only if $(f(x),f(y))\in E$.
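The passage from the weighted graph to the unweighted graph $G_{N}$ is a blow-up in which each point $x$ is replaced by $K(x)$ copies. A minimal sketch of this construction follows, assuming the point masses are given as `Fraction` objects summing to 1 and taking $N$ to be the least common multiple of the denominators (any common multiple would do); the function name `blow_up` and the three-point example are hypothetical.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+


def blow_up(prob, edges):
    """Return (N, f, E_N) for rational point masses prob[x] summing to 1."""
    N = lcm(*(p.denominator for p in prob.values()))
    f = {}
    next_vertex = 1
    for x, p in prob.items():
        for _ in range(int(N * p)):  # K(x) = N * P(x) copies of x
            f[next_vertex] = x
            next_vertex += 1
    # (i, j) is an edge of G_N exactly when (f(i), f(j)) is an edge of G.
    E_N = {(i, j) for i in f for j in f if (f[i], f[j]) in edges}
    return N, f, E_N


prob = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
edges = {("a", "b"), ("b", "a")}
N, f, E_N = blow_up(prob, edges)
print(N, len(E_N))
```

Here $N=6$ and the single edge between $a$ and $b$ becomes $3\cdot 2$ edges in each orientation, so $|E_{N}|/N^{2}$ equals the $\mathbb{P}^{\otimes 2}$-mass of $E$. This is the sense in which counting edges in $G_{N}$ reproduces weighted densities in $G$.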

Let $H$ be the adjacency matrix of $G_{N}$ . Then $H$ has a spectral decomposition

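$$H=\sum_{i=1}^{N}\lambda_{i}u_{i}u_{i}^{T},$$

where $u_{i}^{T}$ denotes the transpose of the column vector $u_{i}$. We will assume the $\lambda_{i}$’s are numbered in order of decreasing magnitude, that is,

$$|\lambda_{1}|\geq|\lambda_{2}|\geq\dots\geq|\lambda_{N}|.\tag{4.1}$$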

Let $F:\mathbb{Z}_{+}\to\mathbb{R}_{+}$ be a function satisfying $F(j)>j$ for all $j$ . The exact choice of $F$ will be made later, and it will depend on $\epsilon$ and $m$ (but not on anything else). Partition the set $\{1,\ldots,N\}$ into sets of the form $\{i:z_{k}\leq i<z_{k+1}\}$ , where $z_{0}=1$ and for $k\geq 1$ ,

$$z_{k}=\underbrace{F\circ F\circ\dots\circ F}_{k\text{ times}}(1).$$

Note that since $F(j)>j$ for all $j$ , $z_{k}$ is a strictly increasing sequence. Also, since

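$$\operatorname{tr}(H^{2})=\sum_{i=1}^{N}\lambda_{i}^{2}=2|E_{N}|\leq N^{2},$$

there exists $k\leq 128\epsilon^{-5}+1$ such that

$$\sum_{z_{k}\leq i<z_{k+1}}\lambda_{i}^{2}\leq\frac{\epsilon^{5}N^{2}}{128}.$$

Consequently, there exists an integer $J$ such that $J$ is bounded by a constant that depends only on $\epsilon$ and $m$, and

$$\sum_{J\leq i<F(J)}\lambda_{i}^{2}\leq\frac{\epsilon^{5}N^{2}}{128}.\tag{4.2}$$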
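The pigeonhole step that produces $J$ amounts to scanning the blocks $\{i:z_{k}\leq i<z_{k+1}\}$ until one carries little spectral mass. The sketch below assumes that $F$ is integer-valued with $F(j)>j$ and that the eigenvalues are already sorted by decreasing magnitude; the function name `find_light_window`, the choice $F(j)=2j$ and the example spectrum (that of the complete bipartite graph $K_{3,3}$) are assumptions made for illustration, not the choices made in the proof.

```python
def find_light_window(eigenvalues, F, eps):
    """Return (J, F(J)) for the first block z_k <= i < z_{k+1} whose squared
    eigenvalues sum to at most eps**5 * N**2 / 128 (indices are 1-based)."""
    N = len(eigenvalues)
    budget = eps ** 5 * N ** 2 / 128
    z = 1  # z_0 = 1
    while True:
        z_next = F(z)  # z_{k+1} = F(z_k)
        block = eigenvalues[z - 1:z_next - 1]  # lambda_i for z <= i < z_next
        if sum(lam ** 2 for lam in block) <= budget:
            return z, z_next
        z = z_next


# Spectrum of K_{3,3}, ordered by decreasing magnitude.
spectrum = [3, -3, 0, 0, 0, 0]
J, FJ = find_light_window(spectrum, lambda j: 2 * j, eps=0.2)
print(J, FJ)
```

The loop always terminates because a block that starts beyond index $N$ is empty. In the example the blocks $[1,2)$ and $[2,4)$ each contain an eigenvalue of magnitude 3, and the block $[4,8)$ contains only zeros, so the search stops with $J=4$.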

If $\lambda_{J}\neq 0$ , then by (4.1), $\lambda_{i}\neq 0$ for all $i<J$ . If $\lambda_{J}=0$ , then again by (4.1), there is some $J^{\prime}\leq J$ such that $\lambda_{i}\neq 0$ for all $i<J^{\prime}$ and $\lambda_{i}=0$ for all $i\geq J^{\prime}$ . Thus, by decreasing $J$ if necessary, we can ensure that $\lambda_{i}\neq 0$ for all $i<J$ . Henceforth, we will assume that this holds. Let

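$$H_{1}=\sum_{i<J}\lambda_{i}u_{i}u_{i}^{T},\quad H_{2}=\sum_{J\leq i<F(J)}\lambda_{i}u_{i}u_{i}^{T},\quad H_{3}=\sum_{i\geq F(J)}\lambda_{i}u_{i}u_{i}^{T}.$$

Then the number of edges $E_{N}(A,B)$ between sets $A,B\subset[N]$ is

$$E_{N}(A,B)=\mathbbm{1}_{A}^{T}H_{1}\mathbbm{1}_{B}+\mathbbm{1}_{A}^{T}H_{2}\mathbbm{1}_{B}+\mathbbm{1}_{A}^{T}H_{3}\mathbbm{1}_{B}$$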

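where $\mathbbm{1}_{A}$ is the vector that has $1$ at the coordinates that belong to $A$ and $0$ elsewhere. For each $i<J$, define

$$W_{0}^{(i)}=\biggl\{y\in[N]:|u_{i}(y)|>\sqrt{\frac{2J}{\epsilon N}}\biggr\},$$

where $u_{i}(y)$ denotes the $y^{\text{th}}$ coordinate of $u_{i}$. Then, since $u_{i}$ is a unit vector,

$$1\geq\sum_{y\in W_{0}^{(i)}}u_{i}(y)^{2}\geq\frac{2J}{\epsilon N}|W_{0}^{(i)}|,$$

so that $|W_{0}^{(i)}|\leq\epsilon N/2J$. Thus if

$$W_{0}:=\bigcup_{i<J}W_{0}^{(i)},$$

then $|W_{0}|\leq\epsilon N/2$. Now partition $[N]\setminus W_{0}^{(i)}$ as the union of $\{W_{k}^{(i)}:|k|\leq 32J^{2}/\epsilon^{2}+1\}$, where

$$W_{k}^{(i)}=\biggl\{y\in[N]\setminus W_{0}^{(i)}:u_{i}(y)\in\frac{\epsilon^{3/2}}{16\sqrt{2J^{3}N}}(k-1,k]\biggr\}.$$

After doing this for $i=1,\dots,J-1$, set

$$W_{k_{1},\dots,k_{J-1}}=\bigcap_{i<J}W_{k_{i}}^{(i)}.$$

Note that $\{W_{k_{1},\dots,k_{J-1}}\}$ is a partition of $[N]\setminus W_{0}$. Enumerate the partition sets as $W_{1},\dots,W_{r}$. From the definition of the partition, it is clear that

$$r\leq\biggl(\frac{64J^{2}}{\epsilon^{2}}+3\biggr)^{J}.$$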
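The partition just described groups coordinates by the grid cell in which each leading eigenvector falls. The sketch below illustrates this, assuming the cutoff for $W_{0}^{(i)}$ is $\sqrt{2J/(\epsilon N)}$ and the grid width is $\epsilon^{3/2}/(16\sqrt{2J^{3}N})$, which is the reading consistent with the bounds $|W_{0}^{(i)}|\leq\epsilon N/2J$ and $|k|\leq 32J^{2}/\epsilon^{2}+1$. The function name, the 0-based indexing of coordinates and the single example vector are assumptions made for illustration.

```python
from math import ceil, sqrt


def level_set_partition(vectors, eps):
    """Split coordinates 0..N-1 into the exceptional set W_0 and grid cells.

    vectors holds the J - 1 leading unit eigenvectors u_1, ..., u_{J-1},
    each given as a list of N coordinates.
    """
    J = len(vectors) + 1
    N = len(vectors[0])
    cutoff = sqrt(2 * J / (eps * N))
    width = eps ** 1.5 / (16 * sqrt(2 * J ** 3 * N))
    # W_0: coordinates where some eigenvector is unusually large.
    W0 = {y for y in range(N) if any(abs(u[y]) > cutoff for u in vectors)}
    cells = {}
    for y in range(N):
        if y in W0:
            continue
        # u_i(y) lies in width * (k - 1, k] exactly when k = ceil(u_i(y) / width).
        key = tuple(ceil(u[y] / width) for u in vectors)
        cells.setdefault(key, []).append(y)
    return W0, cells


u1 = [0.5, 0.5, -0.5, -0.5]  # a unit vector, so J = 2 and N = 4
W0, cells = level_set_partition([u1], eps=0.2)
print(W0, sorted(cells.values()))
```

Coordinates on which every leading eigenvector takes the same value always land in the same cell, which is the property used later to show that each $W_{j}$ is a pre-image under $f$. In the example the first two and the last two coordinates form the two cells, and $W_{0}$ is empty.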

We will use this bound on $r$ later. Now, since $H$ is the adjacency matrix of a graph on $N$ vertices, a standard result from linear algebra implies that $|\lambda_{1}|\leq N$ . Thus, for $x,y\in W_{k_{1},\dots,k_{J-1}}$ and $w,z\in W_{k_{1}^{\prime},\dots,k_{J-1}^{\prime}}$ ,

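$$\begin{aligned}
\bigl|\mathbbm{1}_{w}^{T}H_{1}\mathbbm{1}_{x}-\mathbbm{1}_{z}^{T}H_{1}\mathbbm{1}_{y}\bigr|
&=\biggl|\sum_{i<J}\lambda_{i}\bigl(u_{i}(w)u_{i}(x)-u_{i}(z)u_{i}(y)\bigr)\biggr|\\
&\leq|\lambda_{1}|\sum_{i<J}\bigl(|(u_{i}(w)-u_{i}(z))u_{i}(x)|+|u_{i}(z)(u_{i}(x)-u_{i}(y))|\bigr)\\
&\leq 2N\sum_{i<J}\sqrt{\frac{2J}{\epsilon N}}\biggl(\frac{\epsilon^{3/2}}{16\sqrt{2J^{3}N}}\biggr)\leq\frac{\epsilon}{8}.
\end{aligned}$$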

For $1\leq i,j\leq r$ , define

[TABLE]

Then for any $A\subset W_{i}$ and $B\subset W_{j}$ , the above inequality shows that

[TABLE]

We will use this inequality later. We now claim that each $W_{j}$ , $0\leq j\leq r$ , is the pre-image of some subset of $S$ under the map $f$ . To see this, first note that if $f(x)=f(y)$ , then clearly $H\mathbbm{1}_{x}=H\mathbbm{1}_{y}$ . In terms of the spectral decomposition, this can be written as

[TABLE]

By the linear independence of the $u_{i}$ ’s, this shows that for each $i$ , $\lambda_{i}=0$ or $u_{i}(x)=u_{i}(y)$ . But if $i<J$ , then $\lambda_{i}\neq 0$ , and so $x$ and $y$ must belong to the same $W_{k}^{(i)}$ . Since this holds for all $i<J$ , $x$ and $y$ belong to the same $W_{j}$ .

Next, we make the partition equitable by subdividing the $W_{j}$ ’s. By what we just showed, $W_{j}$ is the union of $f^{-1}(x)$ for some set of $x\in S$ . Note that for each $x$ , the pre-image $f^{-1}(x)$ has size at most $P^{*}N$ . Let

[TABLE]

If $P^{*}$ is sufficiently small (depending on $m$ ), $m^{*}$ is positive. Partition $W_{j}$ by sorting the pre-images into subsets of size as close as possible to $\epsilon N/2(r+m^{*})$ but no smaller, and one remainder set of size less than $\epsilon N/2(r+m^{*})$ . So,

[TABLE]

with

[TABLE]

and for $k\geq 1$ ,

[TABLE]

The union of the remainder sets is small:

[TABLE]
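The subdivision step can be sketched as a greedy packing. This is only an illustration under stated assumptions: the function name `pack_atoms`, the sample sizes, and the target are hypothetical, and each "atom" stands for one pre-image $f^{-1}(x)$ that must be kept whole. Because every atom is no larger than the largest one, each block overshoots the target by less than one atom, and the single remainder stays below the target.

```python
def pack_atoms(atom_sizes, target):
    """Greedily group atoms into blocks of total size >= target.

    Returns (blocks, remainder): each block is a list of atom indices whose
    sizes sum to at least `target`; `remainder` holds the leftover atoms,
    whose total size is strictly below `target`.
    """
    blocks, current, current_size = [], [], 0
    for idx, size in enumerate(atom_sizes):
        current.append(idx)
        current_size += size
        if current_size >= target:
            # close the block as soon as it reaches the target size
            blocks.append(current)
            current, current_size = [], 0
    return blocks, current


atom_sizes = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # hypothetical pre-image sizes
target = 10                                   # hypothetical target block size
blocks, remainder = pack_atoms(atom_sizes, target)
block_sizes = [sum(atom_sizes[i] for i in block) for block in blocks]
remainder_size = sum(atom_sizes[i] for i in remainder)
```

In this toy run every block size lies in the window from `target` up to (but excluding) `target + max(atom_sizes)`, which mirrors the role played by $P^{*}N$ in the argument.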

Define

[TABLE]

as the exceptional set, and relabel the remaining partition sets $\{U_{k}^{(j)}\}_{k,j}$ as $U_{1},\dots,U_{q}$ . Then $|U_{0}|\leq\epsilon N$ , and hence by (4.6),

[TABLE]

Since $r$ can be bounded by a quantity that depends only on $\epsilon$ and $m$ , we can let $M(\epsilon,m)$ be an upper bound, depending only on $m$ and $\epsilon$ , for the quantity $2(r+m^{*})/\epsilon$ . Now notice that

[TABLE]

Using the definition of $m^{*}$ , we have

[TABLE]

Thus, sufficient smallness of $P^{*}$ (depending on $m$ and $\epsilon$ ) ensures that $q\geq m$ .

By construction of $U_{0},\ldots,U_{q}$ , there is a partition $V_{0},\ldots,V_{q}$ of $S$ such that $U_{i}=f^{-1}(V_{i})$ for each $i$ . Note that

[TABLE]

and for $i\geq 1$ ,

[TABLE]

which implies, in particular, that $|\mathbb{P}(V_{i})-\mathbb{P}(V_{j})|\leq P^{*}$ for all $1\leq i,j\leq q$ . This also shows that $\mathbb{P}(V_{i})>0$ for all $1\leq i\leq q$ .

Next, note that by (4.2), $\operatorname{tr}(H_{2}^{2})\leq\epsilon^{5}N^{2}/128$ . Thus if $H_{2}=[x_{ab}]_{a,b=1}^{N}$ , then

[TABLE]

Let $X_{ij}=\sum_{a\in U_{i},b\in U_{j}}x_{ab}^{2}$ , and let

[TABLE]

Let $\nu$ be the measure on $\{1,\dots,q\}^{2}$ such that $\nu(i,j)=|U_{i}||U_{j}|$ for each $i$ and $j$ . Then

[TABLE]

Thus, by (4.9), $\nu(\Sigma)\leq\epsilon N^{2}/2$ . We can use this to bound $|\Sigma|$ , as follows. By the inequalities (4.6) and (4.7),

[TABLE]

Thus,

[TABLE]

Recall that $r$ is bounded by a constant that depends only on $\epsilon$ and $m$ , and that $\epsilon<1/4$ . Thus, if $P^{*}$ is sufficiently small (depending on $\epsilon$ and $m$ ), this gives

[TABLE]

Suppose that $(i,j)\notin\Sigma$ . Then for $Q\subset U_{i}$ and $R\subset U_{j}$ with $|Q|\geq\epsilon|U_{i}|$ and $|R|\geq\epsilon|U_{j}|$ , the Cauchy–Schwarz inequality and the definition of $\Sigma$ imply that

[TABLE]

Next, note that for any choice of $(i,j)\in\{1,\dots,q\}^{2}$ , and for any $Q\subset U_{i}$ and $R\subset U_{j}$ ,

[TABLE]

Since $\sum_{k=1}^{N}\lambda_{k}^{2}\leq N^{2}$ , and the $\lambda_{k}$ are in order of decreasing magnitude, we have

$$k\lambda_{k}^{2}\leq\sum_{i=1}^{k}\lambda_{i}^{2}\leq N^{2},$$

so that $|\lambda_{k}|\leq N/\sqrt{k}$ . Thus,

[TABLE]

Now take any $1\leq i,j\leq q$ . Let $k$ and $l$ be indices such that $U_{i}\subset W_{k}$ and $U_{j}\subset W_{l}$ . Define $\delta_{ij}:=d_{kl}$ , where $d_{kl}$ is the quantity defined in (4.4). Then by (4.5), (4.10) and (4.11), we see that if $Q\subset U_{i}$ and $R\subset U_{j}$ , with $(i,j)\in\{1,\dots,q\}^{2}\setminus\Sigma$ , and $|Q|\geq\epsilon|U_{i}|$ and $|R|\geq\epsilon|U_{j}|$ , then

[TABLE]

Now take any $(i,j)\in\{1,\ldots,q\}^{2}\setminus\Sigma$ , and any $A\subset V_{i}$ and $B\subset V_{j}$ with $\mathbb{P}(A)\geq\epsilon\mathbb{P}(V_{i})$ and $\mathbb{P}(B)\geq\epsilon\mathbb{P}(V_{j})$ . Let $Q:=f^{-1}(A)$ and $R:=f^{-1}(B)$ . Then $Q\subset U_{i}$ , $R\subset U_{j}$ , $|Q|\geq\epsilon|U_{i}|$ and $|R|\geq\epsilon|U_{j}|$ . Also,

[TABLE]

and $|Q||R|=N^{2}\mathbb{P}(A)\mathbb{P}(B)$ . Thus, the above calculations show that

[TABLE]

Combining the last two displays and dividing throughout by $N^{2}\mathbb{P}(A)\mathbb{P}(B)$ , we get

[TABLE]

Recalling that $\mathbb{P}(A)\geq\epsilon\mathbb{P}(V_{i})$ and $\mathbb{P}(B)\geq\epsilon\mathbb{P}(V_{j})$ , and applying (4.8), we get

[TABLE]

Now suppose $F$ is chosen in such a way that we can guarantee

[TABLE]

Then from the above bounds it will follow that

[TABLE]

Replacing $A$ by $V_{i}$ and $B$ by $V_{j}$ , we also have $|d(V_{i},V_{j})-\delta_{ij}|\leq\epsilon/2$ . Thus, we would get

[TABLE]

which would complete the proof. So we only have to guarantee (4.12). By the bound on $r$ from (4.3), we see that (4.12) holds if

[TABLE]

Assuming that $P^{*}\leq 1/2m$ , it is now easy to choose $F$ , depending only on $\epsilon$ and $m$ , satisfying the above criterion for every $J\in\mathbb{Z}_{+}$ . ∎

In the final step, we now drop the rationality assumption and prove Theorem 4.1.

Proof of Theorem 4.1.

Enumerate $S=\{x_{1},\ldots,x_{n}\}$ and let $p_{i}:=\mathbb{P}(x_{i})$ . Take any positive real number $\nu$ . Let $q_{1},\ldots,q_{n}$ be positive rational numbers such that $p_{i}\leq q_{i}\leq p_{i}+\nu$ for each $i$ . Let $r_{i}:=q_{i}/\sum q_{j}$ , so that $r_{1},\ldots,r_{n}$ are again rational, $\sum r_{i}=1$ , and for each $i$ ,

[TABLE]

Define the modified weight $\mathbb{P}^{(\nu)}(x_{i}):=r_{i}$ . Suppose that $P^{*}\leq\frac{1}{2}p(\epsilon,m)$ , where $p(\epsilon,m)$ is the bound on the maximum atom required in Lemma 4.2. Then for sufficiently small $\nu$ , the above display shows that we can apply Lemma 4.2 to $\mathbb{P}^{(\nu)}$ . Suppose that we get an $\epsilon$ -regular partition $V_{0}^{(\nu)},\ldots,V_{q}^{(\nu)}$ of $S$ . Now let $\nu\to 0$ . We get a partition as above for each $\nu$ . Since the number of possible partitions is finite, there is a subsequence along which the partitions stabilize for sufficiently small $\nu$ . This allows us to define a limiting partition along this subsequence. Since $\mathbb{P}^{(\nu)}(x)\to\mathbb{P}(x)$ for every $x$ (by the above display), it is straightforward to verify that this limiting partition is $\epsilon$ -regular for $\mathbb{P}$ . ∎
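The rational approximation used in this proof can be illustrated with exact arithmetic. The following is a minimal sketch, not the paper's construction verbatim: the helper `rationalize`, the grid choice $q_{i}=(\lfloor p_{i}D\rfloor+1)/D$ with $D=\lceil 1/\nu\rceil$, and the sample weights are assumptions made for the example. The error bound $|r_{i}-p_{i}|\leq n\nu$ checked at the end is derived for this sketch (from $1\leq\sum q_{j}\leq 1+n\nu$) and need not coincide with the bound in the display above.

```python
from fractions import Fraction
import math


def rationalize(weights, nu):
    """Return (p, q, r) with q_i rational, p_i < q_i <= p_i + nu, r_i = q_i / sum(q)."""
    denom = math.ceil(1 / Fraction(nu))   # grid spacing 1/denom is at most nu
    p = [Fraction(w) for w in weights]    # exact value of each input weight
    # floor(p_i * denom) + 1 is a positive integer strictly above p_i * denom
    q = [Fraction(math.floor(pi * denom) + 1, denom) for pi in p]
    total = sum(q)
    r = [qi / total for qi in q]          # normalised rational weights
    return p, q, r


nu = Fraction(1, 1000)                    # hypothetical tolerance
p, q, r = rationalize([0.5, 0.3125, 0.1875], nu)
max_error = max(abs(ri - pi) for ri, pi in zip(r, p))
```

Shrinking `nu` drives `max_error` to zero, which is the convergence $\mathbb{P}^{(\nu)}(x)\to\mathbb{P}(x)$ used in the limiting argument.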

5. Preliminary steps

In this section we begin the steps towards the proof of Theorem 2.4. First, note that by rescaling $s$ if necessary, we may assume that $b=1$ . We will work under this assumption for the rest of the paper.

We begin by observing that the converse statement in Theorem 2.4 is very easy to prove. Take any $\delta>0$ . Suppose that

[TABLE]

Then there exists a tree $T$ with root $r$ , finite diameter, and set of leaves $S$ , and some $\alpha\geq 0$ , such that $(X,Y)_{r}$ is a measurable random variable and

[TABLE]

where $X$ and $Y$ are i.i.d. draws from $\mathbb{P}$ . By Markov’s inequality,

[TABLE]

Therefore if $X$ , $Y$ and $Z$ are i.i.d. draws from $\mathbb{P}$ , then with probability at least $1-3\sqrt{\delta}$ , the quantities $|s(X,Z)-\alpha(X,Z)_{r}|$ , $|s(Y,Z)-\alpha(Y,Z)_{r}|$ and $|s(X,Y)-\alpha(X,Y)_{r}|$ are all bounded above by $\sqrt{\delta}$ . If this happens, then

[TABLE]

Now, since $(x,y)_{r}$ is a Gromov product under the graph distance on a tree, it satisfies

$$(x,y)_{r}\geq\min\{(x,z)_{r},(y,z)_{r}\}$$

for all $x,y,z$ . Thus, we get

[TABLE]

Recall that this happens with probability at least $1-3\sqrt{\delta}$ . Also, we have assumed that $b=1$ . Thus,

[TABLE]

This proves the converse part of Theorem 2.4.
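The key fact used in this argument, that Gromov products based at the root of a tree satisfy the hyperbolicity inequality with no error term, can be checked by brute force on a small example. The sketch below is illustrative only: the tree given by the `parent` dictionary and the helper names are hypothetical, and the check runs over all vertices rather than just the leaves.

```python
from itertools import product

# A small rooted tree given by parent pointers; "r" is the root (hypothetical example).
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b", "f": "c"}


def ancestors(v):
    """Vertices on the path from v up to the root, starting with v itself."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path


def dist(x, y):
    """Graph distance in the tree, computed through the lowest common ancestor."""
    ax, ay = ancestors(x), ancestors(y)
    common = set(ax) & set(ay)
    lca = next(v for v in ax if v in common)  # first common vertex going up from x
    return ax.index(lca) + ay.index(lca)


def gromov(x, y, base="r"):
    """Gromov product (x, y)_base = (d(x, base) + d(y, base) - d(x, y)) / 2."""
    return (dist(x, base) + dist(y, base) - dist(x, y)) / 2


nodes = list(parent)
violations = [
    (x, y, z)
    for x, y, z in product(nodes, repeat=3)
    if gromov(x, y) < min(gromov(x, z), gromov(y, z))
]
```

Here `gromov(x, y)` equals the depth of the lowest common ancestor of `x` and `y`, and `violations` comes out empty, as the inequality predicts.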

We now start our journey towards the proof of the main assertion of Theorem 2.4, namely, that if $\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)$ is small, then $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)$ is also small. We will first prove the following weaker theorem. At the very end of the paper, we will complete the proof of Theorem 2.4 using this theorem.

Theorem 5.1.

Assume that $S$ is a finite set, $\mathcal{F}$ is the power set of $S$ , $\mathbb{P}$ is a probability measure defined on $\mathcal{F}$ , and $s:S\times S\to[0,1]$ is a symmetric function. Let $P^{*}:=\max_{x\in S}\mathbb{P}(x)$ . Then given any $\epsilon>0$ , there is some $\delta>0$ depending only on $\epsilon$ , such that if $P^{*}<\delta$ and $\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)<\delta$ , then $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\epsilon$ .

From here until the end of the proof of Theorem 5.1, we will work under the assumptions stated above. Take any $\delta>0$ and suppose that

[TABLE]

A basic step is to show that for most values of $t\in[0,1]$ , the set

[TABLE]

has small probability. For convenience, let

[TABLE]

The above definition of $\delta_{0}$ will be fixed throughout the remainder of the proof.

Lemma 5.2.

Let $R:=\{t\colon\mathbb{P}^{\otimes 3}(R_{t})\geq\delta_{0}^{4}\}$ . Then $\mathscr{L}(R)\leq\delta_{0}^{4}$ , where $\mathscr{L}$ is Lebesgue measure.

Proof.

Define

[TABLE]

Note that

[TABLE]

Thus,

[TABLE]

If $\mathscr{L}$ is Lebesgue measure on $[0,1]$ , the definition of $R$ implies that

[TABLE]

The claimed result now follows easily by combining the two displays. ∎

Let us now fix some $\epsilon\in(0,1)$ and $m\geq 2$ . This $\epsilon$ and $m$ will remain fixed throughout the rest of the proof. At various steps, we will need to assume that $\epsilon$ is smaller than some universal constant (such as $\epsilon<1/9$ ) or $m$ is bigger than some universal constant (such as $m\geq 20$ ), and we will make these assumptions without explicitly stating so.

Having chosen $\epsilon$ and $m$ , define

[TABLE]

Assume that $\delta_{0}<\kappa/2$ . Let $N$ be the largest integer such that $N\kappa<1$ . Note that $N\leq 1/\kappa\leq 1/\delta_{0}$ . In particular, $N$ is bounded by a constant that depends only on $\epsilon$ and $m$ . We will use this information later. By Lemma 5.2, any subinterval of $[0,1]$ of length $\geq\delta_{0}$ intersects $R^{c}$ . Thus, we can find a sequence $0<t_{1}<t_{2}<\dots<t_{N}<1$ such that for each $i$ , $t_{i}\in R^{c}$ and

[TABLE]

For $y,z\in S$ and $i\in\{1,\dots,N\}$ , define three sets:

[TABLE]

Finally, let

[TABLE]

We now prove two lemmas that will be used several times in the sequel.

Lemma 5.3.

Let $A$ be the set defined above. Then $\mathbb{P}(A)\leq\delta_{0}$ .

Proof.

By the choice of $t_{i}$ , $\mathbb{P}^{\otimes 3}(R_{t_{i}})\leq\delta_{0}^{4}$ for every $i$ . Since $N\leq 1/\delta_{0}$ , this gives

[TABLE]

Thus

[TABLE]

which gives $\mathbb{P}(A)\leq\delta_{0}$ . ∎

Lemma 5.4.

If $z\notin A$ , then $\mathbb{P}^{\otimes 2}(\mathfrak{R}^{2}(z))\leq 2\delta_{0}$ .

Proof.

By the definition of $B(z)$ ,

[TABLE]

On the other hand, since $z\notin A$ , $\mathbb{P}(B(z))\leq\delta_{0}$ . This completes the proof. ∎

6. Formation of approximate cliques

In this section we carry out the main step in the proof of Theorem 5.1. We continue with the notations introduced in the previous section. In particular, $P^{*}$ , $\delta_{0}$ , $R$ , $R_{t}$ , $\mathfrak{R}^{1}(y,z)$ , $\mathfrak{R}^{2}(z)$ , $B(z)$ , $A$ , $\epsilon$ , $m$ , $\kappa$ , $N$ and $t_{1},\ldots,t_{N}$ remain the same as before.

Take any nonempty set $S^{\prime}\subset S\setminus A$ . Take any $t\in\{t_{1},\ldots,t_{N}\}$ , and put an edge between $x,y\in S^{\prime}$ if and only if $s(x,y)\geq t$ . Let $E$ denote this set of edges, and let $G$ be the graph $(S^{\prime},E)$ . Let us continue to denote the restriction of $\mathbb{P}$ to $S^{\prime}$ by $\mathbb{P}$ . Note that this restriction is a measure on $S^{\prime}$ , but not necessarily a probability measure.
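As an illustration of the construction of $G$, the following sketch builds the threshold graph from a similarity function and tests whether it is a disjoint union of cliques, which is the structure this section aims to recover approximately. The similarity functions, the point set, and the helper names are hypothetical choices for the example. The first similarity is ultrametric, so thresholding gives exact cliques; the second is not, and the clique structure fails.

```python
from itertools import combinations


def threshold_graph(points, s, t):
    """Edges {x, y} with s(x, y) >= t."""
    return {frozenset((x, y)) for x, y in combinations(points, 2) if s(x, y) >= t}


def is_disjoint_union_of_cliques(points, edges):
    """True iff adjacency is transitive: any two neighbors of z are adjacent."""
    adj = {x: {y for y in points if frozenset((x, y)) in edges} for x in points}
    return all(
        y in adj[x]
        for z in points
        for x, y in combinations(sorted(adj[z]), 2)
    )


def prefix_similarity(x, y):
    """Hypothetical ultrametric similarity: fraction of shared leading characters."""
    k = 0
    while k < len(x) and x[k] == y[k]:
        k += 1
    return k / len(x)


points = ["000", "001", "010", "011", "100", "101"]
edges = threshold_graph(points, prefix_similarity, t=2 / 3)
cliques_ok = is_disjoint_union_of_cliques(points, edges)


def chain_similarity(x, y):
    """Hypothetical non-ultrametric similarity: only consecutive points are close."""
    return 1.0 if abs(points.index(x) - points.index(y)) <= 1 else 0.0


chain_edges = threshold_graph(points, chain_similarity, t=0.5)
chain_ok = is_disjoint_union_of_cliques(points, chain_edges)
```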

Let $p(\epsilon,m)$ and $M(\epsilon,m)$ be as in Theorem 4.1. Throughout this section, we will assume that $\mathbb{P}(S^{\prime})$ is sufficiently large in comparison to $P^{*}$ so that

[TABLE]

A first consequence of this assumption is that we can apply Theorem 4.1 to get a partition $V_{0},\ldots,V_{q}$ of $S^{\prime}$ with the required properties. For $B^{\prime},B\subset S^{\prime}$ , let

[TABLE]

so that in the notation of Theorem 4.1,

[TABLE]
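For concreteness, here is a small sketch of a measure-weighted edge density. It assumes that $\rho(B^{\prime},B)$ is the total $\mathbb{P}\otimes\mathbb{P}$-mass of pairs $(x,y)\in B^{\prime}\times B$ joined by an edge and that $d(B^{\prime},B)=\rho(B^{\prime},B)/(\mathbb{P}(B^{\prime})\mathbb{P}(B))$, which is consistent with how these quantities are used in the proof of Lemma 6.2. The function names and the toy data are hypothetical.

```python
from fractions import Fraction


def rho(edges, prob, left, right):
    """Total product mass of pairs (x, y) in left x right joined by an edge."""
    return sum(
        prob[x] * prob[y]
        for x in left
        for y in right
        if frozenset((x, y)) in edges
    )


def density(edges, prob, left, right):
    """Weighted edge density rho(left, right) / (P(left) * P(right))."""
    mass_left = sum(prob[x] for x in left)
    mass_right = sum(prob[y] for y in right)
    return rho(edges, prob, left, right) / (mass_left * mass_right)


# Toy example: four points of equal mass, three of the four cross edges present.
prob = {v: Fraction(1, 4) for v in "abcd"}
edges = {frozenset("ac"), frozenset("ad"), frozenset("bc")}
d_value = density(edges, prob, {"a", "b"}, {"c", "d"})
```

With uniform weights this reduces to the usual fraction of present cross edges, here three out of four.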

We will fix all of the above throughout the rest of this section. The main result of the section is that $G$ can be slightly modified to make it a disjoint union of cliques. We arrive at this result in several steps. First, we show that $\mathbb{P}(V_{i})$ is appropriately close to $\mathbb{P}(S^{\prime})/q$ .

Lemma 6.1.

For each $1\leq i\leq q$ ,

[TABLE]

In particular, $\mathbb{P}(V_{i})\geq C(\epsilon,m)\mathbb{P}(S^{\prime})$ , where $C(\epsilon,m)$ is a positive real number that depends only on $\epsilon$ and $m$ .

Proof.

By construction, $|\mathbb{P}(V_{i})-\mathbb{P}(V_{j})|\leq P^{*}$ for all $1\leq i,j\leq q$ . Thus, for any $1\leq i\leq q$ ,

[TABLE]

where the last inequality follows from (6.1). Similarly,

[TABLE]

Assume that $\epsilon<1/4$ (which we can, by our stated convention that $\epsilon$ can be taken to be less than any universal constant). Since $q\leq M(\epsilon,m)$ , this completes the proof. ∎

Next, we prove two key lemmas. The first one shows that for any regular pair $(V_{i},V_{j})$ , $d(V_{i},V_{j})$ is either close to zero or close to one.

Lemma 6.2.

There exists a number $\delta^{*}$ depending only on $\epsilon$ , $m$ and $\mathbb{P}(S^{\prime})$ , such that if $\delta_{0}\leq\delta^{*}$ , then the following holds. If $(V_{i},V_{j})$ is an $\epsilon$ -regular pair, and $d(V_{i},V_{j})\geq 3\epsilon$ , then $d(V_{i},V_{j})\geq 1-2\epsilon$ .

The plan of the proof is roughly as follows (see Figure 3 for a schematic representation). We will first find some $x_{0}\in V_{i}$ that connects to a substantial fraction of points in $V_{j}$ , where “substantial” means a set of $\mathbb{P}$ -measure greater than $C\epsilon\mathbb{P}(V_{j})$ for some universal constant $C$ . Call this set $N_{j}(x_{0})$ . By regularity, the edge density between $N_{j}(x_{0})$ and $V_{i}$ will be substantial. This will allow us to find $y_{0}\in N_{j}(x_{0})$ which connects to a substantial fraction of points in $V_{i}$ . Call this set $N_{i}(y_{0})$ . Now take any $b\in N_{i}(y_{0})$ and $a\in N_{j}(x_{0})$ . Since $x_{0}$ is a neighbor of $y_{0}$ and $x_{0}$ is also a neighbor of $a$ , the small hyperbolicity of $S$ will allow us to conclude that it is highly likely that $a$ is a neighbor of $y_{0}$ . But if that happens, then since $b$ is a neighbor of $y_{0}$ and $a$ is also a neighbor of $y_{0}$ , it is highly likely that $b$ is a neighbor of $a$ . From this, we will conclude that the edge density between $N_{j}(x_{0})$ and $N_{i}(y_{0})$ is close to $1$ . Since these sets have substantial size, regularity of $(V_{i},V_{j})$ will imply that $d(V_{i},V_{j})$ is close to $1$ .

Proof of Lemma 6.2.

Throughout this proof, $C(\epsilon,m)$ denotes any positive real number that depends only on $\epsilon$ and $m$ . The value of $C(\epsilon,m)$ may change from line to line. For $x\in S^{\prime}$ , let $N(x)$ denote the neighborhood of $x$ in $G$ . Let $N_{k}(x):=N(x)\cap V_{k}$ for each $k$ . Let $V_{i}$ and $V_{j}$ be as in the statement of the lemma. Since $d(V_{i},V_{j})\geq 3\epsilon$ , we have $\rho(V_{i},V_{j})\geq 3\epsilon\mathbb{P}(V_{i})\mathbb{P}(V_{j})$ , and so there is some $x_{0}\in V_{i}$ for which

[TABLE]

By $\epsilon$ -regularity,

[TABLE]

and therefore

[TABLE]

Now notice that

[TABLE]

Since $x_{0}\notin A$ , $\mathbb{P}(B(x_{0}))\leq\delta_{0}$ . Thus

[TABLE]

so that by (6.3),

[TABLE]

By Lemma 6.1 and the inequality (6.2),

[TABLE]

Combining this with (6.4), we get

[TABLE]

If $\delta_{0}$ is sufficiently small (depending on $\epsilon$ , $m$ and $\mathbb{P}(S^{\prime})$ ), the quantity in brackets on the left is bounded below by $\epsilon$ , and so there is $y_{0}\in N_{j}(x_{0})\cap B(x_{0})^{c}$ such that

[TABLE]

Recalling (6.2), we see that by $\epsilon$ -regularity,

[TABLE]

The quantity $d(N_{j}(x_{0}),N_{i}(y_{0}))$ can be bounded from below as follows:

[TABLE]

We wish to show that the right side is close to $1$ . For that purpose, we write the right side as $(1-(i))(1-(ii))$ , where

[TABLE]

and

[TABLE]

We will now show that $(i)$ and $(ii)$ are small. (To understand heuristically why they should be small, recall Figure 3.) Recalling the definition of $\mathfrak{R}^{2}(y_{0})$ , we see that

[TABLE]

But if $b\in N_{i}(y_{0})$ , then $b$ is a neighbor of $y_{0}$ in $G$ and so $s(b,y_{0})\geq t$ . Thus the above display can be simplified to

[TABLE]

Moreover, recalling that $y_{0}\in N_{j}(x_{0})$ , so that $s(x_{0},y_{0})\geq t$ , and recalling the definition of $\mathfrak{R}^{1}(y,z)$ , it is easy to see that

[TABLE]

Thus,

[TABLE]

By (6.2) and (6.5), $\mathbb{P}(N_{j}(x_{0}))$ and $\mathbb{P}(N_{i}(y_{0}))$ are both bounded below by $C(\epsilon,m)\mathbb{P}(S^{\prime})$ . Since $y_{0}\notin A$ , Lemma 5.4 gives

[TABLE]

On the other hand, since $y_{0}\notin B(x_{0})$ ,

[TABLE]

Combining all of the above observations, we get

[TABLE]

If $\delta_{0}$ is small enough (depending on $\epsilon$ , $m$ and $\mathbb{P}(S^{\prime})$ ), the above quantity is smaller than $\epsilon/2$ . For $(ii)$ , we re-use (6.7) to get

[TABLE]

Again, this is smaller than $\epsilon/2$ if $\delta_{0}$ is small enough. Thus,

[TABLE]

and hence by (6.6), $d(V_{i},V_{j})\geq 1-2\epsilon$ . ∎

Our second key lemma shows that the property of high density between regular pairs has a certain transitivity property.

Lemma 6.3.

There exists a number $\delta^{*}$ depending only on $\epsilon$ , $m$ and $\mathbb{P}(S^{\prime})$ , such that if $\delta_{0}\leq\delta^{*}$ , then the following holds. Suppose that $(V_{a},V_{b})$ is an $\epsilon$ -regular pair. Suppose that $i_{0},i_{1},\ldots,i_{k}$ are distinct elements of $\{1,\ldots,q\}$ such that $i_{0}=a$ , $i_{k}=b$ , $d(V_{i_{j}},V_{i_{j+1}})\geq 1-2\epsilon$ for each $0\leq j\leq k-1$ , and $2\leq k\leq\epsilon^{-1/2}$ . Then $d(V_{a},V_{b})\geq 1-2\epsilon$ .

The proof of this lemma is intuitively quite simple, given that we already have Lemma 6.2. The small hyperbolicity ensures that if we have a path in $G$ that is not too long, then it is likely that the beginning and ending points of the path are connected by an edge. This allows us to conclude that $d(V_{a},V_{b})$ is close to $1$ , as long as $k$ is not too large. In particular, $d(V_{a},V_{b})\geq 3\epsilon$ . But then Lemma 6.2 implies that $d(V_{a},V_{b})\geq 1-2\epsilon$ .

Proof of Lemma 6.3.

Take any sequence of points $x_{j}\in V_{i_{j}}$ , $0\leq j\leq k$ , such that for each $0\leq j\leq k-1$ , $s(x_{j},x_{j+1})\geq t$ , and $s(x_{0},x_{k})<t$ . Let $L$ be the set of all such sequences ( $L$ is allowed to be empty). Since $s(x_{0},x_{k})<t$ , there is a minimum $j$ such that $s(x_{0},x_{j})<t$ . But $s(x_{0},x_{1})\geq t$ . Thus, $j\geq 2$ , and hence $s(x_{0},x_{j-1})\geq t$ . But we also know that $s(x_{j-1},x_{j})\geq t$ . Therefore, $(x_{0},x_{j},x_{j-1})\in R_{t}$ , where $R_{t}$ is the set defined in (5.1). Since $t\notin R$ and $k\leq\epsilon^{-1/2}$ , this implies that

[TABLE]

On the other hand, let $B:=V_{i_{0}}\times\cdots\times V_{i_{k}}$ . Then

[TABLE]

But by (6.8),

[TABLE]

Combining the last two displays, we get

[TABLE]

By Lemma 6.1, this shows that if $\delta_{0}$ is sufficiently small (depending on $\epsilon$ , $m$ and $\mathbb{P}(S^{\prime})$ ), then

[TABLE]

But then by Lemma 6.2 (assuming that $\epsilon$ is sufficiently small), this gives $d(V_{a},V_{b})\geq 1-2\epsilon$ . ∎
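The minimal-index step at the start of this proof can be made concrete. The sketch below takes a path whose consecutive points are similar at level $t$ but whose endpoints are not, and extracts the triple $(x_{0},x_{j},x_{j-1})$ for the least $j$ with $s(x_{0},x_{j})<t$. The function name, the similarity `s`, and the sample path are hypothetical.

```python
def violating_triple(path, s, t):
    """Return (x_0, x_j, x_{j-1}) for the least j with s(x_0, x_j) < t.

    Requires s(x_i, x_{i+1}) >= t along the path and s(x_0, x_k) < t.
    """
    x0 = path[0]
    assert all(s(a, b) >= t for a, b in zip(path, path[1:]))
    assert s(x0, path[-1]) < t
    j = next(i for i in range(1, len(path)) if s(x0, path[i]) < t)
    # j >= 2 because s(x_0, x_1) >= t; minimality of j gives s(x_0, x_{j-1}) >= t
    return x0, path[j], path[j - 1]


def s(a, b):
    """Hypothetical similarity on integers: decreases linearly with distance."""
    return 1 - abs(a - b) / 4


t = 0.75
triple = violating_triple([0, 1, 2, 3], s, t)
x0, xj, xj_prev = triple
```

The returned triple satisfies $s(x_{0},x_{j-1})\geq t$ and $s(x_{j-1},x_{j})\geq t$ while $s(x_{0},x_{j})<t$, which is exactly the configuration that the proof places in $R_{t}$.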

We now begin the main quest of this section, namely, to show that a small fraction of the edges of $G$ can be modified to transform it into a disjoint union of cliques. Throughout the rest of this section, we will assume that:

[TABLE]

First, we define a graph structure on $\{V_{1},\ldots,V_{q}\}$ . We will say that there is an edge between $V_{i}$ and $V_{j}$ if $(V_{i},V_{j})$ is $\epsilon$ -regular and $d(V_{i},V_{j})\geq 1-2\epsilon$ . In this case we will say that $V_{i}$ and $V_{j}$ are neighbors. A subset $\mathcal{N}$ of $\{V_{1},\ldots,V_{q}\}$ will be called a “neighborhood” if there is some $V_{i}\in\mathcal{N}$ such that all other elements of $\mathcal{N}$ are neighbors of $V_{i}$ . In this case we will say that $\mathcal{N}$ is a neighborhood of $V_{i}$ . Note that $\mathcal{N}$ need not contain all the neighbors of $V_{i}$ . Let $\mathfrak{N}$ be a maximal collection of disjoint neighborhoods such that each neighborhood has size $\geq\epsilon^{1/4}q$ . Note that $\mathfrak{N}$ is allowed to be empty, in case there is no neighborhood of size $\geq\epsilon^{1/4}q$ .
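One way to obtain a maximal collection of disjoint large neighborhoods is a single greedy pass, sketched below. This is an illustration only, since the text needs just the existence of some maximal collection; the function name, the adjacency data, and the size threshold (standing in for $\epsilon^{1/4}q$) are hypothetical. The pass is maximal because any large neighborhood disjoint from the chosen ones would have been picked when its center was visited.

```python
def maximal_disjoint_neighborhoods(neighbors, min_size):
    """Greedily pick disjoint neighborhoods, each of size >= min_size.

    `neighbors[v]` is the set of blocks adjacent to block v. A neighborhood
    of v consists of v together with some of its neighbors.
    """
    used, family = set(), []
    for v in sorted(neighbors):
        if v in used:
            continue
        # largest neighborhood of v that avoids blocks already taken
        candidate = {v} | (neighbors[v] - used)
        if len(candidate) >= min_size:
            family.append((v, candidate))
            used |= candidate
    return family


# Hypothetical adjacency structure on blocks labelled 1..8.
neighbors = {
    1: {2, 3}, 2: {1}, 3: {1},
    4: {5}, 5: {4, 6, 7}, 6: {5}, 7: {5},
    8: set(),
}
family = maximal_disjoint_neighborhoods(neighbors, min_size=3)
centers = [v for v, _ in family]
```

In this example block 4 is skipped as a center because its own neighborhood is too small, but it is later absorbed into the neighborhood of block 5.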

Lemma 6.4.

For any distinct $\mathcal{N}_{1},\mathcal{N}_{2}\in\mathfrak{N}$ , there is some $V_{i}\in\mathcal{N}_{1}$ and $V_{j}\in\mathcal{N}_{2}$ such that $(V_{i},V_{j})$ is an $\epsilon$ -regular pair.

Proof.

Since $|\mathcal{N}_{1}|$ and $|\mathcal{N}_{2}|$ are both $\geq\epsilon^{1/4}q$ , there are at least $\epsilon^{1/2}q^{2}$ pairs $(V_{a},V_{b})$ such that $V_{a}\in\mathcal{N}_{1}$ and $V_{b}\in\mathcal{N}_{2}$ . Since the number of irregular pairs is at most $\epsilon q^{2}$ , this shows that at least one of the above pairs must be $\epsilon$ -regular. ∎

Now define a graph structure on $\mathfrak{N}$ as follows. Say that two neighborhoods $\mathcal{N}_{1},\mathcal{N}_{2}\in\mathfrak{N}$ are connected by an edge if there exists $V_{i}\in\mathcal{N}_{1}$ and $V_{j}\in\mathcal{N}_{2}$ such that $V_{i}$ and $V_{j}$ are neighbors (in the sense defined above).

Lemma 6.5.

Under the graph structure defined above, $\mathfrak{N}$ is a disjoint union of cliques.

Proof.

For distinct $\mathcal{N}_{1},\mathcal{N}_{2},\mathcal{N}_{3}\in\mathfrak{N}$ , we have to show that if $\mathcal{N}_{1}$ is a neighbor of $\mathcal{N}_{2}$ , and $\mathcal{N}_{3}$ is a neighbor of $\mathcal{N}_{2}$ , then $\mathcal{N}_{3}$ is a neighbor of $\mathcal{N}_{1}$ . This will imply that $\mathfrak{N}$ is a disjoint union of cliques.

Accordingly, let $V_{i}\in\mathcal{N}_{1}$ and $V_{j}\in\mathcal{N}_{2}$ be neighbors, and let $V_{k}\in\mathcal{N}_{2}$ and $V_{l}\in\mathcal{N}_{3}$ be neighbors. By Lemma 6.4, there is an $\epsilon$ -regular pair $(V_{a},V_{b})$ such that $V_{a}\in\mathcal{N}_{1}$ and $V_{b}\in\mathcal{N}_{3}$ . Suppose that $\mathcal{N}_{i}$ is a neighborhood of $V_{t_{i}}$ , for $i=1,2,3$ . Then the sequence $V_{a},V_{t_{1}},V_{i},V_{j},V_{t_{2}},V_{k},V_{l},V_{t_{3}},V_{b}$ is a path in the graph defined on $\{V_{1},\ldots,V_{q}\}$ (see Figure 4). Since $(V_{a},V_{b})$ is $\epsilon$ -regular, Lemma 6.3 implies that $d(V_{a},V_{b})\geq 1-2\epsilon$ . In other words, $V_{a}$ and $V_{b}$ are neighbors. Thus, $\mathcal{N}_{1}$ is a neighbor of $\mathcal{N}_{3}$ . ∎

Take each clique in $\mathfrak{N}$ , and take the union of its elements. This yields a new collection $\mathfrak{C}$ of disjoint subsets of $\{V_{1},\ldots,V_{q}\}$ .

Lemma 6.6.

We have $|\mathfrak{C}|\leq\epsilon^{-1/4}$ .

Proof.

Simply note that each $\mathcal{C}\in\mathfrak{C}$ has size at least $\epsilon^{1/4}q$ , these sets are disjoint, and their union is a subset of $\{V_{1},\ldots,V_{q}\}$ . Thus, $|\mathfrak{C}|\epsilon^{1/4}q\leq q$ . ∎

Lemma 6.7.

If $V_{i}\in\mathcal{C}_{1}$ and $V_{j}\in\mathcal{C}_{2}$ for two distinct elements $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ of $\mathfrak{C}$ , then $V_{i}$ and $V_{j}$ are not neighbors. On the other hand, if $V_{i},V_{j}\in\mathcal{C}$ for some $\mathcal{C}\in\mathfrak{C}$ , then either $(V_{i},V_{j})$ is an irregular pair, or $V_{i}$ and $V_{j}$ are neighbors. Moreover, in this case even if $(V_{i},V_{j})$ is irregular, there is a path with $\leq 6$ vertices joining $V_{i}$ and $V_{j}$ .

Proof.

If $V_{i}\in\mathcal{C}_{1}$ and $V_{j}\in\mathcal{C}_{2}$ for two distinct elements $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ of $\mathfrak{C}$ , it follows directly from the definition of $\mathfrak{C}$ that $V_{i}$ and $V_{j}$ cannot be neighbors. Next, suppose that $V_{i},V_{j}\in\mathcal{C}$ for some $\mathcal{C}\in\mathfrak{C}$ , and $(V_{i},V_{j})$ is $\epsilon$ -regular. Then either $V_{i},V_{j}\in\mathcal{N}$ for some $\mathcal{N}\in\mathfrak{N}$ , or $V_{i}\in\mathcal{N}_{1}$ and $V_{j}\in\mathcal{N}_{2}$ for some $\mathcal{N}_{1},\mathcal{N}_{2}\in\mathfrak{N}$ that are neighbors. In the first case, suppose that $\mathcal{N}$ is a neighborhood of some $V_{a}$ . Then $V_{i},V_{a},V_{j}$ is a path, and hence by Lemma 6.3, $V_{i}$ is a neighbor of $V_{j}$ . In the second case, suppose that $\mathcal{N}_{1}$ is a neighborhood of $V_{a}$ and $\mathcal{N}_{2}$ is a neighborhood of $V_{b}$ . Since $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ are neighbors, there exist $V_{k}\in\mathcal{N}_{1}$ and $V_{l}\in\mathcal{N}_{2}$ which are neighbors. Then $V_{i},V_{a},V_{k},V_{l},V_{b},V_{j}$ is a path, and hence by Lemma 6.3, $V_{i}$ and $V_{j}$ are neighbors. This argument also establishes that even if $(V_{i},V_{j})$ is an irregular pair, we can find a path with $\leq 6$ vertices joining $V_{i}$ and $V_{j}$ . ∎

Next, let $\mathcal{D}$ be the set of all $V_{i}$ that are not elements of any $\mathcal{C}\in\mathfrak{C}$ .

Lemma 6.8.

For any $V_{i}\in\mathcal{D}$ , there are less than $\epsilon^{1/4}q$ many $V_{j}\in\mathcal{D}$ that are neighbors of $V_{i}$ .

Proof.

Suppose that there is some $V_{i}\in\mathcal{D}$ that has $\geq\epsilon^{1/4}q$ neighbors in $\mathcal{D}$ . Then there is a neighborhood $\mathcal{N}\subset\mathcal{D}$ of size $\geq\epsilon^{1/4}q$ . But this neighborhood is disjoint from all the neighborhoods in $\mathfrak{N}$ . This contradicts the maximality of $\mathfrak{N}$ . ∎

Lemma 6.9.

Suppose that $V_{i}\in\mathcal{D}$ and $\mathcal{C}\in\mathfrak{C}$ are such that $V_{i}$ has at least $\epsilon^{1/3}q$ neighbors in $\mathcal{C}$ . Then $V_{i}$ has less than $\epsilon^{1/3}q$ neighbors in the union of all members of $\mathfrak{C}$ other than $\mathcal{C}$ .

Proof.

Let $\mathcal{S}_{1}$ be the set of all neighbors of $V_{i}$ in $\mathcal{C}$ , and let $\mathcal{S}_{2}$ be the set of all neighbors of $V_{i}$ in the union of all elements of $\mathfrak{C}$ other than $\mathcal{C}$ . By assumption, $|\mathcal{S}_{1}|\geq\epsilon^{1/3}q$ . If also $|\mathcal{S}_{2}|\geq\epsilon^{1/3}q$ , then there are $\geq\epsilon^{2/3}q^{2}$ pairs $(V_{j},V_{k})$ such that $V_{j}\in\mathcal{S}_{1}$ and $V_{k}\in\mathcal{S}_{2}$ . Therefore at least one such pair $(V_{j},V_{k})$ must be $\epsilon$ -regular. Since $V_{j},V_{i},V_{k}$ is a path, Lemma 6.3 shows that $V_{j}$ and $V_{k}$ are neighbors. But this contradicts the first assertion of Lemma 6.7. ∎

For each $\mathcal{C}\in\mathfrak{C}$ , let $\mathcal{C}^{\prime}$ be the superset of $\mathcal{C}$ consisting of all elements of $\mathcal{C}$ and all elements of $\mathcal{D}$ that have $\geq\epsilon^{1/3}q$ neighbors in $\mathcal{C}$ . Let $\mathfrak{C}^{\prime}$ be the set of all such $\mathcal{C}^{\prime}$ . Lemma 6.9 shows that for any $V_{i}\in\mathcal{D}$ , there can be at most one $\mathcal{C}\in\mathfrak{C}$ such that $V_{i}$ has $\geq\epsilon^{1/3}q$ neighbors in $\mathcal{C}$ . Thus, the elements of $\mathfrak{C}^{\prime}$ are disjoint. Let $\mathcal{D}^{\prime}$ be the set of all elements of $\mathcal{D}$ that do not belong to any $\mathcal{C}^{\prime}$ . A schematic picture depicting $\mathfrak{C}^{\prime}$ and $\mathcal{D}^{\prime}$ is given in Figure 5.

Lemma 6.10.

For any $\mathcal{C}\in\mathfrak{C}$ , the set $\mathcal{C}^{\prime}$ has the property that any two distinct elements of $\mathcal{C}^{\prime}$ are either neighbors, or an irregular pair.

Proof.

Take any distinct $V_{i},V_{j}\in\mathcal{C}^{\prime}$ such that $(V_{i},V_{j})$ is an $\epsilon$ -regular pair. If they are both in $\mathcal{C}$ , then the assertion is proved by Lemma 6.7.

If $V_{i}\in\mathcal{C}$ and $V_{j}\in\mathcal{D}$ , then $V_{j}$ has a neighbor $V_{k}\in\mathcal{C}$ . By Lemma 6.7, there is a path with $\leq 6$ vertices joining $V_{k}$ and $V_{i}$ . Since $V_{j}$ and $V_{k}$ are neighbors, we can concatenate $V_{j}$ at the beginning of this path to get a path with $\leq 7$ vertices joining $V_{j}$ and $V_{i}$ . Therefore by Lemma 6.3, $V_{j}$ and $V_{i}$ are neighbors.

Lastly, if $V_{i}$ and $V_{j}$ are both in $\mathcal{D}$ , then they have neighbors $V_{k}$ and $V_{l}$ in $\mathcal{C}$ . By Lemma 6.7, there is a path with $\leq 6$ vertices joining $V_{k}$ and $V_{l}$ . Since $V_{i}$ and $V_{k}$ are neighbors, and $V_{j}$ and $V_{l}$ are neighbors, we can concatenate $V_{i}$ at the beginning of the path and $V_{j}$ to the end of the path to get a path with $\leq 8$ vertices joining $V_{i}$ and $V_{j}$ . Therefore by Lemma 6.3, $V_{i}$ and $V_{j}$ are neighbors. ∎

Call a pair $(V_{i},V_{j})$ “bad” if $V_{i}$ and $V_{j}$ are neighbors, but they belong to distinct elements of $\mathfrak{C}^{\prime}$ .

Lemma 6.11.

The number of bad pairs is at most $3\epsilon^{1/12}q^{2}$ .

Proof.

Let $(V_{i},V_{j})$ be a bad pair. We consider several cases. First, by Lemma 6.7, it cannot be that both $V_{i}$ and $V_{j}$ are in the complement of $\mathcal{D}$ .

Next, suppose that $V_{i}\in\mathcal{D}$ and $V_{j}\notin\mathcal{D}$ . Then $V_{i}\in\mathcal{C}_{1}^{\prime}$ for some $\mathcal{C}_{1}\in\mathfrak{C}$ and $V_{j}\in\mathcal{C}_{2}$ for some $\mathcal{C}_{2}\neq\mathcal{C}_{1}$ . By Lemma 6.9, there are less than $\epsilon^{1/3}q$ neighbors of $V_{i}$ in $\mathcal{C}_{2}$ . By Lemma 6.6, there are at most $\epsilon^{-1/4}$ choices of $\mathcal{C}_{2}$ . Thus, there are at most $\epsilon^{-1/4}\epsilon^{1/3}q=\epsilon^{1/12}q$ choices of $V_{j}$ for this $V_{i}$ , and therefore at most $\epsilon^{1/12}q^{2}$ choices of $(V_{i},V_{j})$ of this type.

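Finally, suppose that both $V_{i},V_{j}\in\mathcal{D}$ . Then by Lemma 6.8, there are less than $\epsilon^{1/4}q$ choices of $V_{j}$ for each $V_{i}$ . Thus, there are at most $\epsilon^{1/4}q^{2}$ pairs of this type. Counting the previous case twice to account for the symmetric situation $V_{i}\notin\mathcal{D}$ , $V_{j}\in\mathcal{D}$ , and using $\epsilon^{1/4}\leq\epsilon^{1/12}$ (since $\epsilon<1$ ), the total number of bad pairs is at most $2\epsilon^{1/12}q^{2}+\epsilon^{1/4}q^{2}\leq 3\epsilon^{1/12}q^{2}$ . ∎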

Lemma 6.12.

Any element of $\mathcal{D}^{\prime}$ has at most $2\epsilon^{1/12}q$ neighbors among $\{V_{1},\ldots,V_{q}\}$ .

Proof.

Take any $V_{i}\in\mathcal{D}^{\prime}$ and any neighbor $V_{j}$ of $V_{i}$ . Then by Lemma 6.8, there are less than $\epsilon^{1/4}q$ choices of $V_{j}\in\mathcal{D}$ . On the other hand, by definition of $\mathcal{D}^{\prime}$ , $V_{i}$ has less than $\epsilon^{1/3}q$ neighbors in each $\mathcal{C}\in\mathfrak{C}$ . Thus, by Lemma 6.6, there are at most $\epsilon^{1/12}q$ choices of such $V_{j}$ . Since any neighbor of $V_{i}$ is either in $\mathcal{D}$ or in $\mathcal{C}$ for some $\mathcal{C}\in\mathfrak{C}$ , this completes the proof. ∎

We finally arrive at the main result of this section, which says that the graph $G$ can be modified into a disjoint union of cliques by adding and deleting a set of edges that has small $\mathbb{P}^{\otimes 2}$ -measure.

Lemma 6.13.

Under the assumptions (6.1) and (6.9), the graph $G$ can be modified into a disjoint union of cliques by adding and deleting edges in such a way that if $\Delta E$ is the set of all edges that were added or deleted, then

[TABLE]

where $C$ is a universal constant. Moreover, any non-singleton clique $B$ in the resulting graph has

[TABLE]

Proof.

Edges are added and deleted in several steps. First, delete all edges with at least one endpoint in $V_{0}$ . Let $\Delta E_{1}$ be the set of deleted edges. Then clearly

[TABLE]

Next, add all edges between vertices within the same $V_{i}$ , $1\leq i\leq q$ . Let $\Delta E_{2}$ be the set of all edges added in this step. Then by Lemma 6.1,

[TABLE]

In the next step, add all missing edges between any $V_{i}$ and $V_{j}$ that are members of the same $\mathcal{C}^{\prime}\in\mathfrak{C}^{\prime}$ . By Lemma 6.10, such pairs are either irregular, or they are neighbors of each other. In the latter case, the total mass of the missing edges is at most $2\epsilon\mathbb{P}(V_{i})\mathbb{P}(V_{j})$ . Thus, if $\Delta E_{3}$ is the set of edges added in this step, then by Lemma 6.1,

[TABLE]

Next, delete all edges between any $V_{i}\in\mathcal{C}_{1}^{\prime}$ and $V_{j}\in\mathcal{C}_{2}^{\prime}$ where $\mathcal{C}_{1}^{\prime}\neq\mathcal{C}_{2}^{\prime}$ . Then $(V_{i},V_{j})$ is either an irregular pair, or $(V_{i},V_{j})$ is regular but $V_{i}$ and $V_{j}$ are not neighbors, or $(V_{i},V_{j})$ is a bad pair. Thus, if $\Delta E_{4}$ is the set of edges deleted in this step, then by Lemma 6.2, Lemma 6.11 and Lemma 6.1,

[TABLE]

Finally, delete all edges with at least one vertex in some $V_{i}\in\mathcal{D}^{\prime}$ . Let $\Delta E_{5}$ be the set of deleted edges. Given $V_{i}\in\mathcal{D}^{\prime}$ and any $V_{j}$ , by Lemma 6.12 there are at most $2\epsilon^{1/12}q$ choices of $V_{j}$ such that $V_{j}$ is a neighbor of $V_{i}$ . The other possibilities are that $(V_{i},V_{j})$ is an irregular pair, or $(V_{i},V_{j})$ is regular but $V_{j}$ is not a neighbor of $V_{i}$ , or $V_{j}=V_{i}$ . Therefore by Lemma 6.2 and Lemma 6.1,

[TABLE]

This completes the process of adding and deleting edges. If $\Delta E$ is the set of all edges that were either added or deleted, then the above estimates show that (6.10) holds.

Let us now verify that the resulting graph is a disjoint union of cliques. For each $\mathcal{C}^{\prime}\in\mathfrak{C}^{\prime}$ , let $V(\mathcal{C}^{\prime})$ be the union of all $V\in\mathcal{C}^{\prime}$ . In the new graph, each $V(\mathcal{C}^{\prime})$ is a clique, and there are no edges between two such cliques. Moreover, any vertex that belongs to some $V_{i}\in\mathcal{D}^{\prime}$ has no edges incident to it in the new graph. Thus, the new graph is the disjoint union of the above cliques and a bunch of singleton vertices that are disconnected from all else. This also shows that any non-singleton clique in the new graph must be one of the $V(\mathcal{C}^{\prime})$ ’s. But for any $\mathcal{C}^{\prime}\in\mathfrak{C}^{\prime}$ , Lemma 6.1 gives

[TABLE]

This completes the proof. ∎
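The bookkeeping in the proof above can be illustrated on a toy example. The following is a minimal sketch, not the construction of Lemma 6.13 itself: the clusters are simply given as input, whereas the lemma obtains them from the regularity partition. The function name `clique_modification` and the convention that each unordered edge counts as two ordered pairs under $\mathbb{P}^{\otimes 2}$ are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import combinations


def clique_modification(vertices, edges, clusters, prob):
    """Return (delta_E, mass) for turning `edges` into a disjoint union of cliques.

    edges    : set of frozenset({x, y}), the undirected edges of the original graph
    clusters : disjoint vertex sets that should become cliques; every vertex
               outside all clusters ends up as an isolated singleton
    prob     : dict mapping each vertex to its probability mass
    """
    cluster_of = {v: idx for idx, c in enumerate(clusters) for v in c}
    # Target edge set: all pairs of distinct vertices lying in the same cluster.
    target = {
        frozenset((x, y))
        for x, y in combinations(vertices, 2)
        if x in cluster_of and cluster_of[x] == cluster_of.get(y)
    }
    delta_E = edges ^ target  # edges that were either added or deleted
    # Assumption: an unordered pair {x, y} stands for the ordered pairs (x, y), (y, x).
    mass = sum(2 * prob[x] * prob[y] for x, y in map(tuple, delta_E))
    return delta_E, mass


# Toy data (hypothetical): a path a-b-c-d with uniform weights.
vertices = ["a", "b", "c", "d"]
prob = {v: Fraction(1, 4) for v in vertices}
edges = {frozenset("ab"), frozenset("bc"), frozenset("cd")}
delta_E, mass = clique_modification(vertices, edges, [{"a", "b", "c"}], prob)
print(sorted("".join(sorted(e)) for e in delta_E), mass)
```

Here the edge between a and c is added and the edge between c and d is deleted, so the modified graph is the clique on a, b, c together with the singleton d.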

7. Constructing the tree

Let $P^{*}$ , $\delta_{0}$ , $A$ , $\epsilon$ , $m$ , $\kappa$ , $N$ and $t_{1},\ldots,t_{N}$ remain as defined in Section 5. We will now repeatedly apply Lemma 6.13 to extract from $S$ a nested hierarchy of subsets with desirable properties. The subsets will be constructed in such a way that each subset is either a singleton, or has $\mathbb{P}$ -measure uniformly bounded below by a positive constant that depends only on $\epsilon$ and $m$ . Any such constant will henceforth be denoted by $C(\epsilon,m)$ . This will allow us to apply Lemma 6.13 to partition such a non-singleton subset if $P^{*}$ and $\delta_{0}$ are small enough, depending only on $\epsilon$ and $m$ . We will keep dividing the non-singleton subsets until we are left with only singletons.

Henceforth, whenever we say “ $\delta_{0}$ and $P^{*}$ are small enough”, we will mean “ $\delta_{0}$ and $P^{*}$ are smaller than constants depending only on $\epsilon$ and $m$ ”.

Let $S^{\prime}=S\setminus A$ . By Lemma 5.3, $\mathbb{P}(S^{\prime})\geq 1/2$ if $\delta_{0}$ is small enough. Define a graph on $S^{\prime}$ as in the beginning of Section 6, using $t=t_{1}$ , and obtain a partition of $S^{\prime}$ using Lemma 6.13. Obtain a partition of $S$ by taking this partition of $S^{\prime}$ and appending to it singleton sets consisting of the elements of $A$ . Let $\mathcal{V}_{1}$ denote this partition. By (6.11), any non-singleton element $V\in\mathcal{V}_{1}$ does not intersect $A$ and satisfies $\mathbb{P}(V)\geq C(\epsilon,m)$ . Thus we can apply Lemma 6.13 to any such $V$ with $t=t_{2}$ , if $\delta_{0}$ and $P^{*}$ are small enough. In this manner, we obtain a collection $\mathcal{V}_{2}$ of disjoint sets, each of which is a subset of some non-singleton element of $\mathcal{V}_{1}$ . Then we partition each non-singleton element of $\mathcal{V}_{2}$ by applying the procedure of Section 6 with $t=t_{3}$ to obtain $\mathcal{V}_{3}$ , and continue this recursive partitioning until we arrive at $\mathcal{V}_{N}$ . This is possible since $N\leq C(\epsilon,m)$ , which, by (6.11), ensures that the conditions (6.1) and (6.9) are never violated if $\delta_{0}$ and $P^{*}$ are small enough.
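The shape of this recursion can be sketched in code. The sketch below is only an illustration under simplifying assumptions: it replaces the application of Lemma 6.13 by taking connected components of the threshold graph (an edge whenever $s(x,y)\geq t$), which coincides with a disjoint union of cliques only when $s$ is exactly tree-like, and it ignores the exceptional set $A$. The names `components` and `nested_partitions` and the toy similarity are hypothetical.

```python
def components(points, s, t):
    """Connected components of the graph on `points` with an edge when s(x, y) >= t."""
    remaining = set(points)
    comps = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            linked = {v for v in remaining if s(u, v) >= t}
            remaining -= linked
            comp |= linked
            stack.extend(linked)
        comps.append(frozenset(comp))
    return comps


def nested_partitions(points, s, thresholds):
    """Return levels[0..N+1], where levels[i] plays the role of the family V_i."""
    levels = [[frozenset(points)]]
    for t in thresholds:
        nxt = []
        for block in levels[-1]:
            if len(block) > 1:  # singletons are never split further
                nxt.extend(components(block, s, t))
        levels.append(nxt)
    # Last level: break the remaining non-singleton blocks into singletons.
    levels.append([frozenset([x]) for b in levels[-1] if len(b) > 1 for x in b])
    return levels


# Toy similarity (hypothetical): two groups {0, 1} and {2, 3}.
sim = {frozenset((0, 1)): 0.9, frozenset((2, 3)): 0.6}


def s(x, y):
    return 1.0 if x == y else sim.get(frozenset((x, y)), 0.2)


levels = nested_partitions(range(4), s, thresholds=[0.5, 0.8])
for i, level in enumerate(levels):
    print(i, sorted(sorted(b) for b in level))
```

In this toy run the block containing 0 and 1 survives unchanged from the first level to the second, which is the situation that motivates pairing each set with its level index when the tree is defined.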

Having defined $\mathcal{V}_{1},\ldots,\mathcal{V}_{N}$ , define $\mathcal{V}_{N+1}$ to be the set of all singleton sets $\{x\}$ such that $x$ belongs to some non-singleton member of $\mathcal{V}_{N}$ . Note, in particular, that we are not applying Lemma 6.13 while partitioning the elements of $\mathcal{V}_{N}$ into singletons. Lastly, define $\mathcal{V}_{0}:=\{S\}$ .

Let $T$ be the set of all pairs $(i,V)$ where $0\leq i\leq N+1$ and $V\in\mathcal{V}_{i}$ . This is essentially the disjoint union of the $\mathcal{V}_{i}$ ’s: we pair each element $V$ with the corresponding $i$ because the same $V$ may appear in two different $\mathcal{V}_{i}$ ’s (which can happen if some $V$ is partitioned into just one set in some step). For simplicity, we will refer to the element $(i,V)\in T$ as just $V$ .

We will now define a tree structure on $T$ . Note that by construction, if an element $V\in T$ belongs to some $\mathcal{V}_{i}$ , $i\geq 1$ , then it has a uniquely defined parent $U\in\mathcal{V}_{i-1}$ . Putting edges between such parent-child pairs creates a graph which is obviously a tree. Also, it is clear that the set of leaves of this tree can be identified with $S$ . Define $r:=(0,S)$ to be the root of $T$ .
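As a concrete illustration of this definition, the following sketch builds the parent map on nodes $(i,V)$ from a given list of nested families and identifies the leaves with the points of $S$. The function name `build_tree` and the hard-coded toy families are assumptions made for the example; the families are taken as given rather than produced by Lemma 6.13.

```python
def build_tree(levels):
    """Build the tree on nodes (i, V) with V in levels[i].

    levels[0] must be [S]; every V in levels[i], i >= 1, must be contained in
    exactly one member of levels[i - 1]. Returns (parent, leaf_of), where
    parent maps each non-root node to its parent and leaf_of maps each point
    of S to the leaf node that represents it.
    """
    parent = {}
    for i in range(1, len(levels)):
        for V in levels[i]:
            (U,) = [W for W in levels[i - 1] if V <= W]  # the unique parent
            parent[(i, V)] = (i - 1, U)
    nodes = {(0, levels[0][0])} | set(parent)
    leaves = nodes - set(parent.values())  # nodes without children
    leaf_of = {next(iter(V)): (i, V) for (i, V) in leaves}
    return parent, leaf_of


# Toy nested families (hypothetical); {0, 1} appears at levels 1 and 2.
fs = frozenset
levels = [
    [fs({0, 1, 2, 3})],
    [fs({0, 1}), fs({2, 3})],
    [fs({0, 1}), fs({2}), fs({3})],
    [fs({0}), fs({1})],
]
parent, leaf_of = build_tree(levels)
root = (0, levels[0][0])
print(len(parent), leaf_of[0], leaf_of[2])
```

Indexing nodes by the pair $(i,V)$ keeps the two copies of the set containing 0 and 1 distinct, so the leaves for 0 and 1 sit at depth 3 while those for 2 and 3 sit at depth 2.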

For each non-singleton node $V\in\mathcal{V}_{i}$ for $1\leq i\leq N-1$ , let $\Delta E(V)$ be the set of edges of $V$ that need to be modified while applying Lemma 6.13 to convert $V$ into a disjoint union of cliques. If $V$ is a singleton set, let $\Delta E(V)$ be empty. Let $\Delta E(S^{\prime})$ be the set of edges that need to be modified while applying Lemma 6.13 to $S^{\prime}$ . Lastly, let $\Delta E(A)$ be the set of all pairs $(x,y)$ with at least one of $x$ and $y$ in $A$ . Let $\Delta E$ be the union of all these sets.

We prove three lemmas in this section. In all of these, we assume that $P^{*}$ and $\delta_{0}$ are sufficiently small, depending on $\epsilon$ and $m$ , so that Lemma 6.13 can be applied. We will view the elements of $S$ as the leaves of $T$ , and for any $x,y\in S$ , we will denote by $(x,y)_{r}$ the Gromov product of $x$ and $y$ under the graph distance on $T$ , with respect to the base point $r$ .
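For readers who want to experiment, here is a small sketch that evaluates the Gromov product $(x,y)_{r}=\frac{1}{2}(d(x,r)+d(y,r)-d(x,y))$ under the graph distance of a rooted tree given by a parent map. The helper names and the toy tree are hypothetical; on a rooted tree the value equals the depth of the deepest common ancestor of $x$ and $y$.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root (the root has no parent entry)."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(u, v, parent):
    """Graph distance between u and v in the tree described by `parent`."""
    pu, pv = path_to_root(u, parent), path_to_root(v, parent)
    # Strip the shared ancestors from the root end of both paths.
    while pu and pv and pu[-1] == pv[-1]:
        pu.pop()
        pv.pop()
    return len(pu) + len(pv)


def gromov_product(x, y, root, parent):
    """(x, y)_root = (d(x, root) + d(y, root) - d(x, y)) / 2."""
    dx = tree_distance(x, root, parent)
    dy = tree_distance(y, root, parent)
    return (dx + dy - tree_distance(x, y, parent)) / 2


# Toy tree (hypothetical): r has children u and w; a, b hang below u; c below w.
parent = {"u": "r", "w": "r", "a": "u", "b": "u", "c": "w"}
print(gromov_product("a", "b", "r", parent))
print(gromov_product("a", "c", "r", parent))
```

In the toy tree the leaves a and b branch apart at depth 1, while a and c already separate at the root, so their Gromov products are 1 and 0 respectively.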

Lemma 7.1.

For the set $\Delta E$ defined above, we have

[TABLE]

where $C$ is a universal constant.

Proof.

Note that by Lemma 6.13 and Lemma 5.3,

[TABLE]

Since each $\mathcal{V}_{i}$ is a partition of a subset of $S$ ,

[TABLE]

Therefore, since $N\kappa<1$ by the definition of $N$ , we get

[TABLE]

By the definition (5.2) of $\kappa$ , this gives the desired result. ∎

Lemma 7.2.

For any $(x,y)\notin\Delta E$ such that $x\neq y$ ,

[TABLE]

Proof.

Let $i:=(x,y)_{r}$ , so that $i$ is the largest integer such that $x$ and $y$ both belong to the same member of $\mathcal{V}_{i}$ . First, suppose that $1\leq i\leq N-1$ and $s(x,y)\geq t_{i+1}$ . Let $V$ be the element of $\mathcal{V}_{i}$ that contains $x$ and $y$ . Then while applying Lemma 6.13 to $V$ , there is an edge between $x$ and $y$ in the original graph, but that edge is deleted in the modification. Thus, $(x,y)\in\Delta E(V)\subset\Delta E$ , which is not true by assumption. Therefore $s(x,y)$ must be less than $t_{i+1}$ .

If $i=0$ , then also the above deduction holds: If $s(x,y)\geq t_{1}$ and $x$ and $y$ are both in $S^{\prime}$ , then by the same logic as above we conclude that $(x,y)\in\Delta E$ . On the other hand, if $s(x,y)\geq t_{1}$ and at least one of $x$ and $y$ is outside $S^{\prime}$ , then $(x,y)\in\Delta E(A)\subset\Delta E$ .

Combining the above observations, and recalling the bound (5.3), we get that if $0\leq i\leq N-1$ , then

[TABLE]

If $i=N$ , then note that since $(N+1)\kappa\geq 1$ (by the definition of $N$ ),

[TABLE]

Finally, note that since $x\neq y$ , we cannot have $i=N+1$ . ∎

Lemma 7.3.

For any $(x,y)\notin\Delta E$ such that $x\neq y$ ,

[TABLE]

Proof.

As in the proof of Lemma 7.2, let $i:=(x,y)_{r}$ , and note that since $x\neq y$ , we must have $0\leq i\leq N$ . First, suppose that $2\leq i\leq N$ and $s(x,y)<t_{i}$ . We know that $x$ and $y$ are both in some $V\in\mathcal{V}_{i}$ . Let $U\in\mathcal{V}_{i-1}$ be the parent of $V$ in $T$ . Then while applying Lemma 6.13 to $U$ , $(x,y)$ is not an edge in the original graph, but since $x$ and $y$ both belong to $V$ , $(x,y)$ must be an edge in the modified graph. Thus, $(x,y)\in\Delta E(U)\subset\Delta E$ , which is false by assumption. Consequently, $s(x,y)\geq t_{i}$ .

If $i=1$ and $s(x,y)<t_{1}$ , then either $x$ and $y$ are both in $S^{\prime}$ , in which case the same argument shows that $(x,y)\in\Delta E(S^{\prime})\subset\Delta E$ , or at least one of $x$ and $y$ is in $A$ , in which case $(x,y)\in\Delta E(A)\subset\Delta E$ .

Combining, and applying (5.3), we get that if $1\leq i\leq N$ , then

[TABLE]

Lastly, if $i=0$ , note that the inequality is automatic since $(x,y)_{r}=0$ . This completes the proof of the lemma. ∎

8. Completing the proof of Theorem 5.1

Take any $\eta>0$ . We have to prove the existence of a $\gamma>0$ , depending only on $\eta$ , such that if $P^{*}<\gamma$ and $\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)<\gamma$ , then $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\eta$ . To do this, first choose $\epsilon$ so small and $m$ so large that

[TABLE]

where $C$ is the universal constant from Lemma 7.1, and also

[TABLE]

Let $\delta:=\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)$ , and let $\delta_{0}:=\delta^{1/8}$ . If $P^{*}$ and $\delta_{0}$ are small enough (depending on $\epsilon$ and $m$ ), then the method of Section 7 yields $\Delta E$ and $T$ satisfying the conclusions of Lemmas 7.1, 7.2 and 7.3. Recall also that $0\leq s(x,y)\leq 1$ and $0\leq(x,y)_{r}\kappa\leq(N+1)\kappa\leq 1+\kappa$ for all $x$ and $y$ . Consequently, if $X$ and $Y$ are i.i.d. draws from $\mathbb{P}$ , then

[TABLE]

This shows that if $P^{*}$ and $\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)$ are small enough, depending on $\eta$ , then $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\eta$ .

9. From Theorem 5.1 to Theorem 2.4

In this section we prove Theorem 2.4 using Theorem 5.1. Initially, let us continue working under the assumption that $S$ is finite and $\mathcal{F}$ is the power set of $S$ . Take any $\epsilon>0$ . Then by Theorem 5.1, there is some $\delta>0$ such that if $P^{*}<\delta$ and $\textup{Hyp}(S,\mathcal{F},\mathbb{P},s)<\delta$ , then $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\epsilon$ . Suppose that $P^{*}\geq\delta$ . Then we first create a new system where this violation does not happen. Take each $x\in S$ and divide it up into $k(x)$ vertices, where $k(x)$ is chosen so large that $\mathbb{P}(x)/k(x)<\delta$ . Let $S^{\prime}$ be the new set of vertices, consisting of $k(x)$ copies of each $x\in S$ . Let $f$ be a map from $S^{\prime}$ into $S$ that takes any copy of $x\in S$ to $x$ , so that $|f^{-1}(x)|=k(x)$ . Define a probability measure $\mathbb{P}^{\prime}$ on $S^{\prime}$ as

[TABLE]

The probability measure $\mathbb{P}^{\prime}$ can be described in words as follows. Drawing a vertex from $\mathbb{P}^{\prime}$ is the same as first picking a vertex from $\mathbb{P}$ , and then choosing one of its copies in $S^{\prime}$ uniformly at random. Note that if $Y\sim\mathbb{P}^{\prime}$ , then $f(Y)\sim\mathbb{P}$ .
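The splitting step can be made concrete as follows. This is a minimal sketch under the description just given: each atom $x$ is replaced by $k(x)$ copies, each carrying mass $\mathbb{P}(x)/k(x)$. The specific choice $k(x)=\lfloor\mathbb{P}(x)/\delta\rfloor+1$, the labelling of copies as pairs `(x, c)`, and the toy measure are assumptions made for the illustration; any $k(x)$ with $\mathbb{P}(x)/k(x)<\delta$ would do.

```python
from fractions import Fraction
from math import floor


def split_atoms(prob, delta):
    """Split every atom so that all new atoms have mass strictly below delta.

    Returns (prob_new, f), where prob_new is the measure on the copies and
    f maps each copy back to the point it was split from.
    """
    prob_new, f = {}, {}
    for x, p in prob.items():
        k = floor(p / delta) + 1  # guarantees p / k < delta
        for c in range(k):
            copy = (x, c)  # hypothetical label for the c-th copy of x
            f[copy] = x
            prob_new[copy] = p / k  # uniform over the copies of x
    return prob_new, f


# Toy measure (hypothetical) with an atom that is too heavy for delta = 1/4.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
delta = Fraction(1, 4)
prob_new, f = split_atoms(prob, delta)

# Push prob_new forward along f; this should recover the original measure.
pushforward = {}
for y, p in prob_new.items():
    pushforward[f[y]] = pushforward.get(f[y], 0) + p
print(max(prob_new.values()), pushforward == prob)
```

Exact rational arithmetic is used so that the pushforward identity, which mirrors the remark that $f(Y)\sim\mathbb{P}$ when $Y\sim\mathbb{P}^{\prime}$ , can be checked without rounding error.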

Define also a similarity function $s^{\prime}$ on $S^{\prime}$ as

[TABLE]

Then by the observations from the previous paragraph, it follows that

[TABLE]

where $\mathcal{F}^{\prime}$ is the power set of $S^{\prime}$ . On the other hand, $\max_{y\in S^{\prime}}\mathbb{P}^{\prime}(y)<\delta$ by construction. Thus, by Theorem 5.1,

[TABLE]

Consequently, there exists a tree $T^{\prime}$ that is compatible with $S^{\prime}$ (in the sense of Definition 2.2), with root $r$ , and a number $\alpha$ such that

[TABLE]

where $Y$ and $Z$ are i.i.d. draws from $\mathbb{P}^{\prime}$ , and $(Y,Z)_{r}$ is the Gromov product of $Y$ and $Z$ under the graph distance on $T^{\prime}$ , with respect to the base point $r$ .

Now, for each $x\in S$ , let $Y(x)$ be a vertex chosen uniformly at random from $f^{-1}(x)$ . Modify the tree $T^{\prime}$ by deleting all leaves other than the $Y(x)$ ’s, and also deleting the edges joining these leaves to their parents. The resulting graph is still a tree, and its leaves are in one-to-one correspondence with the set $S$ . Thus we can relabel its leaves to define a tree $\widetilde{T}$ with set of leaves $S$ and root $r$ .

Let $X_{1}$ and $X_{2}$ be i.i.d. draws from $\mathbb{P}$ , independent of $\widetilde{T}$ . Then $Y(X_{1})$ and $Y(X_{2})$ are i.i.d. draws from $\mathbb{P}^{\prime}$ , and hence by (9.1),

[TABLE]

But $s^{\prime}(Y(X_{1}),Y(X_{2}))=s(X_{1},X_{2})$ , and by our definition of $\widetilde{T}$ ,

[TABLE]

Therefore $(Y(X_{1}),Y(X_{2}))_{r}=(X_{1},X_{2})_{r}$ , where the Gromov product on the left is on the tree $T^{\prime}$ , and the Gromov product on the right is on the tree $\widetilde{T}$ . This gives

[TABLE]

where the expectation is now taken over $X_{1}$ , $X_{2}$ and $\widetilde{T}$ . Since $\widetilde{T}$ is independent of $X_{1}$ and $X_{2}$ , this proves the existence of a tree $T$ with set of leaves $S$ and root $r$ , such that

[TABLE]

Thus, we may conclude that $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\epsilon$ . This completes the proof of Theorem 2.4 under the assumptions that $S$ is finite and $\mathcal{F}$ is the power set of $S$ .

Let us now consider general $(S,\mathcal{F},\mathbb{P},s)$ , where $\mathcal{F}$ is countably generated. Take any $\epsilon>0$ . The case of finite $S$ gives a $\delta$ corresponding to $\epsilon/2$ . Take this $\delta$ , and suppose that

[TABLE]

We will show that in the general case, this implies $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\epsilon$ .

Let $\{A_{1},A_{2},\ldots\}$ be a set of generators of $\mathcal{F}$ . For each $n$ , let $\mathcal{P}_{n}$ be the partition of $S$ generated by $A_{1},\ldots,A_{n}$ . Let $\mathcal{P}_{n}^{2}$ be the set of all sets of the form $A\times B$ where $A,B\in\mathcal{P}_{n}$ . Let $\mathcal{G}_{n}$ be the set of subsets of $S^{2}$ that are unions of elements of $\mathcal{P}_{n}^{2}$ . Define

[TABLE]

It is not difficult to show that $\mathcal{G}$ is an algebra of sets that generates the $\sigma$ -algebra $\mathcal{F}\times\mathcal{F}$ on $S^{2}$ . Now take any $k\geq 1$ . For $0\leq j\leq k$ , let

[TABLE]

By the measurability of $s$ , $B_{j}\in\mathcal{F}\times\mathcal{F}$ . Therefore by a basic result of measure theory, given any $\eta>0$ there exists $B_{j}^{\prime}\in\mathcal{G}$ such that $\mathbb{P}^{\otimes 2}(B_{j}\Delta B_{j}^{\prime})\leq\eta$ . Define

[TABLE]

so that $\mathbb{P}^{\otimes 2}(D)\leq(k+1)\eta$ .

Since $\mathcal{G}_{n}$ is an increasing sequence, there is some large enough $n$ such that $B_{j}^{\prime}\in\mathcal{G}_{n}$ for all $j$ . Define a function $\widetilde{s}:S^{2}\to[0,1]$ as $\widetilde{s}(x,y)=j/k$ where $j$ is the smallest number such that $(x,y)\in B_{j}^{\prime}$ . If there is no such $j$ , let $\widetilde{s}(x,y)=0$ . Since each $B_{j}^{\prime}$ is a union of members of $\mathcal{P}_{n}^{2}$ , it follows that $\widetilde{s}$ is constant on each element of $\mathcal{P}_{n}^{2}$ .
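The two finite objects used here, the partition generated by finitely many sets and the step function $\widetilde{s}$ , can be sketched on a toy space. The sets playing the role of the $B_{j}^{\prime}$ below are hypothetical unions of product cells chosen by hand; nothing in the sketch reproduces the approximation argument itself.

```python
from itertools import product


def generated_partition(space, generators):
    """Atoms of the partition of `space` generated by the sets in `generators`."""
    atoms = {}
    for x in space:
        # Two points share an atom iff they lie in exactly the same generators.
        signature = tuple(x in A for A in generators)
        atoms.setdefault(signature, set()).add(x)
    return [frozenset(a) for a in atoms.values()]


def s_tilde(x, y, B_prime, k):
    """j / k for the smallest j with (x, y) in B_prime[j]; 0 if there is none."""
    for j, B in enumerate(B_prime):
        if (x, y) in B:
            return j / k
    return 0.0


# Toy space and generators (hypothetical).
space = range(6)
cells = generated_partition(space, [{0, 1, 2}, {2, 3}])

# Hypothetical sets B'_0, B'_1, B'_2 (k = 2), each a union of product cells.
k = 2
B_prime = [
    set(),
    set(product({0, 1}, {2})) | set(product({2}, {0, 1})),
    set(product({0, 1}, {0, 1})),
]

# s_tilde takes a single value on every product cell A x B.
constant_on_cells = all(
    len({s_tilde(x, y, B_prime, k) for x in A for y in B}) == 1
    for A in cells
    for B in cells
)
print(sorted(sorted(c) for c in cells), constant_on_cells)
```

Grouping points by their membership signature is one simple way to compute the generated partition; the check at the end confirms that a function built from unions of product cells is constant on each cell.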

Now suppose that $\widetilde{s}(x,y)=j/k$ , but $(x,y)\notin B_{j}$ . Then there are two possibilities: (a) $(x,y)\in B_{j}^{\prime}$ . Then clearly, $(x,y)\in D$ . (b) $(x,y)\notin B_{j}^{\prime}$ . In this case, $j$ must be zero and $(x,y)$ must not belong to any $B_{i}^{\prime}$ . But $(x,y)\in B_{i}$ for some $i$ . Thus again, $(x,y)\in D$ .

On the other hand, suppose that $(x,y)\in B_{j}$ but $\widetilde{s}(x,y)\neq j/k$ . Again, this implies that either $(x,y)$ is not in any $B_{i}^{\prime}$ , or $(x,y)\in B_{i}^{\prime}$ for some $i\neq j$ . In the first case, we clearly have $(x,y)\in D$ . In the second, $(x,y)\notin B_{i}$ and hence $(x,y)\in D$ .

Combining the observations of the last two paragraphs, we see that if $|\widetilde{s}(x,y)-s(x,y)|>1/k$ , then $(x,y)\in D$ . Thus, if $X$ and $Y$ are i.i.d. draws from $\mathbb{P}$ , then

[TABLE]

Now recall the assumption (9.2) and the fact that $\delta$ is a function of $\epsilon$ . Therefore, the above display shows that by choosing $k$ large enough (depending on $\epsilon$ ), and then choosing $\eta$ small enough (depending on $k$ and $\epsilon$ ), we can ensure that

[TABLE]

Now let $\widetilde{X}$ be the element of $\mathcal{P}_{n}$ that contains $X$ and let $\widetilde{Y}$ be the element of $\mathcal{P}_{n}$ that contains $Y$ . Since $\mathcal{P}_{n}$ is a finite set, we can endow it with its power set $\sigma$ -algebra $2^{\mathcal{P}_{n}}$ (which identifies with $\mathcal{G}_{n}$ ), and may consider $\widetilde{X}$ and $\widetilde{Y}$ to be $\mathcal{P}_{n}$ -valued random variables. Then $\widetilde{X}$ and $\widetilde{Y}$ are i.i.d. random variables with law $\widetilde{\mathbb{P}}$ , where $\widetilde{\mathbb{P}}$ identifies with the restriction of $\mathbb{P}$ to $\mathcal{G}_{n}$ . Since $\widetilde{s}$ is constant on elements of $\mathcal{P}_{n}^{2}$ , we can naturally view $\widetilde{s}$ as a function on $\mathcal{P}_{n}\times\mathcal{P}_{n}$ . Lastly, observe that $\widetilde{s}(\widetilde{X},\widetilde{Y})=\widetilde{s}(X,Y)$ . Combining all of these observations, we get

[TABLE]

Since $\mathcal{P}_{n}$ has finite cardinality, this implies that

[TABLE]

In particular, there is a tree $\widetilde{T}$ with root $r$ that is compatible with $(\mathcal{P}_{n},2^{\mathcal{P}_{n}})$ , and a number $\alpha\geq 0$ , such that

[TABLE]

where $(\widetilde{X},\widetilde{Y})_{r}$ is the Gromov product of $\widetilde{X}$ and $\widetilde{Y}$ under the graph distance on $\widetilde{T}$ , with respect to the base point $r$ . Let us now extend the tree $\widetilde{T}$ by appending $S$ to the set of nodes, and adding an edge between each $x\in S$ and the element of $\mathcal{P}_{n}$ that contains $x$ . Call the new tree $T$ . Then $S$ is the set of leaves of $T$ . The set $T\setminus S$ is just $\widetilde{T}$ , which is finite. Lastly, for any $v\in T\setminus S$ , the set of leaves that are descendants of $v$ is a union of elements of $\mathcal{P}_{n}$ , and therefore measurable. Thus, $T$ is compatible with $(S,\mathcal{F})$ .
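The extension step can be checked numerically on a small example. The sketch below hangs each point below the cell that contains it and compares Gromov products before and after, for pairs of distinct points. The adjacency-dictionary representation, the helper names and the toy tree are all assumptions made for the illustration.

```python
from collections import deque


def bfs_dist(adj, src):
    """Graph distances from `src` in the graph with adjacency dict `adj`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def gromov(adj, x, y, r):
    """Gromov product (x, y)_r under the graph distance of `adj`."""
    from_r = bfs_dist(adj, r)
    return (from_r[x] + from_r[y] - bfs_dist(adj, x)[y]) / 2


def append_leaves(adj, cell_of):
    """New adjacency dict in which every point x is a leaf below cell_of[x]."""
    new = {u: set(vs) for u, vs in adj.items()}
    for x, cell in cell_of.items():
        new[x] = {cell}
        new[cell].add(x)
    return new


# Toy tree on the cells (hypothetical): r - P1, r - m, m - P2, m - P3.
small = {
    "r": {"P1", "m"},
    "P1": {"r"},
    "m": {"r", "P2", "P3"},
    "P2": {"m"},
    "P3": {"m"},
}
cell_of = {"x1": "P1", "x2": "P1", "y": "P2", "z": "P3"}
big = append_leaves(small, cell_of)

for a, b in [("y", "z"), ("x1", "y"), ("x1", "x2")]:
    print(a, b, gromov(big, a, b, "r"), gromov(small, cell_of[a], cell_of[b], "r"))
```

For each listed pair the two printed values agree, because each distance to the root grows by 1 and the distance between the two points grows by 2, so the increments cancel in the Gromov product.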

Next, note that $(\widetilde{X},\widetilde{Y})_{r}=(X,Y)_{r}$ , because if $d_{T}$ is the graph distance on $T$ , then $d_{T}(X,r)=d_{\widetilde{T}}(\widetilde{X},r)+1$ , $d_{T}(Y,r)=d_{\widetilde{T}}(\widetilde{Y},r)+1$ , and $d_{T}(X,Y)=d_{\widetilde{T}}(\widetilde{X},\widetilde{Y})+2$ . Also, we know that $\widetilde{s}(\widetilde{X},\widetilde{Y})=\widetilde{s}(X,Y)$ . Therefore by (9.4),

[TABLE]

Invoking (9.3), this shows that if $k$ is chosen large enough (depending on $\epsilon$ ), and then $\eta$ is chosen small enough (depending on $k$ and $\epsilon$ ), we can ensure that

[TABLE]

Consequently, $\textup{Tree}(S,\mathcal{F},\mathbb{P},s)<\epsilon$ , completing the proof of Theorem 2.4.

10. Proof of Theorem 3.1

Take any strictly increasing continuous function $\rho:\mathbb{R}\to[0,\infty)$ , and define the similarity function

[TABLE]

If three configurations $\sigma^{1}$ , $\sigma^{2}$ and $\sigma^{3}$ satisfy

[TABLE]

for some $\epsilon\geq 0$ , then by the monotonicity and uniform continuity of $\rho$ on the range of $f$ ,

[TABLE]

where $\delta(\epsilon)\to 0$ as $\epsilon\to 0$ . From this and the boundedness of $\rho$ on the range of $f$ , we see that if (3.2) holds, then

[TABLE]

Consequently, $\textup{Hyp}(\Sigma_{n},\mathcal{F}_{n},\mu_{n},s_{n})\to 0$ in probability as $n\to\infty$ , where $\mathcal{F}_{n}$ is the power set of $\Sigma_{n}$ if $\Sigma_{n}=\{-1,1\}^{n}$ and the Borel $\sigma$ -algebra of $\Sigma_{n}$ if $\Sigma_{n}=\sqrt{n}\mathbb{S}^{n-1}$ . Thus, Theorem 2.4 implies that

[TABLE]

Therefore, there are sequences $\epsilon_{n}$ and $\delta_{n}$ tending to zero as $n\to\infty$ , such that the following holds. With probability at least $1-\epsilon_{n}$ , there exists a tree $T_{n}$ with root $r_{n}$ , that is compatible with $(\Sigma_{n},\mathcal{F}_{n})$ in the sense of Definition 2.2, and a number $a_{n}\geq 0$ , satisfying

[TABLE]

where $(\sigma^{1},\sigma^{2})_{r_{n}}$ is the Gromov product under graph distance on the tree $T_{n}$ , with respect to the base point $r_{n}$ .

By the remark immediately below Definition 2.2, the nodes of $T_{n}$ give a hierarchical clustering of $\Sigma_{n}$ into measurable clusters. For each node $\alpha$ , let $q_{\alpha}:=\rho^{-1}(a_{n}d_{\alpha})$ , where $d_{\alpha}$ is the length of the path from $r_{n}$ to $\alpha$ . If $\alpha$ is the smallest cluster containing $\sigma^{1}$ and $\sigma^{2}$ , then $(\sigma^{1},\sigma^{2})_{r_{n}}=d_{\alpha}$ . Therefore if $\rho(f(R_{1,2}))\approx a_{n}(\sigma^{1},\sigma^{2})_{r_{n}}$ , then $f(R_{1,2})\approx q_{\alpha}$ . This completes the proof.

Acknowledgements

We thank Sky Cao, Wei-Kuo Chen, Persi Diaconis, Jacob Fox, Susan Holmes and Dmitry Panchenko for helpful comments and references.

Bibliography

[1] Aizenman, M. and Contucci, P. (1998). On the stability of the quenched state in mean-field spin-glass models. J. Statist. Phys., 92 no. 5-6, 765–783.
[2] Albert, R., Das Gupta, B. and Mobasheri, N. (2014). Topological implications of negative curvature for biological and social networks. Phys. Rev. E, 89 no. 3, 032811.
[3] Alon, N., Coja-Oghlan, A., Hàn, H., Kang, M., Rödl, V. and Schacht, M. (2010). Quasi-randomness and algorithmic regularity for graphs with general degree distributions. SIAM J. Comput., 39 no. 6, 2336–2362.
[4] Arguin, L.-P. and Aizenman, M. (2009). On the structure of quasi-stationary competing particles systems. Ann. Probab., 37 no. 3, 1080–1113.
[5] Auffinger, A. and Ben Arous, G. (2013). Complexity of random smooth functions on the high-dimensional sphere. Ann. Probab., 41 no. 6, 4214–4247.
[6] Auffinger, A., Ben Arous, G. and Černý, J. (2013). Random matrices and complexity of spin glasses. Comm. Pure Appl. Math., 66 no. 2, 165–201.
[7] Auffinger, A. and Chen, W.-K. (2018). On the energy landscape of spherical spin glasses. Adv. Math., 330, 553–588.
[8] Bowditch, B. H. (2006). A course on geometric group theory. Math. Soc. Japan, Tokyo.
