Average Gromov hyperbolicity and the Parisi ansatz
Sourav Chatterjee, Leila Sloman

Sourav Chatterjee
Department of Statistics, Stanford University, 390 Jane Stanford Way, Stanford, CA 94305
and
Leila Sloman
Department of Mathematics, Stanford University, 450 Jane Stanford Way, Building 380, Stanford, CA 94305
Abstract.
Gromov hyperbolicity of a metric space measures the distance of the space from a perfect tree-like structure. The measure has a “worst-case” aspect to it, in the sense that it detects a region in the space which sees the maximum deviation from tree-like structure. In this article we introduce an “average-case” version of Gromov hyperbolicity, which detects whether “most of the space”, with respect to a given probability measure, looks like a tree. The main result of the paper is that if this average hyperbolicity is small, then the space can be approximately embedded in a tree. The proof uses a weighted version of Szemerédi’s regularity lemma from graph theory. The result applies to Gromov hyperbolic spaces as well, since average hyperbolicity is bounded above by Gromov hyperbolicity. As an application, we give a construction of hierarchically organized pure states in any model of a spin glass that satisfies the Parisi ultrametricity ansatz.
Key words and phrases:
Hyperbolic metric space, Gromov hyperbolicity, ultrametricity, spin glass, negative curvature
2010 Mathematics Subject Classification:
51M10, 53C23, 60K35, 82B44
Sourav Chatterjee’s research was partially supported by NSF grant DMS-1855484
Leila Sloman’s research was partially supported by NSF grant DGE-1656518.
1. Gromov hyperbolicity
Let $(X, d)$ be a metric space. The Gromov product of two points $x, y \in X$ with respect to a third point $w \in X$ is defined as
$$(x|y)_w := \frac{1}{2}\big(d(x, w) + d(y, w) - d(x, y)\big).$$
Note that by the triangle inequality, the Gromov product is always nonnegative. The space $X$ is called $\delta$-hyperbolic (as defined by Gromov [16]) if for any four points $x, y, z, w \in X$,
$$(x|z)_w \ge \min\{(x|y)_w, (y|z)_w\} - \delta. \tag{1.1}$$
The smallest $\delta$ for which this is satisfied is known as the Gromov hyperbolicity of $X$. The condition (1.1) is known as Gromov’s four point condition. It is not hard to show that if (1.1) is satisfied for all $x, y, z$ with a given $w$, then it is satisfied for all $x, y, z, w$ with $2\delta$ in place of $\delta$. Thus, up to a factor of $2$, we may equivalently define hyperbolicity using a three point condition, by fixing $w$. If (1.1) is satisfied for all $x, y, z$ for some fixed $w$, then we say that $X$ is $\delta$-hyperbolic with base point $w$.
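For a finite metric space given as a distance matrix, the definitions above can be checked by brute force. The sketch below computes the Gromov product and the smallest $\delta$ satisfying (1.1). It can do this either over all four-tuples or with a fixed base point. The function names are ours, chosen for illustration. The loops are quartic in the number of points, so this is only a sanity check on small examples.

```python
from itertools import product


def gromov_product(d, x, y, w):
    """Gromov product (x|y)_w = (d(x,w) + d(y,w) - d(x,y)) / 2 for a distance matrix d."""
    return 0.5 * (d[x][w] + d[y][w] - d[x][y])


def gromov_hyperbolicity(d, base=None):
    """Smallest delta >= 0 with (x|z)_w >= min((x|y)_w, (y|z)_w) - delta.

    With base=None every base point w is tried (the four point condition);
    otherwise w is fixed, giving the hyperbolicity with that base point.
    """
    n = len(d)
    bases = range(n) if base is None else [base]
    delta = 0.0
    for w in bases:
        for x, y, z in product(range(n), repeat=3):
            lower = min(gromov_product(d, x, y, w), gromov_product(d, y, z, w))
            delta = max(delta, lower - gromov_product(d, x, z, w))
    return delta


# A star (a tree metric) is 0-hyperbolic; the 4-cycle is not.
star = [[0, 1, 1, 1], [1, 0, 2, 2], [1, 2, 0, 2], [1, 2, 2, 0]]
cycle = [[min(abs(i - j), 4 - abs(i - j)) for j in range(4)] for i in range(4)]
print(gromov_hyperbolicity(star), gromov_hyperbolicity(cycle))  # prints: 0.0 1.0
```

Fixing a base point can only lower the value. By the fact recalled above, fixing a base point also loses at most a factor of $2$.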
The notion of hyperbolic metric spaces is closely related to the notion of real trees. If $(X, d)$ is a metric space and $x, y \in X$, an arc from $x$ to $y$ is the image of a topological embedding $\alpha: [a, b] \to X$ with $\alpha(a) = x$ and $\alpha(b) = y$, where $[a, b]$ is a closed interval in $\mathbb{R}$ (allowing the possibility that $a = b$). A geodesic segment from $x$ to $y$ is the image of an isometric embedding $\alpha: [a, b] \to X$ with $\alpha(a) = x$ and $\alpha(b) = y$. A metric space $(X, d)$ is called a real tree if for any $x, y \in X$, there exists a unique arc from $x$ to $y$, and this arc is a geodesic segment. A real tree with a distinguished point $r$ is called a rooted real tree with root $r$.
The most elementary connection between Gromov hyperbolicity and real trees is that a metric space is $0$-hyperbolic if and only if it is isometric to a subset of a real tree. Now suppose that a metric space $X$ is $\delta$-hyperbolic for some small but nonzero $\delta$. Is it approximately isometric to a subset of a real tree, in some sense? The following result shows that this is true when $X$ has finite cardinality, with an error proportional to $\delta \log |X|$.
Theorem 1.1 (Ghys and de la Harpe [14]).
Let be a -hyperbolic metric space with base point and finite cardinality. Let be a positive integer such that . Then there exists a real tree with root and a map such that for all , , and for all , .
It is known that the error of order in the above theorem cannot be improved [8]. In particular, it is not possible to have a quasi-isometry where the discrepancy depends solely on .
The notion of Gromov hyperbolicity, introduced by Gromov in a group-theoretic context, has found great success in many areas of mathematics and even in science and engineering. There are many examples of metric spaces, both in theory and practice, that are almost tree-like but not exactly so. Gromov hyperbolicity is a great way to understand and study such examples.
Still, there is one aspect of Gromov hyperbolicity that is sometimes problematic when one ventures outside the domain of very regular objects coming from pure mathematics: the four point condition (1.1) is a worst-case condition. The space is not $\delta$-hyperbolic if there is even a single four-tuple for which (1.1) fails. There are examples from statistical physics and probability theory where (1.1) holds for most, but not all, four-tuples [21]. Here “most” is in terms of a probability measure on the space. Similar examples arise in the applied sciences, such as in the analysis of social networks [2] and phylogeny reconstruction [9].
For these reasons, one may naturally wonder whether the condition (1.1) may be replaced by some kind of averaged version. This has, indeed, been proposed recently in some physics papers (such as [2]), but these proposals have not been mathematically analyzed. The goal of this manuscript is to fill this gap: we define a natural notion of average Gromov hyperbolicity, and prove an analog of Theorem 1.1 for this measure. Interestingly, unlike Theorem 1.1, this result has no dependence on the size of the space. The proof is more involved than the proof of Theorem 1.1, using a weighted version of Szemerédi’s regularity lemma from graph theory. We apply this theorem to show that hierarchically organized pure states can be constructed in any model of a spin glass that satisfies the Parisi ultrametricity ansatz.
2. Main result
We will go beyond metric spaces in our definition of average hyperbolicity. Let $S$ be a set equipped with a countably generated $\sigma$-algebra $\mathcal{S}$ and a probability measure $\mu$ defined on $\mathcal{S}$. Let $L$ be a positive real number and $f: S \times S \to [0, L]$ be a measurable function satisfying $f(x, y) = f(y, x)$ for all $x, y \in S$. We will say that $f$ is a “similarity function”. Intuitively, $f(x, y)$ measures the similarity between two points $x$ and $y$. Similarity functions generalize the notion of Gromov product. Suppose $S$ has finite diameter with respect to a separable metric $d$ and is endowed with the Borel $\sigma$-algebra generated by this metric. Then the Gromov product $f(x, y) := (x|y)_w$ is a similarity function for any base point $w \in S$, with $L$ equal to the diameter.
Definition 2.1.
We will say that $(S, \mu, f)$ is $\delta$-hyperbolic if
$$\mathbb{E}\big[(\min\{f(X, Y), f(Y, Z)\} - f(X, Z))_+\big] \le \delta,$$
where $x_+ := \max\{x, 0\}$ denotes the positive part of a real number $x$, and $X$, $Y$, $Z$ are i.i.d. $S$-valued random variables with law $\mu$.
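For a finite space, the expectation in Definition 2.1 is a finite sum, so it can be computed exactly. The sketch below assumes that the display in Definition 2.1 is $\mathbb{E}[(\min\{f(X,Y), f(Y,Z)\} - f(X,Z))_+] \le \delta$. This is our reading of the stripped formula, mirroring the three point form of (1.1). The matrix `f` and probability vector `p` are illustrative inputs. The sketch also reports the worst defect over the support of `p`, which always dominates the average. This matches the remark that average hyperbolicity is bounded above by Gromov hyperbolicity.

```python
import numpy as np


def three_point_defects(f):
    """defect[x, y, z] = (min(f[x, y], f[y, z]) - f[x, z])_+ for a symmetric matrix f."""
    f = np.asarray(f, dtype=float)
    lower = np.minimum(f[:, :, None], f[None, :, :])  # min(f[x, y], f[y, z])
    return np.maximum(lower - f[:, None, :], 0.0)      # subtract f[x, z], keep positive part


def average_hyperbolicity(f, p):
    """Exact E[defect(X, Y, Z)] for X, Y, Z i.i.d. with probability vector p (assumed definition)."""
    p = np.asarray(p, dtype=float)
    weights = p[:, None, None] * p[None, :, None] * p[None, None, :]
    return float(np.sum(weights * three_point_defects(f)))


def worst_case_defect(f, p):
    """Largest defect over triples drawn from the support of p."""
    support = np.flatnonzero(np.asarray(p) > 0)
    sub = np.asarray(f, dtype=float)[np.ix_(support, support)]
    return float(three_point_defects(sub).max())


# Gromov products (x|y)_0 of the 4-cycle with base point 0.
f_cycle = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 2, 1], [0, 0, 1, 1]]
uniform = [0.25] * 4
print(average_hyperbolicity(f_cycle, uniform), worst_case_defect(f_cycle, uniform))  # 0.03125 1.0
```

Only the triple $(1, 2, 3)$ and its reversal have a positive defect, and each has probability $1/64$. So the average is $1/32$, even though the worst-case defect is $1$.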
It is not hard to show that is $0$-hyperbolic in the above sense if and only if there is a real tree with root and set of leaves , such that for all in the support of , we have , where is the Gromov product of and under the metric , with respect to the base point . We will now generalize this result to the case where is -hyperbolic for some small . First, recall that a graph-theoretic tree, henceforth simply called a tree, is a connected undirected graph without self-loops or closed paths. A rooted tree is a tree where one distinguished node is called the root. A node of a rooted tree is called a leaf if it is not the root and it has degree one.
Definition 2.2.
We will say that a tree with root is compatible with if the following three conditions are satisfied:
- (i) is the set of leaves of ,
- (ii) is a finite set, and
- (iii) for any node , the set of leaves that are the descendants of is a measurable subset of .
Clearly, any tree that is compatible with gives a hierarchical clustering of , such that the number of clusters is finite and each cluster is measurable. Conversely, any such clustering defines a compatible tree. An example is shown in Figure 1.
If is a compatible tree with root , and , we denote by the Gromov product of and under the graph distance on , with respect to the base point . From the definition of the Gromov product, it is easy to see that is the number of edges in the intersection of the paths leading from and to (see Figure 1).
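The description of the tree Gromov product as a count of shared edges can be sketched with parent pointers. In the illustrative clustering below (our own toy example, not Figure 1), the root has two clusters, A = {a, b} and B = {c, d}. For leaves x and y, the Gromov product with respect to the root is the depth of their lowest common ancestor. For distinct leaves, this equals the number of non-root clusters containing both of them.

```python
def root_path(parent, v):
    """Nodes on the path from v up to the root (inclusive); parent[root] is None."""
    path = [v]
    while parent[v] is not None:
        v = parent[v]
        path.append(v)
    return path


def tree_gromov_product(parent, x, y):
    """(x|y)_root under the graph distance: number of edges shared by the root paths of x and y."""
    shared = set(root_path(parent, x)) & set(root_path(parent, y))
    # The shared nodes form the path from the root down to the lowest common ancestor,
    # so the number of shared edges is one less than the number of shared nodes.
    return len(shared) - 1


parent = {"root": None, "A": "root", "B": "root", "a": "A", "b": "A", "c": "B", "d": "B"}
print(tree_gromov_product(parent, "a", "b"), tree_gromov_product(parent, "a", "c"))  # 1 0
```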
Definition 2.3.
We will say that is -tree-like if
[TABLE]
where and are independent -valued random variables with law , and the infimum is taken over all and all rooted trees that are compatible with . Here is the root of and is the Gromov product of and under the graph distance on , with respect to the base point .
Note that in the above definition, it follows easily by the definition of compatibility that is a bounded and measurable random variable, and therefore the expectation is well-defined.
The following theorem is the main result of this paper. It shows that is small if and only if is small.
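Theorem 2.4.
Let $S$, $\mathcal{S}$, $\mu$, $L$ and $f$ be as above. Then given any $\varepsilon > 0$, there is some $\delta > 0$ depending only on $\varepsilon$ and $L$, such that if $(S, \mu, f)$ is $\delta$-hyperbolic, then it is $\varepsilon$-tree-like. Conversely, given any $\delta > 0$ there is some $\varepsilon > 0$ depending only on $\delta$ and $L$, such that if $(S, \mu, f)$ is $\varepsilon$-tree-like, then it is $\delta$-hyperbolic.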
The above theorem is a generalization of Theorem 1.1 to the setting of average hyperbolicity. The statement is more satisfactory than that of Theorem 1.1 in that the error has no dependence on the size of . In particular, it remains meaningful even if has infinite cardinality. Moreover, since Gromov hyperbolicity is obviously greater than or equal to the average hyperbolicity with respect to any probability measure (where the similarity function is the Gromov product with respect to a base point), Theorem 2.4 immediately implies the following corollary about Gromov hyperbolic metric spaces.
Corollary 2.5.
Let be a separable metric space with finite diameter , which is -hyperbolic with respect to a base point in Gromov’s sense. Then for any probability measure defined on the Borel -algebra of , there is a rooted tree with root that is compatible with in the sense of Definition 2.2, and a number , such that
[TABLE]
where is a number depending only on and which tends to $0$ as . Here is the Gromov product of and under the metric , with respect to the base point , and is the Gromov product of and under the graph distance on , with respect to the base point .
The dependence of on in Theorem 2.4 is an important question. The proof given in this paper uses Szemerédi’s regularity lemma [28], and therefore cannot be expected to yield useful bounds. It would be very interesting to figure out whether Szemerédi’s lemma can be bypassed in the proof of Theorem 2.4. If that is possible, then one can at least hope to get reasonable bounds on in terms of .
To see why something like the regularity lemma may be needed, recall the triangle removal lemma of Ruzsa and Szemerédi [25]: if a simple graph on $n$ vertices has $o(n^3)$ triangles, then it is possible to delete $o(n^2)$ edges and make it triangle-free. The original proof of this result used Szemerédi’s regularity lemma, and although we now have other approaches [11], there is still no simple proof of this seemingly simple claim. Theorem 2.4 is a result in a similar spirit, since it asserts that a space which is nearly tree-like in most places may be slightly modified to yield a space that is exactly embeddable in a tree.
3. Hyperbolicity and the Parisi ansatz
In this section we study a well-known class of systems that arise in statistical physics and probability theory that are hyperbolic in the average sense but not in Gromov’s sense.
A spin glass model assigns a random probability measure $G_N$ on a set $\Sigma_N$, where $\Sigma_N$ is usually the hypercube $\{-1, 1\}^N$ or the sphere of radius $\sqrt{N}$ centered at the origin in $\mathbb{R}^N$. Throughout the rest of this section, we will assume that $\Sigma_N$ is either of these two. The specific definitions of these measures are not particularly relevant for this discussion, so we will not bother to introduce them here. The interested reader may consult [19, 33, 34, 22]. The measure $G_N$ is called the Gibbs measure, and the set $\Sigma_N$ is called the configuration space.
An important quantity in spin glass theory is the overlap between two configurations $\sigma, \sigma' \in \Sigma_N$, defined as
$$R(\sigma, \sigma') := \frac{1}{N} \sum_{i=1}^{N} \sigma_i \sigma'_i.$$
The usual convention in the literature is to denote by $R_{\ell, \ell'}$ the overlap between $\sigma^\ell$ and $\sigma^{\ell'}$, where $(\sigma^\ell)_{\ell \ge 1}$ is an i.i.d. sequence of configurations drawn from the Gibbs measure $G_N$. It was famously conjectured by Parisi [23, 24] that certain spin glass models have the property that in the “$N \to \infty$ limit”, $R_{1,2}$ is greater than or equal to the minimum of $R_{1,3}$ and $R_{2,3}$ with probability one. This is known as the Parisi ultrametricity ansatz. Following a long line of deep contributions by various authors [1, 13, 4, 30], the Parisi conjecture was finally proved by Panchenko [21] for spin glass models that satisfy a certain set of equations known as the generalized Ghirlanda–Guerra identities [13, 20, 29]. The precise statement of Panchenko’s theorem is that in such models, for any $\varepsilon > 0$,
$$\lim_{N \to \infty} \mathbb{E}\big\langle \mathbf{1}_{\{R_{1,2} \ge \min\{R_{1,3}, R_{2,3}\} - \varepsilon\}} \big\rangle = 1, \tag{3.1}$$
where $\langle \cdot \rangle$ denotes expectation with respect to the Gibbs measure $G_N$, $\mathbb{E}$ denotes expectation with respect to the randomness in $G_N$, and $\mathbf{1}_A$ denotes the function that is $1$ on the set $A$ and $0$ elsewhere.
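The event inside the expectation can be examined on any finite collection of configurations. These might be samples believed to approximate draws from a Gibbs measure, although no sampler is shown here. The sketch below computes the pairwise overlaps $N^{-1}\sum_i \sigma_i\sigma'_i$. It then reports the fraction of ordered triples of distinct samples that violate $R_{1,2} \ge \min\{R_{1,3}, R_{2,3}\} - \varepsilon$. The two hand-made triples of $\pm 1$ vectors are purely illustrative.

```python
from itertools import permutations

import numpy as np


def overlap_matrix(configs):
    """R[a, b] = (1/N) * <sigma^a, sigma^b> for configurations stacked as rows."""
    s = np.asarray(configs, dtype=float)
    return s @ s.T / s.shape[1]


def ultrametric_violation_rate(configs, eps=0.0):
    """Fraction of ordered triples (a, b, c) of distinct samples with R_ab < min(R_ac, R_bc) - eps."""
    R = overlap_matrix(configs)
    triples = list(permutations(range(len(R)), 3))
    bad = sum(1 for a, b, c in triples if R[a, b] < min(R[a, c], R[b, c]) - eps)
    return bad / len(triples)


ultrametric = [[1, 1, 1, 1, 1, 1, 1, 1],
               [1, 1, 1, 1, 1, 1, -1, -1],
               [1, 1, 1, -1, -1, -1, 1, -1]]  # overlaps R_12 = 0.5, R_13 = R_23 = 0
not_ultrametric = [[1, 1, 1, 1], [1, 1, 1, -1], [1, -1, -1, -1]]  # overlaps 0.5, -0.5, 0
print(ultrametric_violation_rate(ultrametric), ultrametric_violation_rate(not_ultrametric))
# prints: 0.0 0.3333333333333333
```

In the ultrametric triple, the two smallest overlaps coincide, which is exactly the structure the Parisi ansatz asserts holds with high probability.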
It was predicted in a seminal paper of Mézard, Parisi, Sourlas, Toulouse and Virasoro [18] that ultrametricity happens because the infinite volume limit of the Gibbs measure can be decomposed into “hierarchically organized pure states”. Roughly speaking, this means that the configuration space admits a hierarchical clustering, with a number $q_C$ attached to each cluster $C$. Then if $\sigma^1$ and $\sigma^2$ are drawn independently from the Gibbs measure, with high probability $R_{1,2} \approx q_C$, where $C$ is the smallest cluster containing both $\sigma^1$ and $\sigma^2$ (see Figure 2). Here “smallest” means “lowest down in the hierarchy”.
It is not difficult to prove that ultrametricity implies the hierarchical organization of pure states if $R_{1,2}$ can take only finitely many values in the infinite volume limit; this, in fact, is the basis of the heuristic sketched in [18]. However, if this condition does not hold (in which case the system is said to exhibit “full replica symmetry breaking”), then it is not obvious how to establish the hierarchical organization of pure states starting from the Parisi ansatz (3.1).
There are two kinds of systems where the pure state picture has been rigorously established. The first is a class of spin glass models known as pure $p$-spin spherical models, where the pure state construction was given recently by Subag [26], building on the earlier contributions of [5, 6, 27, 7]. The second is the class of models that have been shown to satisfy the generalized Ghirlanda–Guerra identities. For these models, the construction of pure states was given by Panchenko [21] in the infinite volume limit, and recently by Jagannath [17] in the setting of large but finite $N$. (See also the earlier works of Talagrand [31, 32].)
Incidentally, the generalized Ghirlanda–Guerra identities are believed to hold in all physically interesting models that satisfy the Parisi ansatz (3.1). Therefore, in principle, the results of [21, 17] should give the pure state construction in all such models, provided that the identities can be established. However, there are other important models, such as the Sherrington–Kirkpatrick (S-K) model, where it is known that the generalized Ghirlanda–Guerra identities do not hold [17, Remark 2.4]. In the S-K model, it is believed that the absolute value of the overlap, rather than the overlap itself, should satisfy the ultrametric property. To account for such cases, we formulate a generalized version of (3.1). We will say that a sequence of spin glass models satisfies the generalized Parisi ansatz if for some bounded measurable $\phi: [-1, 1] \to \mathbb{R}$,
$$\lim_{N \to \infty} \mathbb{E}\big\langle \mathbf{1}_{\{\phi(R_{1,2}) \ge \min\{\phi(R_{1,3}), \phi(R_{2,3})\} - \varepsilon\}} \big\rangle = 1 \tag{3.2}$$
for all $\varepsilon > 0$. Theorem 2.4 allows us to prove that hierarchically organized pure states can be constructed for any system that satisfies this generalized ansatz. Since the only systems where ultrametricity has been rigorously established are systems where the pure state construction has also been proved, the result gives no immediate gain. But it is intellectually satisfying and potentially useful for the future. For example, if the generalized Parisi ansatz (3.2) can be proved for the S-K model with $\phi(x) = |x|$, our theorem will instantly give the construction of pure states. The precise statement is as follows.
Theorem 3.1.
Consider any sequence of spin glass models that satisfy the generalized Parisi ultrametricity ansatz (3.2) for some bounded measurable function $\phi$. Then there are sequences and tending to zero, such that with probability at least , the following happens. There is a hierarchical clustering of the configuration space with finitely many clusters, each of them measurable. For each cluster there is a number that is a function of its depth in the hierarchy, with the property that
[TABLE]
where is the smallest cluster containing two configurations and drawn independently from the Gibbs measure and is their overlap.
Just for clarity, we note that in Theorem 3.1 the sequences and are deterministic, but the hierarchical clustering is a function of the Gibbs measure (and hence random). We also note that even though the number of clusters is finite, this number may grow with $N$. Theorem 3.1 is proved as a simple consequence of Theorem 2.4 in Section 10.
4. A vertex-weighted regularity lemma
The key to proving Theorem 2.4 is a weighted version of Szemerédi’s regularity lemma [28]. Although there are a number of weighted regularity lemmas in the literature (such as in [3, 10] and the very recent preprint [15]), we could not find the exact version stated below, which is what we needed for proving Theorem 2.4. Therefore a complete proof is given.
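Let $G = (V, E)$ be a finite simple graph. In the following, we will adopt the convention that the set of edges $E$ is the subset of $V \times V$ consisting of all $(x, y)$ such that there is an edge between $x$ and $y$. In particular, if there is an edge between $x$ and $y$, then both $(x, y)$ and $(y, x)$ belong to $E$.
Let $\mu$ be a nonnegative measure on $V$. If $A$ and $B$ are disjoint subsets of $V$, we define the $\mu$-weighted edge-density between $A$ and $B$ as
$$d_\mu(A, B) := \frac{\sum_{x \in A,\, y \in B,\, (x, y) \in E} \mu(x)\mu(y)}{\mu(A)\mu(B)}.$$
If the denominator is zero, $d_\mu(A, B)$ is undefined. Given $\varepsilon > 0$, a pair $(A, B)$ of disjoint sets will be called a $\mu$-weighted $\varepsilon$-regular pair if for any $A' \subseteq A$ and $B' \subseteq B$ with $\mu(A') \ge \varepsilon \mu(A)$ and $\mu(B') \ge \varepsilon \mu(B)$, we have
$$|d_\mu(A', B') - d_\mu(A, B)| \le \varepsilon.$$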
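The definitions of weighted density and regularity can be tested on toy graphs by enumerating subsets. This sketch assumes the natural weighted analogue of the usual definitions. The density is $\sum_{x\in A, y\in B, (x,y)\in E}\mu(x)\mu(y) / (\mu(A)\mu(B))$. The pair is $\varepsilon$-regular if the density changes by at most $\varepsilon$ on all sub-pairs with $\mu(A') \ge \varepsilon\mu(A)$ and $\mu(B') \ge \varepsilon\mu(B)$. The enumeration is exponential, so it serves only to illustrate the definition.

```python
from itertools import combinations


def symmetric_edges(pairs):
    """Edge set of ordered pairs containing both (x, y) and (y, x) for each undirected edge."""
    return {(x, y) for x, y in pairs} | {(y, x) for x, y in pairs}


def weighted_density(edges, mu, A, B):
    """Assumed mu-weighted edge density between disjoint A and B (None if the denominator is zero)."""
    denom = sum(mu[x] for x in A) * sum(mu[y] for y in B)
    if denom == 0:
        return None
    num = sum(mu[x] * mu[y] for x in A for y in B if (x, y) in edges)
    return num / denom


def heavy_subsets(S, mu, threshold):
    """All nonempty subsets of S whose mu-weight is at least threshold."""
    S = list(S)
    for r in range(1, len(S) + 1):
        for sub in combinations(S, r):
            if sum(mu[x] for x in sub) >= threshold:
                yield sub


def is_weighted_regular(edges, mu, A, B, eps):
    """Brute-force test of the assumed mu-weighted eps-regularity of (A, B); requires eps > 0."""
    d = weighted_density(edges, mu, A, B)
    if d is None:
        raise ValueError("A and B must both have positive weight")
    mu_A, mu_B = sum(mu[x] for x in A), sum(mu[y] for y in B)
    return all(abs(weighted_density(edges, mu, A2, B2) - d) <= eps
               for A2 in heavy_subsets(A, mu, eps * mu_A)
               for B2 in heavy_subsets(B, mu, eps * mu_B))


edges = symmetric_edges([(0, 2), (1, 3)])  # a perfect matching between A = {0, 1} and B = {2, 3}
mu = {v: 1.0 for v in range(4)}
print(is_weighted_regular(edges, mu, [0, 1], [2, 3], 0.4),
      is_weighted_regular(edges, mu, [0, 1], [2, 3], 0.6))  # False True
```

With $\varepsilon = 0.4$, the singleton pair $(\{0\}, \{2\})$ is heavy enough to be tested and has density $1$ instead of $1/2$. With $\varepsilon = 0.6$, only the full pair qualifies.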
The following theorem is a -weighted version of Szemerédi’s regularity lemma.
Theorem 4.1 (Vertex-weighted regularity lemma).
Let $G = (V, E)$ be a finite simple graph and let $\mu$ be a finite nonnegative measure on $V$. Let
[TABLE]
Take any and any positive integer . Then there is a positive real number and a positive integer , both depending only on and , such that if , then there is a partition with , such that
- (i) ,
- (ii) and for all , and
- (iii) all but at most pairs , , are -weighted -regular, as defined above.
The rest of this section is devoted to the proof of this theorem. We follow the spectral approach to proving Szemerédi’s lemma, pioneered by Frieze and Kannan [12] and lucidly explained in a blog entry of Tao [35]. If , there is nothing to prove. So let us assume that , and normalize to define a probability measure:
[TABLE]
Also let
[TABLE]
If we prove the theorem for instead of (with instead of ), it is easy to see that it proves the theorem for . So we will henceforth work with instead of . We will first prove Theorem 4.1 in the case that is rational for all .
Lemma 4.2.
The vertex-weighted regularity lemma holds if is rational for each .
Proof.
Note that if , then an -regular partition is also an -regular partition. So let us assume without loss of generality that .
Since is rational for every , we can find an integer such that is an integer for every . Let . Choose a map such that for every , and these inverse images are disjoint. (This is possible if .) Let be a graph with vertices , and if and only if .
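The reduction to an unweighted graph replaces each vertex $x$ by a number of copies proportional to its weight. Two copies are joined exactly when the original vertices are adjacent. Below is a minimal sketch of this blow-up. It assumes rational weights given as `fractions.Fraction` and an edge set of ordered pairs, as in the convention of this section. The names `blow_up`, `q` and `owner` are ours, chosen for illustration.

```python
from fractions import Fraction
from math import lcm


def blow_up(vertices, edges, mu):
    """Blow up a graph with rational vertex weights into an unweighted graph.

    Returns (q, owner, new_edges): q is a common denominator, so q * mu[x] is an integer;
    owner[i] is the original vertex of new vertex i; (i, j) is an edge exactly when
    (owner[i], owner[j]) is an edge. Copies of one vertex are never adjacent, because a
    simple graph has no self-loops.
    """
    weights = {x: Fraction(mu[x]) for x in vertices}
    q = lcm(*(w.denominator for w in weights.values()))
    owner = []
    for x in vertices:
        owner.extend([x] * int(q * weights[x]))  # q * mu(x) copies of x
    n = len(owner)
    new_edges = {(i, j) for i in range(n) for j in range(n) if (owner[i], owner[j]) in edges}
    return q, owner, new_edges


mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
edges = {("a", "b"), ("b", "a")}
q, owner, new_edges = blow_up(["a", "b", "c"], edges, mu)
print(q, owner, len(new_edges))  # 6 ['a', 'a', 'a', 'b', 'b', 'c'] 12
```

Under the uniform measure on the copies, the fraction of copies of $x$ equals $\mu(x)$ here because $\mu$ is a probability measure. This is why weighted densities of the original graph become ordinary edge densities of the blown-up graph.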
Let be the adjacency matrix of . Then has a spectral decomposition
[TABLE]
where denotes the transpose of the column vector . We will assume the ’s are numbered in order of decreasing magnitude, that is,
[TABLE]
Let be a function satisfying for all . The exact choice of will be made later, and it will depend on and (but not on anything else). Partition the set into sets of the form , where and for ,
[TABLE]
Note that since for all , is a strictly increasing sequence. Also, since
[TABLE]
there exists such that
[TABLE]
Consequently, there exists an integer such that is bounded by a constant that depends only on and , and
[TABLE]
If , then by (4.1), for all . If , then again by (4.1), there is some such that for all and for all . Thus, by decreasing if necessary, we can ensure that for all . Henceforth, we will assume that this holds. Let
[TABLE]
Then the number of edges between sets is
[TABLE]
where is the vector that has at the coordinates that belong to and [math] elsewhere. For each , define
[TABLE]
where denotes the coordinate of . Then, since is a unit vector,
[TABLE]
so that . Thus if
[TABLE]
then . Now partition as the union of , where
[TABLE]
After doing this for , set
[TABLE]
Note that is a partition of . Enumerate the partition sets as . From the definition of the partition, it is clear that
[TABLE]
We will use this bound on later. Now, since is the adjacency matrix of a graph on vertices, a standard result from linear algebra implies that . Thus, for and ,
[TABLE]
For , define
[TABLE]
Then for any and , the above inequality shows that
[TABLE]
We will use this inequality later. We now claim that each , , is the pre-image of some subset of under the map . To see this, first note that if , then clearly . In terms of the spectral decomposition, this can be written as
[TABLE]
By the linear independence of the ’s, this shows that for each , or . But if , then , and so and must belong to the same . Since this holds for all , and belong to the same .
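The mechanism of this part of the proof can be illustrated numerically, in the spirit of the spectral approach of Frieze and Kannan. Vertices are partitioned by the common refinement of level sets of the top eigenvectors of the adjacency matrix. In the sketch below, `k` (number of eigenvectors kept) and `eta` (bucket width for $\sqrt{n}$-rescaled coordinates) are hypothetical parameters. They stand in for the thresholds in the lost displays. The final check verifies the identity $\sum_i \lambda_i^2 = \operatorname{trace}(A^2)$, the number of ordered adjacent pairs, which is at most $n^2$. This is a standard identity of the kind used to control how many eigenvalues can be large.

```python
import numpy as np


def spectral_partition(adj, k, eta):
    """Group vertices by bucketed coordinates of the k eigenvectors with largest |eigenvalue|.

    Illustrative parameters: eigenvector entries are rescaled by sqrt(n) and rounded down to
    multiples of eta; vertices with the same k-tuple of bucket indices share a part
    (the common refinement of k partitions).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    evals, evecs = np.linalg.eigh(adj)  # symmetric matrix: real spectrum, eigenvectors as columns
    order = np.argsort(-np.abs(evals))  # decreasing magnitude
    coords = evecs[:, order[:k]] * np.sqrt(n)
    keys = np.floor(coords / eta).astype(int)
    parts = {}
    for v in range(n):
        parts.setdefault(tuple(keys[v]), []).append(v)
    return list(parts.values()), evals[order]


# Complete bipartite graph K_{2,2} with sides {0, 1} and {2, 3}.
adj = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]])
parts, spectrum = spectral_partition(adj, k=2, eta=0.3)
print(sorted(parts), np.isclose(np.sum(spectrum ** 2), adj.sum()))  # [[0, 1], [2, 3]] True
```

Vertices 0 and 1 have identical rows in the adjacency matrix, so they share every eigenvector coordinate with nonzero eigenvalue. This mirrors the claim just proved that each part is a union of pre-images.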
Next, we make the partition equitable by subdividing the ’s. By what we just showed, is the union of for some set of . Note that for each , the pre-image has size at most . Let
[TABLE]
If is sufficiently small (depending on ), is positive. Partition by sorting the pre-images into subsets of size as close as possible to but no smaller, and one remainder set of size less than . So,
[TABLE]
with
[TABLE]
and for ,
[TABLE]
The union of the remainder sets is small:
[TABLE]
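The subdivision step can be sketched as a greedy pass over the pre-images. This is one way, our choice rather than necessarily the paper's, to form subsets of size as close as possible to the target but no smaller, plus one remainder set. Each block is closed as soon as its size reaches the target, so it overshoots by less than the largest pre-image, and the leftover block is smaller than the target. Atom names and sizes below are illustrative.

```python
def equitable_split(atoms, target):
    """Greedily group (atom, size) pairs into blocks of total size >= target.

    Each block is closed as soon as it reaches target, so it exceeds target by less
    than the largest atom size. Returns (blocks, remainder); the remainder has total
    size < target.
    """
    blocks, current, total = [], [], 0
    for atom, size in atoms:
        current.append(atom)
        total += size
        if total >= target:
            blocks.append(current)
            current, total = [], 0
    return blocks, current


atoms = [("a", 2), ("b", 3), ("c", 1), ("d", 4), ("e", 1)]
print(equitable_split(atoms, target=4))  # ([['a', 'b'], ['c', 'd']], ['e'])
```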
Define
[TABLE]
as the exceptional set, and relabel the remaining partition sets as . Then , and hence by (4.6),
[TABLE]
Since can be bounded by a quantity that depends only on and , we can let be an upper bound, depending only on and , for the quantity . Now notice that
[TABLE]
Using the definition of , we have
[TABLE]
Thus, sufficient smallness of (depending on and ) ensures that .
By construction of , there is a partition of such that for each . Note that
[TABLE]
and for ,
[TABLE]
which implies, in particular, that for all . This also shows that for all .
Next, note that by (4.2), . Thus if , then
[TABLE]
Let , and let
[TABLE]
Let be the measure on such that for each and . Then
[TABLE]
Thus, by (4.9), . We can use this to bound , as follows. By the inequalities (4.6) and (4.7),
[TABLE]
Thus,
[TABLE]
Recall that is bounded by a constant that depends only on and , and that . Thus, if is sufficiently small (depending on and ), this gives
[TABLE]
Suppose that . Then for and with and , the Cauchy–Schwarz inequality and the definition of imply that
[TABLE]
Next, note that for any choice of , and for any and ,
[TABLE]
Since , and the are in order of decreasing magnitude, we have
[TABLE]
so that . Thus,
[TABLE]
Now take any . Let and be indices such that and . Define , where is the quantity defined in (4.4). Then by (4.5), (4.10) and (4.11), we see that if and , with , and and , then
[TABLE]
Now take any , and any and with and . Let and . Then , , and . Also,
[TABLE]
and . Thus, the above calculations show that
[TABLE]
Combining the last two displays and dividing throughout by , we get
[TABLE]
Recalling that and , and applying (4.8), we get
[TABLE]
Now suppose is chosen in such a way that we can guarantee
[TABLE]
Then from the above bounds it will follow that
[TABLE]
Replacing by and by , we also have . Thus, we would get
[TABLE]
which would complete the proof. So we only have to guarantee (4.12). By the bound on from (4.3), we see that (4.12) holds if
[TABLE]
Assuming that , it is now easy to choose , depending only on and , satisfying the above criterion for every . ∎
In the final step, we now drop the rationality assumption and prove Theorem 4.1.
Proof of Theorem 4.1.
Enumerate and let . Take any positive real number . Let be positive rational numbers such that for each . Let , so that are again rational, , and for each ,
[TABLE]
Define the modified weight . Suppose that , where is the bound on the maximum atom required in Lemma 4.2. Then for sufficiently small , the above display shows that we can apply Lemma 4.2 to . Suppose that we get an -regular partition of . Now let . We get a partition as above for each . Since the number of possible partitions is finite, there is a subsequence along which the partitions stabilize for sufficiently small . This allows us to define a limiting partition along this subsequence. Since for every (by the above display), it is straightforward to verify that this limiting partition is -regular for . ∎
5. Preliminary steps
In this section we begin the steps towards the proof of Theorem 2.4. First, note that by rescaling if necessary, we may assume that . We will work under this assumption for the rest of the paper.
We begin by observing that the converse statement in Theorem 2.4 is very easy to prove: Take any . Suppose that
[TABLE]
Then there exists a tree with root , finite diameter, and set of leaves , and some , such that is a measurable random variable and
[TABLE]
where and are i.i.d. draws from . By Markov’s inequality,
[TABLE]
Therefore if , and are i.i.d. draws from , then with probability at least , the quantities , and are all bounded above by . If this happens, then
[TABLE]
Now, since is a Gromov product under the graph distance on a tree, it satisfies
[TABLE]
for all . Thus, we get
[TABLE]
Recall that this happens with probability at least . Also, we have assumed that . Thus,
[TABLE]
This proves the converse part of Theorem 2.4.
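The property of tree Gromov products used in this argument can be checked exhaustively on a random tree. The property is $(x|z)_r \ge \min\{(x|y)_r, (y|z)_r\}$ for all nodes of a rooted tree under the graph distance, which is the $0$-hyperbolic case of (1.1). The sketch below builds a random rooted tree from parent pointers; this construction is chosen only for illustration. It computes the Gromov product as the depth of the lowest common ancestor.

```python
import random
from itertools import product


def random_tree(n, seed=0):
    """Parent list of a random rooted tree on nodes 0..n-1 with root 0."""
    rng = random.Random(seed)
    return [None] + [rng.randrange(v) for v in range(1, n)]


def depth(parent, v):
    """Number of edges from v to the root."""
    d = 0
    while parent[v] is not None:
        v, d = parent[v], d + 1
    return d


def lca_depth(parent, x, y):
    """(x|y)_root under the graph distance equals the depth of the lowest common ancestor."""
    ancestors = set()
    while x is not None:
        ancestors.add(x)
        x = parent[x]
    while y not in ancestors:  # climb from y until we reach an ancestor of x
        y = parent[y]
    return depth(parent, y)


parent = random_tree(25, seed=1)
violations = [(x, y, z) for x, y, z in product(range(25), repeat=3)
              if lca_depth(parent, x, z) < min(lca_depth(parent, x, y), lca_depth(parent, y, z))]
print(len(violations))  # 0
```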
We now start our journey towards the proof of the main assertion of Theorem 2.4, namely, that if is small, then is also small. We will first prove the following weaker theorem. At the very end of the paper, we will complete the proof of Theorem 2.4 using this theorem.
Theorem 5.1.
Assume that is a finite set, is the power set of , is a probability measure defined on , and is a symmetric function. Let . Then given any , there is some depending only on , such that if and , then .
From here until the end of the proof of Theorem 5.1, we will work under the assumptions stated above. Take any and suppose that
[TABLE]
A basic step is to show that for most values of , the set
[TABLE]
has small probability. For convenience, let
[TABLE]
The above definition of will be fixed throughout the remainder of the proof.
Lemma 5.2.
Let . Then , where is Lebesgue measure.
Proof.
Define
[TABLE]
Note that
[TABLE]
Thus,
[TABLE]
If is Lebesgue measure on , the definition of implies that
[TABLE]
The claimed result now follows easily by combining the two displays. ∎
Let us now fix some and . This and will remain fixed throughout the rest of the proof. At various steps, we will need to assume that is smaller than some universal constant (such as ) or is bigger than some universal constant (such as ), and we will make these assumptions without explicitly stating so.
Having chosen and , define
[TABLE]
Assume that . Let be the largest integer such that . Note that . In particular, is bounded by a constant that depends only on and . We will use this information later. By Lemma 5.2, any subinterval of of length intersects . Thus, we can find a sequence such that for each , and
[TABLE]
For and , define three sets:
[TABLE]
Finally, let
[TABLE]
We now prove two lemmas that will be used several times in the sequel.
Lemma 5.3.
Let be the set defined above. Then .
Proof.
By the choice of , for every . Since , this gives
[TABLE]
Thus
[TABLE]
which gives . ∎
Lemma 5.4.
If , then .
Proof.
By the definition of ,
[TABLE]
On the other hand, since , . This completes the proof. ∎
6. Formation of approximate cliques
In this section we carry out the main step in the proof of Theorem 5.1. We continue with the notations introduced in the previous section. In particular, , , , , , , , , , , , and remain the same as before.
Take any nonempty set . Take any , and put an edge between if and only if . Let denote this set of edges, and let be the graph . Let us continue to denote the restriction of to by . Note that this restriction is a measure on , but not necessarily a probability measure.
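The reason to expect near-cliques is visible in the exact case. If $f(x,z) \ge \min\{f(x,y), f(y,z)\}$ for all $x, y, z$, then the relation "$f(x,y) \ge t$" is transitive on distinct points. So every connected component of the threshold graph is a clique. The sketch below assumes the edge rule is $f(x,y) \ge t$, since the exact condition in the text was lost in extraction. It uses two hand-made similarity matrices.

```python
def threshold_graph(f, t):
    """Adjacency sets of the graph joining distinct x, y whenever f[x][y] >= t (assumed rule)."""
    n = len(f)
    return {x: {y for y in range(n) if y != x and f[x][y] >= t} for x in range(n)}


def components(adj):
    """Connected components, as sorted lists, via depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v] - seen:
                seen.add(u)
                stack.append(u)
        comps.append(sorted(comp))
    return comps


def is_union_of_cliques(adj):
    """True if every connected component is a complete graph."""
    return all(u in adj[v] for comp in components(adj) for v in comp for u in comp if u != v)


# Gromov products of leaves a, b, c, d in the tree root -> {A: {a, b}, B: {c, d}}: 0-hyperbolic.
f_tree = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
# A "path" of similarities: 0 ~ 1 and 1 ~ 2, but 0 and 2 are dissimilar (not 0-hyperbolic).
f_path = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
print(components(threshold_graph(f_tree, 1)), is_union_of_cliques(threshold_graph(f_tree, 1)),
      is_union_of_cliques(threshold_graph(f_path, 1)))  # [[0, 1], [2, 3]] True False
```

When $f$ is only approximately hyperbolic on average, such "open paths" can occur but carry little mass. This is why the graph only needs slight modification.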
Let and be as in Theorem 4.1. Throughout this section, we will assume that is sufficiently large in comparison to so that
[TABLE]
A first consequence of this assumption is that we can apply Theorem 4.1 to get a partition of with the required properties. For , let
[TABLE]
so that in the notation of Theorem 4.1,
[TABLE]
We will fix all of the above throughout the rest of this section. The main result of the section is that can be slightly modified to make it a disjoint union of cliques. We arrive at this result in several steps. First, we show that is appropriately close to .
Lemma 6.1.
For each ,
[TABLE]
In particular, , where is a positive real number that depends only on and .
Proof.
By construction, for all . Thus, for any ,
[TABLE]
where the last inequality follows from (6.1). Similarly,
[TABLE]
Assume that (which we can, by our stated convention that can be taken to be less than any universal constant). Since , this completes the proof. ∎
Next, we prove two key lemmas. The first one shows that for any regular pair , is either close to zero or close to one.
Lemma 6.2.
There exists a number depending only on , and , such that if , then the following holds. If is an -regular pair, and , then .
The plan of the proof is roughly as follows (see Figure 3 for a schematic representation). We will first find some that connects to a substantial fraction of points in , where “substantial” means a set of -measure greater than for some universal constant . Call this set . By regularity, the edge density between and will be substantial. This will allow us to find which connects to a substantial fraction of points in . Call this set . Now take any and . Since is a neighbor of and is also a neighbor of , the small hyperbolicity of will allow us to conclude that it is highly likely that is a neighbor of . But if that happens, then since is a neighbor of and is also a neighbor of , it is highly likely that is a neighbor of . From this, we will conclude that the edge density between and is close to . Since these sets have substantial size, regularity of will imply that is close to .
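Each step of the form "the small hyperbolicity will allow us to conclude that it is highly likely that ... is a neighbor" is, at heart, Markov's inequality. Assume, as a reading of Definition 2.1, that $\delta$ bounds $\mathbb{E}[(\min\{f(X,Y), f(Y,Z)\} - f(X,Z))_+]$. On the event $\{f(X,Y) \ge t,\ f(Y,Z) \ge t,\ f(X,Z) < t - \eta\}$ the defect exceeds $\eta$, so this event has probability at most $\delta/\eta$. The sketch computes both sides exactly on a finite weighted space. The values of $t$, $\eta$ and the example matrix are illustrative.

```python
import numpy as np


def open_wedge_probability(f, p, t, eta):
    """P(f(X,Y) >= t, f(Y,Z) >= t, f(X,Z) < t - eta) for X, Y, Z i.i.d. with law p."""
    f, p = np.asarray(f, dtype=float), np.asarray(p, dtype=float)
    event = (f[:, :, None] >= t) & (f[None, :, :] >= t) & (f[:, None, :] < t - eta)
    weights = p[:, None, None] * p[None, :, None] * p[None, None, :]
    return float(np.sum(weights * event))


def average_hyperbolicity(f, p):
    """delta = E[(min(f(X,Y), f(Y,Z)) - f(X,Z))_+] for X, Y, Z i.i.d. with law p (assumed definition)."""
    f, p = np.asarray(f, dtype=float), np.asarray(p, dtype=float)
    defect = np.maximum(np.minimum(f[:, :, None], f[None, :, :]) - f[:, None, :], 0.0)
    weights = p[:, None, None] * p[None, :, None] * p[None, None, :]
    return float(np.sum(weights * defect))


f_cycle = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 2, 1], [0, 0, 1, 1]]
p = [0.25] * 4
t, eta = 1.0, 0.5
print(open_wedge_probability(f_cycle, p, t, eta), average_hyperbolicity(f_cycle, p) / eta)  # 0.03125 0.0625
```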
Proof of Lemma 6.2.
Throughout this proof, denotes any positive real number that depends only on and . The value of may change from line to line. For , let denote the neighborhood of in . Let for each . Let and be as in the statement of the lemma. Since , we have , and so there is some for which
[TABLE]
By -regularity,
[TABLE]
and therefore
[TABLE]
Now notice that
[TABLE]
Since , . Thus
[TABLE]
so that by (6.3),
[TABLE]
By Lemma 6.1 and the inequality (6.2),
[TABLE]
Combining this with (6.4), we get
[TABLE]
If is sufficiently small (depending on , and ), the quantity in brackets on the left is bounded below by , and so there is such that
[TABLE]
Recalling (6.2), we see that by -regularity,
[TABLE]
The quantity can be bounded from below as follows:
[TABLE]
We wish to show that the right side is close to . For that purpose, we write the right side as , where
[TABLE]
and
[TABLE]
We will now show that and are small. (To understand heuristically why they should be small, recall Figure 3.) Recalling the definition of , we see that
[TABLE]
But if , then is a neighbor of in and so . Thus the above display can be simplified to
[TABLE]
Moreover, recalling that , so that , and recalling the definition of , it is easy to see that
[TABLE]
Thus,
[TABLE]
By (6.2) and (6.5), and are both bounded below by . Since , Lemma 5.4 gives
[TABLE]
On the other hand, since ,
[TABLE]
Combining all of the above observations, we get
[TABLE]
If is small enough (depending on , and ), the above quantity is smaller than . For , we re-use (6.7) to get
[TABLE]
Again, this is smaller than if is small enough. Thus,
[TABLE]
and hence by (6.6), . ∎
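To make the objects in Lemma 6.2 concrete, here is a minimal Python sketch of the (μ⊗μ)-weighted edge density between two vertex sets, along with a brute-force weighted ε-regularity test. The regularity condition coded here is an assumption: it is the standard Szemerédi-type condition with cardinalities replaced by μ-measure, in which subsets of relative mass at least ε must have density within ε of the pair's density. It may differ in details from the notion used in Theorem 4.1. All function names are hypothetical.

```python
from itertools import chain, combinations


def mass(S, mu):
    """mu-measure of a finite vertex set S."""
    return sum(mu[v] for v in S)


def density(A, B, edges, mu):
    """(mu x mu)-weighted edge density between disjoint vertex sets A and B.

    `edges` is a set of frozenset({u, v}) pairs.
    """
    num = sum(mu[a] * mu[b] for a in A for b in B if frozenset((a, b)) in edges)
    return num / (mass(A, mu) * mass(B, mu))


def nonempty_subsets(S):
    S = list(S)
    return chain.from_iterable(combinations(S, r) for r in range(1, len(S) + 1))


def is_regular(A, B, edges, mu, eps):
    """Brute-force weighted eps-regularity test (exponential; tiny sets only).

    Assumed definition: every subset X of A and Y of B with mu(X) >= eps*mu(A)
    and mu(Y) >= eps*mu(B) satisfies |d(X, Y) - d(A, B)| <= eps.
    """
    d = density(A, B, edges, mu)
    for X in nonempty_subsets(A):
        if mass(X, mu) < eps * mass(A, mu):
            continue
        for Y in nonempty_subsets(B):
            if mass(Y, mu) < eps * mass(B, mu):
                continue
            if abs(density(X, Y, edges, mu) - d) > eps:
                return False
    return True
```

Take four vertices with uniform μ. The complete and empty bipartite graphs between {0, 1} and {2, 3} are 0.3-regular, with densities 1 and 0. A perfect matching between the same sets has density 1/2 but is not 0.3-regular. Under its hypotheses, Lemma 6.2 says that regular pairs look like one of the first two cases.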
Our second key lemma shows that high density between regular pairs is, in a certain sense, transitive.
Lemma 6.3.
There exists a number depending only on , and , such that if , then the following holds. Suppose that is an -regular pair. Suppose that are distinct elements of such that , , for each , and . Then .
The proof of this lemma is intuitively quite simple, given that we already have Lemma 6.2. The small hyperbolicity ensures that if we have a path in that is not too long, then it is likely that the beginning and ending points of the path are connected by an edge. This allows us to conclude that is close to , as long as is not too large. In particular, . But then Lemma 6.2 implies that .
Proof of Lemma 6.3.
Take any sequence of points , , such that for each , , and . Let be the set of all such sequences ( is allowed to be empty). Since , there is a minimum such that . But . Thus, , and hence . But we also know that . Therefore, , where is the set defined in (5.1). Since and , this implies that
[TABLE]
On the other hand, let . Then
[TABLE]
But by (6.8),
[TABLE]
Combining the last two displays, we get
[TABLE]
By Lemma 6.1, this shows that if is sufficiently small (depending on , and ), then
[TABLE]
But then by Lemma 6.2 (assuming that is sufficiently small), this gives . ∎
We now begin the main quest of this section, namely, to show that a small fraction of the edges of can be modified to transform it into a disjoint union of cliques. Throughout the rest of this section, we will assume that:
[TABLE]
First, we define a graph structure on . We will say that there is an edge between and if is -regular and . In this case we will say that and are neighbors. A subset of will be called a “neighborhood” if there is some such that all other elements of are neighbors of . In this case we will say that is a neighborhood of . Note that need not contain all the neighbors of . Let be a maximal collection of disjoint neighborhoods such that each neighborhood has size . Note that is allowed to be empty, in case there is no neighborhood of size .
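The argument only needs a maximal collection of disjoint neighborhoods of a fixed size to exist. A greedy pass is one simple way to produce one, as in the following sketch. The function name and the adjacency-list input format are assumptions, and m stands for the common neighborhood size.

```python
def maximal_disjoint_neighborhoods(vertices, nbrs, m):
    """Greedily pick disjoint 'neighborhoods' of size m.

    A neighborhood is a set containing a center c together with m - 1
    neighbors of c. One pass suffices: the pool of unused vertices only
    shrinks, so a vertex that cannot serve as a center now never can later.
    """
    used = set()
    chosen = []
    for c in vertices:
        if c in used:
            continue
        free = [v for v in nbrs[c] if v not in used and v != c]
        if len(free) >= m - 1:
            block = [c] + free[: m - 1]
            chosen.append(block)
            used.update(block)
    return chosen
```

After the pass, every unused vertex has fewer than m - 1 unused neighbors. The leftover set is therefore of the kind that Lemma 6.8 controls.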
Lemma 6.4.
For any distinct , there is some and such that is an -regular pair.
Proof.
Since and are both , there are at least pairs such that and . Since the number of irregular pairs is at most , this shows that at least one of the above pairs must be -regular. ∎
Now define a graph structure on as follows. Say that two neighborhoods are connected by an edge if there exists and such that and are neighbors (in the sense defined above).
Lemma 6.5.
Under the graph structure defined above, is a disjoint union of cliques.
Proof.
For distinct , we have to show that if is a neighbor of , and is a neighbor of , then is a neighbor of . This will imply that is a disjoint union of cliques.
Accordingly, let and be neighbors, and let and be neighbors. By Lemma 6.4, there is an -regular pair such that and . Suppose that is a neighborhood of , for . Then the sequence is a path in the graph defined on (see Figure 4). Since is -regular, Lemma 6.3 implies that . In other words, and are neighbors. Thus, is a neighbor of . ∎
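Lemma 6.5 rests on an elementary fact: a graph is a disjoint union of cliques exactly when adjacency is transitive on distinct vertices, that is, a~b and b~c imply a~c. Here is a minimal sketch of that check, together with the recovery of the cliques as connected components. The function names are hypothetical, and the graph is a plain adjacency dictionary.

```python
from itertools import permutations


def is_cluster_graph(vertices, adj):
    """True iff a~b and b~c imply a~c for all distinct a, b, c."""
    for a, b, c in permutations(vertices, 3):
        if b in adj[a] and c in adj[b] and c not in adj[a]:
            return False
    return True


def cliques(vertices, adj):
    """Connected components; these are the cliques when is_cluster_graph holds."""
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        seen.add(v)
        stack, comp = [v], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps
```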
Take each clique in , and take the union of its elements. This yields a new collection of disjoint subsets of .
Lemma 6.6.
We have .
Proof.
Simply note that each has size at least , these sets are disjoint, and their union is a subset of . Thus, . ∎
Lemma 6.7.
If and for two distinct elements and of , then and are not neighbors. On the other hand, if for some , then either is an irregular pair, or and are neighbors. Moreover, in this case even if is irregular, there is a path with vertices joining and .
Proof.
If and for two distinct elements and of , it follows directly from the definition of that and cannot be neighbors. Next, suppose that for some , and is -regular. Then either for some , or and for some that are neighbors. In the first case, suppose that is a neighborhood of some . Then is a path, and hence by Lemma 6.3, is a neighbor of . In the second case, suppose that is a neighborhood of and is a neighborhood of . Since and are neighbors, there exist and which are neighbors. Then is a path, and hence by Lemma 6.3, and are neighbors. This argument also establishes that even if is an irregular pair, we can find a path with vertices joining and . ∎
Next, let be the set of all that are not elements of any .
Lemma 6.8.
For any , there are less than many that are neighbors of .
Proof.
Suppose that there is some that has neighbors in . Then there is a neighborhood of size . But this neighborhood is disjoint from all the neighborhoods in . This contradicts the maximality of . ∎
Lemma 6.9.
Suppose that and are such that has at least neighbors in . Then has less than neighbors in the union of all members of other than .
Proof.
Let be the set of all neighbors of in , and let be the set of all neighbors of in the union of all elements of other than . By assumption, . If also , then there are pairs such that and . Therefore at least one such pair must be -regular. Since is a path, Lemma 6.3 shows that and are neighbors. But this contradicts the first assertion of Lemma 6.7. ∎
For each , let be the superset of consisting of all elements of and all elements of that have neighbors in . Let be the set of all such . Lemma 6.9 shows that for any , there can be at most one such that has neighbors in . Thus, the elements of are disjoint. Let be the set of all elements of that do not belong to any . A schematic picture depicting and is given in Figure 5.
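The enlargement step above can be sketched as follows. The neighbor threshold t is a parameter that stands in for the threshold in the definition; this is an assumption. The check for a second qualifying cluster mirrors what Lemma 6.9 guarantees rather than proving it. All names are hypothetical.

```python
def augment_clusters(clusters, leftover, nbrs, t):
    """Attach each leftover vertex to the cluster in which it has >= t neighbors.

    Returns the enlarged clusters and the vertices attached to none of them.
    """
    enlarged = [set(U) for U in clusters]
    unattached = []
    for v in leftover:
        hits = [i for i, U in enumerate(clusters) if len(nbrs[v] & set(U)) >= t]
        if len(hits) > 1:
            # Lemma 6.9 rules this out in the paper's setting; here we just check.
            raise ValueError(f"vertex {v!r} qualifies for several clusters")
        if hits:
            enlarged[hits[0]].add(v)
        else:
            unattached.append(v)
    return enlarged, unattached
```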
Lemma 6.10.
For any , the set has the property that any two distinct elements of are either neighbors, or an irregular pair.
Proof.
Take any distinct such that is an -regular pair. If they are both in , then the assertion is proved by Lemma 6.7.
If and , then has a neighbor . By Lemma 6.7, there is a path with vertices joining and . Since and are neighbors, we can concatenate at the beginning of this path to get a path with vertices joining and . Therefore by Lemma 6.3, and are neighbors.
Lastly, if and are both in , then they have neighbors and in . By Lemma 6.7, there is a path with vertices joining and . Since and are neighbors, and and are neighbors, we can concatenate at the beginning of the path and to the end of the path to get a path with vertices joining and . Therefore by Lemma 6.3, and are neighbors. ∎
Call a pair “bad” if and are neighbors, but they belong to distinct elements of .
Lemma 6.11.
The number of bad pairs is at most .
Proof.
Let be a bad pair. We consider several cases. First, by Lemma 6.7, it cannot be that both and are in the complement of .
Next, suppose that and . Then for some and for some . By Lemma 6.9, there are less than neighbors of in . By Lemma 6.6, there are at most choices of . Thus, there are at most choices of for this , and therefore at most choices of of this type.
Finally, suppose that both . Then by Lemma 6.8, there are less than choices of for each . Thus, there are at most pairs of this type. ∎
Lemma 6.12.
Any element of has at most neighbors among .
Proof.
Take any and any neighbor of . Then by Lemma 6.8, there are less than choices of . On the other hand, by definition of , has less than neighbors in each . Thus, by Lemma 6.6, there are at most choices of such . Since any neighbor of is either in or in for some , this completes the proof. ∎
We finally arrive at the main result of this section, which says that the graph can be modified into a disjoint union of cliques by adding and deleting a set of edges that has small -measure.
Lemma 6.13.
Under the assumptions (6.1) and (6.9), the graph can be modified into a disjoint union of cliques by adding and deleting edges in such a way that if is the set of all edges that were added or deleted, then
[TABLE]
where is a universal constant. Moreover, any non-singleton clique in the resulting graph has
[TABLE]
Proof.
Edges are added and deleted in several steps. First, delete all edges with at least one endpoint in . Let be the set of deleted edges. Then clearly
[TABLE]
Next, add all edges between vertices within the same , . Let be the set of all edges added in this step. Then by Lemma 6.1,
[TABLE]
In the next step, add all missing edges between any and that are members of the same . By Lemma 6.10, such pairs are either irregular, or they are neighbors of each other. In the latter case, the total mass of the missing edges is at most . Thus, if is the set of edges added in this step, then by Lemma 6.1,
[TABLE]
Next, delete all edges between any and where . Then is either an irregular pair, or is regular but and are not neighbors, or is a bad pair. Thus, if is the set of edges deleted in this step, then by Lemma 6.2, Lemma 6.11 and Lemma 6.1,
[TABLE]
Finally, delete all edges with at least one vertex in some . Let be the set of deleted edges. Given and any , by Lemma 6.12 there are at most choices of such that is a neighbor of . The other possibilities are that is an irregular pair, or is regular but is not a neighbor of , or . Therefore by Lemma 6.2 and Lemma 6.1,
[TABLE]
This completes the process of adding and deleting edges. If is the set of all edges that were either added or deleted, then the above estimates show that (6.10) holds.
Let us now verify that the resulting graph is a disjoint union of cliques. For each , let be the union of all . In the new graph, each is a clique, and there are no edges between two such cliques. Moreover, any vertex that belongs to some has no edges incident to it in the new graph. Thus, the new graph is the disjoint union of the above cliques and a bunch of singleton vertices that are disconnected from all else. This also shows that any non-singleton clique in the new graph must be one of the ’s. But for any , Lemma 6.1 gives
[TABLE]
This completes the proof. ∎
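The cost in Lemma 6.13 is the μ⊗μ-mass of the edges that are added or deleted. For a finite weighted graph and a target partition into cliques (singletons included), this mass can be computed directly, as in the sketch below. Whether pairs are counted as ordered or unordered is a normalization choice. The sketch counts ordered pairs (x, y) with x ≠ y, which is an assumption, and the function name is hypothetical.

```python
def modification_mass(vertices, edges, partition, mu):
    """(mu x mu)-mass of the ordered pairs (x, y), x != y, on which the graph
    `edges` disagrees with the disjoint union of cliques given by `partition`
    (every vertex must lie in some block; isolated vertices are singletons)."""
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    total = 0.0
    for x in vertices:
        for y in vertices:
            if x == y:
                continue
            in_graph = frozenset((x, y)) in edges
            in_cliques = block_of[x] == block_of[y]
            if in_graph != in_cliques:
                total += mu[x] * mu[y]
    return total
```

For the partition produced in the proof above, this quantity plays the role of the left side of (6.10), up to the choice of normalization.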
7. Constructing the tree
Let , , , , , , and remain as defined in Section 5. We will now repeatedly apply Lemma 6.13 to extract from a nested hierarchy of subsets with desirable properties. The subsets will be constructed in such a way that each subset is either a singleton, or has -measure uniformly bounded below by a positive constant that depends only on and . Any such constant will henceforth be denoted by . This will allow us to apply Lemma 6.13 to partition such a non-singleton subset if and are small enough, depending only on and . We will keep dividing the non-singleton subsets until we are left with only singletons.
Henceforth, whenever we say “ and are small enough”, we will mean “ and are smaller than constants depending only on and ”.
Let . By Lemma 5.3, if is small enough. Define a graph on as in the beginning of Section 6, using , and obtain a partition of using Lemma 6.13. Obtain a partition of by taking this partition of and appending to it singleton sets consisting of the elements of . Let denote this partition. By (6.11), any non-singleton element does not intersect and satisfies . Thus we can apply Lemma 6.13 to any such with , if and are small enough. In this manner, we obtain a collection of disjoint sets, each of which is a subset of some non-singleton element of . Then we partition each non-singleton element of by applying the procedure of Section 6 with to obtain , and continue this recursive partitioning until we arrive at . This is possible since , which, by (6.11), ensures that the conditions (6.1) and (6.9) are never violated if and are small enough.
Having defined , define to be the set of all singleton sets such that belongs to some non-singleton member of . Note, in particular, that we are not applying Lemma 6.13 while partitioning the elements of into singletons. Lastly, define .
Let be the set of all pairs where and . This is essentially the union of the ’s, except that we pair each element with the corresponding to deal with the problem of the same appearing in two different ’s (which can happen if some is partitioned into just one set in some step). For simplicity, we will refer to the element as just .
We will now define a tree structure on . Note that by construction, if an element belongs to some , , then it has a uniquely defined parent . Putting edges between such parent-child pairs creates a graph which is obviously a tree. Also, it is clear that the set of leaves of this tree can be identified with . Define to be the root of .
For each non-singleton node for , let be the set of edges of that need to be modified while applying Lemma 6.13 to convert into a disjoint union of cliques. If is a singleton set, let be empty. Let be the set of edges that need to be modified while applying Lemma 6.13 to . Lastly, let be the set of all pairs with at least one of and in . Let be the union of all these sets.
We prove three lemmas in this section. In all of these, we assume that and are sufficiently small, depending on and , so that Lemma 6.13 can be applied. We will view the elements of as the leaves of , and for any , we will denote by the Gromov product of and under the graph distance on , with respect to the base point .
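The tree of this section can be built mechanically from the nested partitions. Nodes are labeled by (level, block) pairs precisely so that a set appearing at two levels yields two distinct nodes. The following sketch uses hypothetical names and assumes that each block at level k lies inside some block at level k - 1. It builds the parent map and evaluates the Gromov product (x | y)_r = (d(r, x) + d(r, y) - d(x, y))/2 with respect to the root.

```python
def build_tree(partitions):
    """partitions[0] == [V]; every block at level k lies in a block at level k-1.
    Nodes are (level, block) pairs; returns the parent map."""
    parent = {}
    for k in range(1, len(partitions)):
        for B in partitions[k]:
            P = next(A for A in partitions[k - 1] if B <= A)
            parent[(k, B)] = (k - 1, P)
    return parent


def ancestors(node, parent):
    """The path node, parent(node), ..., root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(u, v, parent):
    """Graph distance in the tree, via the first common ancestor."""
    au, av = ancestors(u, parent), ancestors(v, parent)
    common = set(av)
    for i, a in enumerate(au):
        if a in common:
            return i + av.index(a)
    raise ValueError("nodes are not in the same tree")


def gromov_product(x, y, root, parent):
    """(x | y)_root under the graph distance of the tree."""
    return (tree_distance(root, x, parent) + tree_distance(root, y, parent)
            - tree_distance(x, y, parent)) / 2
```

In a rooted tree, this Gromov product equals the depth of the deepest node containing both leaves. This is consistent with the use, in the proofs of Lemmas 7.2 and 7.3, of the largest level at which two points share a block.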
Lemma 7.1.
For the set defined above, we have
[TABLE]
where is a universal constant.
Proof.
Note that by Lemma 6.13 and Lemma 5.3,
[TABLE]
Since each is a partition of a subset of ,
[TABLE]
Therefore, since by the definition of , we get
[TABLE]
By the definition (5.2) of , this gives the desired result. ∎
Lemma 7.2.
For any such that ,
[TABLE]
Proof.
Let , so that is the largest integer such that and both belong to the same member of . First, suppose that and . Let be the element of that contains and . Then while applying Lemma 6.13 to , there is an edge between and in the original graph, but that edge is deleted in the modification. Thus, , which is not true by assumption. Therefore must be less than .
If , the above deduction still holds: if and and are both in , then by the same logic as above we conclude that . On the other hand, if and at least one of and is outside , then .
Combining the above observations, and recalling the bound (5.3), we get that if , then
[TABLE]
If , then note that since (by the definition of ),
[TABLE]
Finally, note that since , we cannot have . ∎
Lemma 7.3.
For any such that ,
[TABLE]
Proof.
As in the proof of Lemma 7.2, let , and note that since , we must have . First, suppose that and . We know that and are both in some . Let be the parent of in . Then while applying Lemma 6.13 to , is not an edge in the original graph, but since and both belong to , must be an edge in the modified graph. Thus, , which is false by assumption. Consequently, .
If and , then either and are both in , in which case the same argument shows that , or at least one of and is in , in which case .
Combining, and applying (5.3), we get that if , then
[TABLE]
Lastly, if , note that the inequality is automatic since . This completes the proof of the lemma. ∎
8. Completing the proof of Theorem 5.1
Take any . We have to prove the existence of a , depending only on , such that if and , then . To do this, first choose so small and so large that
[TABLE]
where is the universal constant from Lemma 7.1, and also
[TABLE]
Let , and let . If and are small enough (depending on and ), then the method of Section 7 yields and satisfying the conclusions of Lemmas 7.1, 7.2 and 7.3. Recall also that and for all and . Consequently, if and are i.i.d. draws from , then
[TABLE]
This shows that if and are small enough, depending on , then .
9. From Theorem 5.1 to Theorem 2.4
In this section we prove Theorem 2.4 using Theorem 5.1. Initially, let us continue working under the assumption that is finite and is the power set of . Take any . Then by Theorem 5.1, there is some such that if and , then . Suppose that . Then we first create a new system where this violation does not happen. Take each and divide it into vertices, where is chosen so large that . Let be the new set of vertices, consisting of copies of each . Let be a map from into that takes any copy of to , so that . Define a probability measure on as
[TABLE]
The probability measure can be described in words as follows. Drawing a vertex from is the same as first picking a vertex from , and then choosing one of its copies in uniformly at random. Note that if , then .
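The splitting construction works as follows. Each x receives N copies, each of mass μ(x)/N. The image of ν under π is therefore μ, and no copy has mass above max μ / N. Here is a minimal sketch with exact rational arithmetic. The function names are hypothetical, and N is left as a free parameter rather than chosen as in Theorem 5.1.

```python
import random
from collections import defaultdict
from fractions import Fraction


def split_into_copies(mu, N):
    """Measure on copies (x, i), 0 <= i < N, giving each copy mass mu[x] / N."""
    return {(x, i): mu[x] / N for x in mu for i in range(N)}


def pushforward(nu, pi):
    """Image measure of nu under the map pi."""
    out = defaultdict(Fraction)
    for y, w in nu.items():
        out[pi(y)] += w
    return dict(out)


def sample_copy(mu, N, rng):
    """Draw x from mu, then one of its N copies uniformly at random."""
    xs = list(mu)
    x = rng.choices(xs, weights=[float(mu[v]) for v in xs])[0]
    return (x, rng.randrange(N))
```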
Define also a similarity function on as
[TABLE]
Then by the observations from the previous paragraph, it follows that
[TABLE]
where is the power set of . On the other hand, by construction. Thus, by Theorem 5.1,
[TABLE]
Consequently, there exists a tree that is compatible with (in the sense of Definition 2.2), with root , and a number such that
[TABLE]
where and are i.i.d. draws from , and is the Gromov product of and under the graph distance on , with respect to the base point .
Now, for each , let be a vertex chosen uniformly at random from . Modify the tree by deleting all leaves other than the ’s, and also deleting the edges joining these leaves to their parents. The resulting graph is still a tree, and its leaves are in one-to-one correspondence with the set . Thus we can relabel its leaves to define a tree with set of leaves and root .
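The pruning step relies on a simple fact. Deleting leaves of a tree does not change the unique path between any two remaining vertices, so Gromov products with respect to the root are unchanged for the retained leaves. Here is a small check of this on an explicit tree; the names and the adjacency-dictionary format are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_dist(adj, s):
    """Graph distances from s in an unweighted graph given as {v: set(nbrs)}."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def gromov_at(adj, root, x, y):
    """Gromov product (x | y)_root under the graph distance."""
    d_root, d_x = bfs_dist(adj, root), bfs_dist(adj, x)
    return (d_root[x] + d_root[y] - d_x[y]) / 2


def delete_leaves(adj, root, keep):
    """Delete every non-root leaf not in `keep`, with its edge to its parent."""
    drop = {v for v in adj if v != root and len(adj[v]) == 1 and v not in keep}
    return {v: adj[v] - drop for v in adj if v not in drop}
```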
Let and be i.i.d. draws from , independent of . Then and are i.i.d. draws from , and hence by (9.1),
[TABLE]
But , and by our definition of ,
[TABLE]
Therefore , where the Gromov product on the left is on the tree , and the Gromov product on the right is on the tree . This gives
[TABLE]
where the expectation is now taken over , and . Since is independent of and , this proves the existence of a tree with set of leaves and root , such that
[TABLE]
Thus, we may conclude that . This completes the proof of Theorem 2.4 under the assumptions that is finite and is the power set of .
Let us now consider general , where is countably generated. Take any . The case of finite gives a corresponding to . Take this , and suppose that
[TABLE]
We will show that in the general case, this implies .
Let be a set of generators of . For each , let be the partition of generated by . Let be the set of all sets of the form where . Let be the set of subsets of that are unions of elements of . Define
[TABLE]
It is not difficult to show that is an algebra of sets that generates the -algebra on . Now take any . For , let
[TABLE]
By the measurability of , . Therefore by a basic result of measure theory, given any there exists such that . Define
[TABLE]
so that .
Since is an increasing sequence, there is some large enough such that for all . Define a function as where is the smallest number such that . If there is no such , let . Since each is a union of members of , it follows that is constant on each element of .
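For finitely many generators, the atoms of the generated partition are the classes of points that lie in exactly the same generators. The corresponding algebra consists of unions of atoms, and any function determined by membership in the generators is constant on atoms. The following is a finite illustration; the names are hypothetical, and the paper's space need not be finite.

```python
def generated_partition(universe, generators):
    """Atoms of the partition generated by finitely many subsets: two points
    share an atom iff they belong to exactly the same generators."""
    atoms = {}
    for x in universe:
        signature = tuple(x in A for A in generators)
        atoms.setdefault(signature, set()).add(x)
    return list(atoms.values())


def refines(P, Q):
    """True iff every block of P lies inside some block of Q."""
    return all(any(B <= C for C in Q) for B in P)
```

Adding generators can only split atoms, so the partitions generated by the first n sets refine one another as n grows.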
Now suppose that , but . Then there are two possibilities: (a) . Then clearly, . (b) . In this case, must be zero and must not belong to any . But for some . Thus again, .
On the other hand, suppose that but . Again, this implies that either is not in any , or for some . In the first case, we clearly have . In the second, and hence .
Combining the observations of the last two paragraphs, we see that if , then . Thus, if and are i.i.d. draws from , then
[TABLE]
Now recall the assumption (9.2) and the fact that is a function of . Therefore, the above display shows that by choosing large enough (depending on ), and then choosing small enough (depending on and ), we can ensure that
[TABLE]
Now let be the element of that contains and let be the element of that contains . Since is a finite set, we can endow it with its power set -algebra (which identifies with ), and may consider and to be -valued random variables. Then and are i.i.d. random variables with law , where identifies with the restriction of to . Since is constant on elements of , we can naturally view as a function on . Lastly, observe that . Combining all of these observations, we get
[TABLE]
Since has finite cardinality, this implies that
[TABLE]
In particular, there is a tree with root that is compatible with , and a number , such that
[TABLE]
where is the Gromov product of and under the graph distance on , with respect to the base point . Let us now extend the tree by appending to the set of nodes, and adding an edge between each and the element of that contains . Call the new tree . Then is the set of leaves of . The set is just , which is finite. Lastly, for any , the set of leaves that are descendants of is a union of elements of , and therefore measurable. Thus, is compatible with .
Next, note that , because if is the graph distance on , then , , and . Also, we know that . Therefore by (9.4),
[TABLE]
Invoking (9.3), this shows that if is chosen large enough (depending on ), and then is chosen small enough (depending on and ), we can ensure that
[TABLE]
Consequently, , completing the proof of Theorem 2.4.
10. Proof of Theorem 3.1
Take any strictly increasing continuous function , and define the similarity function
[TABLE]
If three configurations , and satisfy
[TABLE]
for some , then by the monotonicity and uniform continuity of on the range of ,
[TABLE]
where as . From this and the boundedness of on the range of , we see that if (3.2) holds, then
[TABLE]
Consequently, in probability as , where is the power set of if and the Borel -algebra of if . Thus, Theorem 2.4 implies that
[TABLE]
Therefore, there are sequences and tending to zero as , such that the following holds. With probability at least , there exists a tree with root , that is compatible with in the sense of Definition 2.2, and a number , satisfying
[TABLE]
where is the Gromov product under graph distance on the tree , with respect to the base point .
By the remark immediately below Definition 2.2, the nodes of give a hierarchical clustering of into measurable clusters. For each node , let , where is the length of the path from to . If is the smallest cluster containing and , then . Therefore if , then . This completes the proof.
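A rooted tree produces the hierarchical (ultrametric) structure of the Parisi ansatz in the following way. In a rooted tree, (x | y)_r is the depth of the smallest cluster containing x and y. These depths satisfy (x | z)_r ≥ min{(x | y)_r, (y | z)_r} for all leaves x, y, z. The sketch below checks this inequality on a random rooted tree. The tree generator and function names are hypothetical, and the passage from depths to q_v is not modeled.

```python
import random
from itertools import permutations


def depth(v, parent):
    """Number of edges from v up to the root (the node with no parent)."""
    d = 0
    while v in parent:
        v = parent[v]
        d += 1
    return d


def lca_depth(x, y, parent):
    """Depth of the lowest common ancestor, i.e. the Gromov product (x | y)_root."""
    ancestors = {x}
    v = x
    while v in parent:
        v = parent[v]
        ancestors.add(v)
    v = y
    while v not in ancestors:
        v = parent[v]
    return depth(v, parent)


def random_tree(n, seed):
    """Random rooted tree on 0..n-1 with root 0 (node i picks a parent < i)."""
    rng = random.Random(seed)
    return {i: rng.randrange(i) for i in range(1, n)}
```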
Acknowledgements
We thank Sky Cao, Wei-Kuo Chen, Persi Diaconis, Jacob Fox, Susan Holmes and Dmitry Panchenko for helpful comments and references.
