Probabilistic and Geometrical Applications to Graph Theory
Matthew Yancey

TL;DR
This paper explores the relationship between graph isoperimetric properties and Lipschitz functions, resolving conjectures, and extends discrete geometric inequalities by analyzing extremal functions and measure concentration in hypercubes.
Contribution
It characterizes extremal Lipschitz functions related to isoperimetric inequalities, resolves key conjectures, and advances the understanding of discrete curvature and midpoint bounds in hypercubes.
Findings
Resolved conjecture on extremal functions of the subgaussian inequality for odd cycles.
Linked maximum variance functions to the isoperimetric function of product graphs.
Disproved a proposed method for bounding t-midpoints in discrete hypercubes.
Matthew P. Yancey, Institute for Defense Analyses / Center for Computing Sciences (IDA / CCS)
Abstract
This paper consists of two halves.
In the first half of the paper, we consider real-valued functions $f$ whose domain is the vertex set of a graph $G$ and that are Lipschitz with respect to the graph distance. By placing a uniform distribution on the vertex set, we treat $f$ as a random variable. We investigate the link between the isoperimetric function of $G$ and the functions $f$ that have maximum variance or meet the bound established by the subgaussian inequality. We present several results describing the extremal functions, and use those results to resolve: (A) a conjecture by Bobkov, Houdré, and Tetali characterizing the extremal functions of the subgaussian inequality of the odd cycle, and (B) a conjecture by Alon, Boppana, and Spencer on the relationship between maximum variance functions and the isoperimetric function of product graphs.
While establishing a discrete analogue of the curved Brunn-Minkowski inequality for the discrete hypercube, Ollivier and Villani suggested several avenues for research. We resolve them in the second half of the paper as follows.
- They propose that a bound on $t$-midpoints can be obtained by repeated application of the bound on midpoints, if the original sets are convex. We construct a specific example where this reasoning fails, and then prove our construction is general by characterizing the convex sets in the discrete hypercube.
- A second proposed technique to bound $t$-midpoints involves new results in concentration of measure. We follow through on this proposal, making heavy use of results from the first half of the paper.
- We show that the curvature of the discrete hypercube is not positive or zero.
1 Motivation
This manuscript deals with graphs, and the attempts to apply alternative areas of mathematics to them.
The first half is motivated by concentration of measure. There is a canonical method to construct a martingale by iteratively selecting a random variable that is defined as a function that is Lipschitz over a graph. The isoperimetric function of the graph has been linked to the extremal variables. We will present results describing the extremal variables, which will allow us to refine our knowledge about this link.
The second half is motivated by geometry. We recently investigated the relationship between negative curvature and congestion in transportation networks [38]. We currently are interested in showing that networks exhibiting qualities associated with positively curved spaces will consequently have many routing options for transportation between locations. In this paper, we use an abundance of midpoints as a proxy for an abundance of routing options. Our previous work had the advantage of a discrete analogue of negative curvature (Gromov’s -points hyperbolicity) that is well-studied [11, 4, 23], practical [19, 12, 13, 14], and consequential [36, 27, 26, 1]. Multiple discrete analogues of positive curvature have been proposed [20, 30, 31, 10, 9, 22], and some consequences of those notions are known [33, 15, 20, 5]. We will present results about the discrete analogue of the curved Brunn-Minkowski inequality.
1.1 Background: concentration of measure
For graphs , the Cartesian product is the graph with vertex set such that the distance between vertices and is . The edges of are the pairs of vertices that are distance apart. We denote the Cartesian product with as .
For a fixed graph and vertex set , let . The isoperimetric function is . A problem considered by several authors [2, 6, 35] is the isoperimetric function of product spaces. That is, we generalize the isoperimetric function from to as .
One method to analyze the isoperimetric function of product spaces is to study probability spaces defined as uniform probabilities over the vertex set of a graph equipped with functions that are Lipschitz with respect to the standard graph distance in . By placing a uniform distribution on , we abuse notation and treat as a random variable. We use the notation from [6] that , where is taken over Lipschitz functions on . This notation is not consistent with [2, 35]. We will call a function variance-optimal if is Lipschitz and . Alon, Boppana, and Spencer [2] proved that decays exponentially as grows when with a rate that relies on . Let be the median value of , and note that if is variance-optimal, then so is and for all .
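As a concrete illustration of variance-optimal functions, the following sketch brute-forces the largest variance of a Lipschitz function on a small graph under the uniform distribution. The helper name `max_variance_lipschitz` and the restriction to integer values with one value pinned to 0 are assumptions of this sketch, not notation from [2] or [6]. The restriction should lose nothing here, because variance is convex and translation invariant, and the Lipschitz constraints are difference constraints whose polytope has integral vertices once one value is pinned.

```python
from itertools import product


def max_variance_lipschitz(n_vertices, edges):
    """Brute-force the largest variance of an integer-valued 1-Lipschitz
    function on a connected graph with vertices 0..n_vertices-1,
    where the vertices carry the uniform distribution."""
    best_var, best_f = -1.0, None
    # Pin f(0) = 0 (variance is translation invariant). On a connected
    # graph every other value then lies in [-(n-1), n-1].
    span = range(-(n_vertices - 1), n_vertices)
    for rest in product(span, repeat=n_vertices - 1):
        f = (0,) + rest
        # Keep only functions that change by at most 1 along every edge.
        if all(abs(f[u] - f[v]) <= 1 for u, v in edges):
            mean = sum(f) / n_vertices
            var = sum((x - mean) ** 2 for x in f) / n_vertices
            if var > best_var:
                best_var, best_f = var, f
    return best_var, best_f


# Hypothetical small examples: a path and a cycle on four vertices.
path_var, path_f = max_variance_lipschitz(4, [(0, 1), (1, 2), (2, 3)])
cycle_var, cycle_f = max_variance_lipschitz(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

On the four-vertex path the search returns variance 1.25, attained by a monotone function along the path; on the four-cycle it returns 0.5. The enumeration is exponential in the number of vertices, so this is only meant for very small graphs.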
Theorem 1.1 ([2]).
Let . We have that . Let . If is variance-optimal and , then and .
They also conjectured a stronger relationship between and —that the extremal set is determined by variance-optimal functions.
Conjecture 1.2 ([2], page 416).
Let be a variance-optimal function over . Is it true for sufficiently large and in appropriate ranges that for all with ?
Our initial intuition was that the conjecture may be true. A variable that maximizes variance will attempt to spread its values towards and as evenly as possible. Because is Lipschitz we have that . The hope is that is relatively large.
If true, the conjecture would have significance. Let be variance-optimal over respectively. Let , and define variable over as . Under these conditions, is variance-optimal and . If true, the conjecture would thus imply that the isoperimetric function of and the associated extremal vertex sets can be determined by the isoperimetric function of and the associated extremal vertex sets.
The conjecture has been proven true for the discrete hypercube by Harper [25], the Euclidean lattice by Bollobás and Leader [7], and the discrete torus by Bollobás and Leader [8].
Our initial set of results are a series of statements that describe the variance-optimal functions of a graph. In Section 2.5 we characterize the set of variance-optimal functions for three families of graphs. Each of those families consists of trees with long paths, and our most useful statement involves hairs. A hair of is a sequence of vertices such that for and .
Lemma 2.10, (9), Remark 2.11, and Remark 2.12. Let be a variance-optimal function over with hair . The sequence of values is unimodal. Let and . If has vertices such that , , and , then the sequence is monotone.
We compare this to the structure result of Alon, Boppana, and Spencer.
Theorem 1.3 ([2]).
Let be a variance-optimal function over , and define . Let for some , and let be the connected components of . Under these conditions, there exist variables such that for we have that .
The first two examples in Section 2.5 are an exploration of the assumption in Theorem 1.1 that is absent from Conjecture 1.2. We determine a bound on the isoperimetric number of the third example, which leads to the following result.
Theorem 2.19. Conjecture 1.2 is not true.
We conjecture a simple characterization of variance-optimal functions over trees; we are proposing that is not the correct center of the tree conceptually. If true, the statement would strengthen Lemma 2.10 significantly.
Conjecture 1.4.
Let be a variance-optimal function over a tree . There exists a vertex such that for all vertices , we have that . Moreover, if is not a path, then we may choose such that .
The range is necessary in Theorem 1.1, as Alon, Boppana, and Spencer work with a different tool from probability when for some constant . Specifically, they work with the subgaussian inequality, which states that
[TABLE]
The subgaussian constant for is a value for such that (1) is true for all real . For vertex set , we define Lipschitz function . In particular, they showed that when .
For graph , let be the supremum of for functions that are Lipschitz over . We call a function over graph optimal if . When it is clear, we drop the subscript from .
Using the same construction as above, Alon, Boppana, and Spencer [2] showed that . Bobkov, Houdré, and Tetali [6] showed that if and is even, then . If is odd, then . Exact values for are also known for paths and cycles of even length [35]. Bobkov, Houdré, and Tetali conjectured a characterization of the optimal functions for odd cycles [6], which was repeated by Sammer and Tetali [35].
Conjecture 1.5 ([6]).
If is an optimal variable over the cycle , then there exists an such that for all .
We present several structural statements about optimal variables, such as the analogue of Lemma 2.10. Theorem 1.3 does not hold for optimal variables, but it does hold for “half” of the graph.
Corollary 2.15. If is an optimal function and , then .
Once again we are able to use our statements describing optimal functions to characterize the optimal functions of specific examples.
Theorem 2.9. Conjecture 1.5 is true.
All of our discussion so far generalizes in the obvious way to metric spaces with a finite number of elements. An example is the symmetric group on elements equipped with the Hamming distance, which is . Bobkov, Houdré, and Tetali [6] bounded .
Theorem 1.6 ([6]).
Let be the symmetric group on elements equipped with the Hamming distance . The subgaussian constant for this space satisfies .
We are interested in Ollivier and Villani’s [32] use of concentration inequalities. Their concentration inequalities are applied to functions whose domain is a subset of the hypercube with distance, and thus standard statements about the hypercube will not apply. Fortunately, these subsets are closed under the group action by the symmetric group. Using this property, concentration inequalities for this unusual domain can be established (with a small but non-trivial amount of work) from concentration inequalities on functions of the symmetric group equipped with Hamming distance.
Thus, we are interested in the bounds in Theorem 1.6. In Remark 2.3 we show that the upper bound is not sharp, although our improvement is negligible. In Theorem 2.4 we show that .
Let and , equipped with a distance metric equal to the order of the symmetric difference. Ollivier and Villani [32] established that and , which we wish to generalize.
Theorem 2.5 and Theorem 2.6. For , we have that . Additionally, suppose , , and for all . Let be a Lipschitz function over , and let be induced on .
[TABLE]
1.2 Background: graph curvature
For sets and , let and . Understanding the extremal properties of is a central goal in algebraic combinatorics. For example, Roth’s theorem states that when and is disjoint from , then has density 0. Let .
One method to bound the size of comes from the Brunn-Minkowski inequality, which states that where denotes the volume of measurable set . Using that and that for positive , the Brunn-Minkowski inequality transforms into In other words, the volume of the midpoints between and is at least the geometric average of and . This statement can be generalized to weighted geometric averages: if , define , and we have that (see [10])
We define the distance between sets to be . For , the value has no effect on midpoints, as . However, for smooth complete manifolds with positive curvature , the Brunn-Minkowski inequality strengthens (see [32]) exponentially as grows: Should a value hold for some space, we will call the supremum of such values the Brunn-Minkowski curvature.
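For reference, the flat midpoint form of the Brunn-Minkowski inequality described above follows from the classical statement in two steps: scaling of volume, then the arithmetic-geometric mean inequality. The notation ($A, B \subset \mathbb{R}^n$ measurable, $\operatorname{vol}$ for volume) is assumed for this sketch and may differ from the notation used elsewhere in the paper.

```latex
\begin{align*}
\operatorname{vol}(A+B)^{1/n}
  &\ge \operatorname{vol}(A)^{1/n} + \operatorname{vol}(B)^{1/n}
  && \text{(Brunn--Minkowski)} \\
\operatorname{vol}\!\left(\tfrac{A+B}{2}\right)
  &= 2^{-n}\operatorname{vol}(A+B)
  \ge \left(\frac{\operatorname{vol}(A)^{1/n} + \operatorname{vol}(B)^{1/n}}{2}\right)^{\!n}
  && \text{(scaling: } \operatorname{vol}(cS) = c^{n}\operatorname{vol}(S)\text{)} \\
  &\ge \left(\sqrt{\operatorname{vol}(A)^{1/n}\,\operatorname{vol}(B)^{1/n}}\right)^{\!n}
  = \sqrt{\operatorname{vol}(A)\,\operatorname{vol}(B)}
  && \text{(AM--GM: } \tfrac{a+b}{2} \ge \sqrt{ab}\text{)}
\end{align*}
```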
There have been several attempts to generalize the Brunn-Minkowski inequality to discrete spaces such as . These efforts have met several obstacles. The most obvious obstacle is that the volume function naturally voids sets whose dimension is less than the overall space. The discrete analogue of volume is to count points, but this function does not naturally void sets with smaller dimensionality. As such, the strongest statement possible is that , and this is sharp for any dimension.
Some progress can be made when we force the sets and to live in higher dimensions. Ruzsa [34] proved that for with and , then . Gardner and Gronchi [21] proved that for with and , then .
We consider the question posed verbally by Stroock and written down by Ollivier and Villani (see [32]), “what is the curvature of the discrete hypercube?” We consider the more general question of what graphs display the qualities of a curved space. We limit ourselves here to discrete analogues of positive curvature; see [38] for a discussion on discrete analogues of negative curvature. Proposed notions of positive curvature for graphs include coarse Ricci curvature [30, 31], Bakry-Emery version of Ricci curvature [5], dispersion of heat [20], displacement convexity with approximate midpoints [10, 9], displacement convexity with Gaussian midpoints [22], among others. (We briefly mention that “displacement convexity with approximate midpoints” is a misleading name, as it allows for midpoints of quasi-geodesics.) We will focus on Brunn-Minkowski curvature and displacement convexity as proposed in [32].
The symmetric discrete midpoints are
[TABLE]
[TABLE]
where , and the discrete analogue of Brunn-Minkowski curvature is that
[TABLE]
Roughly speaking, displacement convexity implies that the midpoints of an optimal transportation from to should satisfy a similar inequality. There are several definitions of displacement convexity, and we will wait until Section 3.3 to give a formal definition.
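To make the discrete midpoints concrete, here is a small sketch that enumerates them in the hypercube. It assumes the definition attributed to Ollivier and Villani [32]: a midpoint of $a$ and $b$ lies on a geodesic from $a$ to $b$, at distance $\lfloor d(a,b)/2 \rfloor$ or $\lceil d(a,b)/2 \rceil$ from $a$. The function names, and the curvature value $K = 1/(2N)$ used in `brunn_minkowski_gap`, are assumptions of this sketch and should be checked against [32].

```python
from itertools import product
from math import log


def hamming(a, b):
    """Hamming distance between two 0/1 tuples of equal length."""
    return sum(x != y for x, y in zip(a, b))


def midpoints(a, b, n):
    """Points on a geodesic from a to b that sit halfway, up to rounding."""
    d = hamming(a, b)
    halves = (d // 2, (d + 1) // 2)
    return {m for m in product((0, 1), repeat=n)
            if hamming(a, m) + hamming(m, b) == d and hamming(a, m) in halves}


def set_midpoints(A, B, n):
    """Union of the midpoints of all pairs (a, b) with a in A and b in B."""
    out = set()
    for a in A:
        for b in B:
            out |= midpoints(a, b, n)
    return out


def brunn_minkowski_gap(A, B, n):
    """ln|M| - (ln|A| + ln|B|)/2 - (K/8) d(A,B)^2 with the assumed K = 1/(2n).
    A nonnegative gap means the curved inequality holds for this pair."""
    M = set_midpoints(A, B, n)
    dist = min(hamming(a, b) for a in A for b in B)
    K = 1 / (2 * n)
    return log(len(M)) - 0.5 * log(len(A)) - 0.5 * log(len(B)) - K / 8 * dist ** 2


# Hypothetical example: two antipodal points of the 3-dimensional hypercube.
A, B = {(0, 0, 0)}, {(1, 1, 1)}
M = set_midpoints(A, B, 3)
```

For the two antipodal points of the 3-dimensional hypercube, every other vertex is a midpoint, so `M` has six elements and the gap is positive.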
The coarse Ricci curvature of the discrete hypercube in dimensions is easy to calculate as , and Ollivier and Villani [32] demonstrated that the -dimensional hypercube has Brunn-Minkowski curvature at least . But it is a coincidence that these are the same number; in Proposition 3.19 we show that there exists a such that the -dimensional hypercube has Brunn-Minkowski curvature at least when is large enough.
The discrete hypercube is unique in that it is a product space where the topologies are equivalent for all . With the following result, we show that the Brunn-Minkowski curvature found by Ollivier and Villani is due to the topology; the difficulties of establishing a Brunn-Minkowski inequality for the integer lattice gives insight into why the other product measures are insufficient. We omit the proof to Theorem 1.7 as the modifications necessary to the proof of Ollivier and Villani’s result are contained in the proof to Theorem 3.5, which we will state soon.
Theorem 1.7.
Any finite product space with dimension equipped with the metric has Brunn-Minkowski curvature at least .
Ollivier and Villani’s [32] work stated many open questions. The largest of these is to understand the curvature of the discrete hypercube with a discrete analogue of displacement convexity. They also desire a connection between the different discrete analogues of curvature, explicitly stating that “the relationship between coarse Ricci curvature and displacement convexity of entropy is unclear, and no implication has been proved or disproved in either direction as far as we know.”
The discrete hypercube has been singled out because authorities on the subject [24, 32] have labeled it as the best candidate for being the first proven example of a positively curved discrete space. The discrete hypercube was proven by Erbar and Maas [20] to have positive heat-dispersion-curvature. Our first result is a rejection of the connection between coarse Ricci curvature and displacement convexity, and we do so on the discrete hypercube, which additionally implies a disagreement between displacement convexity and heat dispersion.
Example 3.13. There exist sets in the discrete hypercube such that the entropy of the midpoints is less than the entropy of the endpoints.
Example 3.13 is specifically a reference to standard displacement convexity, which we distinguish from weak displacement convexity. Weak displacement convexity seems quite general, and it seems like it should hold for most spaces that have Brunn-Minkowski curvature (we note that half of our lemmas towards Theorem 3.18 are not specific to the hypercube). Such a result would imply positive curvature for more than just the discrete hypercube, as Neeranartvong, Novak, and Sothanaphan [29] proved that Brunn-Minkowski curvature also exists in the symmetric group. While such a result eludes us, we are able to present the following progress towards proving weak displacement convexity of the hypercube.
Theorem 3.18. For probability distributions , over points in the -dimensional discrete hypercube, there exists an optimal transportation such that the distribution over the midpoints satisfies
[TABLE]
The other open questions from Ollivier and Villani involve bounding for . The first approach suggested is to bound by bounding . Such a relationship would only exist in Riemannian spaces for convex and , and so Ollivier and Villani suggest the condition , as the discrete analogue of convexity. This condition is not sufficient. To fully explore this line of thinking, we characterize the family of convex sets in the discrete hypercube.
Theorem 3.2. If is a convex subset of , then is isomorphic to for .
Our characterization proves that Example 3.1, which has vertex sets such that , is general. This characterization also leads to interesting results that highlight how non-intuitive discrete spaces are.
Corollary 3.3. The convex closure of any non-trivial ball in a hypercube is the whole space.
Corollary 3.4. If are nonempty sets of vertices in the hypercube and is the convex closure of , then .
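The characterization of convex sets and the statement about balls can be checked exhaustively in a very small case. The sketch below works in the 3-dimensional hypercube and assumes that "convex" means closed under taking midpoints, with midpoints defined as points on a geodesic at distance $\lfloor d/2 \rfloor$ or $\lceil d/2 \rceil$ from one endpoint; all function names are hypothetical. It is a sanity check on one small instance, not a substitute for the proofs.

```python
from itertools import combinations, product

N = 3
POINTS = list(product((0, 1), repeat=N))


def hamming(a, b):
    """Hamming distance between two 0/1 tuples."""
    return sum(x != y for x, y in zip(a, b))


def midpoints(a, b):
    """Points on a geodesic from a to b that sit halfway, up to rounding."""
    d = hamming(a, b)
    halves = (d // 2, (d + 1) // 2)
    return {m for m in POINTS
            if hamming(a, m) + hamming(m, b) == d and hamming(a, m) in halves}


def is_midpoint_closed(S):
    """Assumed notion of convexity: every midpoint of two points of S is in S."""
    return all(midpoints(a, b) <= S for a in S for b in S)


def is_subcube(S):
    """A nonempty S is a subcube iff it equals the product of its projections."""
    proj = [{p[i] for p in S} for i in range(N)]
    return set(product(*proj)) == set(S)


def convex_closure(S):
    """Repeatedly add midpoints until the (nonempty) set stops growing."""
    S = set(S)
    while True:
        bigger = S.union(*(midpoints(a, b) for a in S for b in S))
        if bigger == S:
            return S
        S = bigger


# Enumerate every nonempty vertex subset and keep the midpoint-closed ones.
closed_sets = [frozenset(c)
               for r in range(1, len(POINTS) + 1)
               for c in combinations(POINTS, r)
               if is_midpoint_closed(frozenset(c))]

# Ball of radius 1 around the all-zeros vertex.
ball = {p for p in POINTS if sum(p) <= 1}
```

In this instance the midpoint-closed sets are exactly the 27 nonempty subcubes (8 vertices, 12 edges, 6 faces, and the whole cube), and the closure of the radius-1 ball is the whole vertex set.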
We are able to use our advances in concentration theory to bound when is small. Our bound still uses instead of , which implies that our technique is not preferable.
Theorem 3.5. Suppose such that . Let be such that and . Under these conditions, for large we have that
[TABLE]
Bounding is useful for a strong version of displacement convexity. Graphs that satisfy such a property also satisfy a condition that is similar to, but stronger than, being claw-free. Unfortunately, we characterize this family of graphs in Theorem 3.20, and it is a very small family of spaces.
1.3 Common theme
The obvious connection between the two halves of this manuscript is that the second half uses a technical result from the first half. However, there is a stronger intuitive link. While most of the space of the second half of the paper is spent discussing “what does it mean to be curved?” the intent is to discuss “who is curved?” From that perspective, we propose graphs with small variance—the graphs we study in the first half of the paper—as the model for a positively curved space.
Let us quickly survey the evidence for this. Lipschitz functions frequently appear as minor statements in manuscripts about curved spaces. For example, in Particular Case 5.4 of [37], Villani presents the dual Kantorovich problem (which we elaborate on in Remark 3.12) as maximizing a variance-like parameter over the set of Lipschitz functions; and Lipschitz functions show up elsewhere in the textbook, such as in Remark 6.5. The heat dissipation approach to curvature from Erbar and Maas [20] implies a bound on the subgaussian constant.
The spread can be defined subject to the constraints and for each we have that . The Fiedler vector of the Laplacian is known to satisfy subject to the constraints and , where is the associated eigenvalue. Alon and Milman [3] were the first to observe that concentrated graphs are a generalization of expanders. Bauer, Chung, Lin, and Liu [5] showed that each of coarse Ricci curvature and Bakry-Emery version of Ricci curvature implies a bounded value for .
Thus the literature is heavy with results that indicate discrete curvature is a tool that can be used to find concentrated discrete spaces. But we know far more concentrated discrete spaces than curved discrete spaces; so can we use techniques from expanders to study curvature? We explore the prospects of such research in Section 3.6. In continuous spaces, the statements also use curvature to imply concentration and not vice-versa (see chapter 22 of [37]), but the examples contained in this manuscript suggest that any applicable discrete curvature will necessarily be weaker than the continuous analogue.
2 Concentration Of Lipschitz Functions
2.1 Background
The following fills in the details behind the relevant portions of the first page and a half of [6]. We assume all variables are real.
Recall (1). Because is real, the exponential function is positive and monotone increasing, and therefore when is nonnegative. By setting we have that (1) implies
[TABLE]
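The step from (1) to (2) is the usual exponential Markov (Chernoff) argument, sketched below. The sketch assumes that (1) has the standard form $\mathbb{E}\,e^{t(X-\mathbb{E}X)} \le e^{\sigma^2 t^2/2}$ for all real $t$; the symbols $X$, $a$, $t$, and $\sigma^2$ are assumed names and may not match the paper's notation.

```latex
\begin{align*}
\Pr[X - \mathbb{E}X \ge a]
  &= \Pr\!\left[e^{t(X-\mathbb{E}X)} \ge e^{ta}\right]
  && (t \ge 0,\ \text{monotonicity of } \exp) \\
  &\le e^{-ta}\,\mathbb{E}\,e^{t(X-\mathbb{E}X)}
  && \text{(Markov's inequality)} \\
  &\le e^{\sigma^{2}t^{2}/2 - ta}
  && \text{(assumed form of (1))} \\
  &= e^{-a^{2}/(2\sigma^{2})}
  && \text{(choosing } t = a/\sigma^{2}\text{)}
\end{align*}
```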
Let be the median value of , and so minimizes the function . In particular, we have that . We apply the Cauchy-Schwarz inequality (let be the possible events for variable with probability , set and ) to say that . We conclude that
[TABLE]
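The chain of inequalities behind this step can be written out as follows. The symbols $X$ for the random variable and $M$ for its median are assumed notation; the argument is the standard one described in the preceding paragraph.

```latex
\begin{align*}
|\mathbb{E}X - M|
  &= \left|\mathbb{E}[X - M]\right|
   \le \mathbb{E}|X - M|
  && \text{(triangle inequality)} \\
  &\le \mathbb{E}|X - \mathbb{E}X|
  && \text{(the median minimizes } c \mapsto \mathbb{E}|X - c|\text{)} \\
  &\le \sqrt{\mathbb{E}\!\left[(X - \mathbb{E}X)^{2}\right]}
   = \sqrt{\operatorname{Var}[X]}
  && \text{(Cauchy--Schwarz)}
\end{align*}
```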
We may apply (2) with and use (3) to see that
[TABLE]
Claim 2.1 will transform (4) into
[TABLE]
Claim 2.1.
Proof.
Consider the expression as . Apply the Taylor series of to see that
[TABLE]
Because expectation is linear and , we have that . Now apply (1) to see that . By the Taylor series for we have that . ∎
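A sketch of the expansion used in this proof is given below, under two assumptions: that (1) has the standard form $\mathbb{E}\,e^{t(X-\mathbb{E}X)} \le e^{\sigma^2 t^2/2}$, and that the claim is the standard bound $\operatorname{Var}[X] \le \sigma^2$. The random variable $X$ is taken to be bounded, as it is for a function on a finite vertex set, so the expansions in $t$ are valid.

```latex
\begin{align*}
\mathbb{E}\,e^{t(X-\mathbb{E}X)}
  &= 1 + t\,\mathbb{E}[X-\mathbb{E}X]
       + \tfrac{t^{2}}{2}\,\mathbb{E}\!\left[(X-\mathbb{E}X)^{2}\right] + O(t^{3})
   = 1 + \tfrac{t^{2}}{2}\operatorname{Var}[X] + O(t^{3}), \\
e^{\sigma^{2}t^{2}/2}
  &= 1 + \tfrac{t^{2}}{2}\,\sigma^{2} + O(t^{4}).
\end{align*}
% Subtracting 1, dividing by t^2/2, and letting t -> 0 in the assumed
% form of (1) gives Var[X] <= sigma^2.
```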
Bobkov, Houdré, and Tetali [6] had a special interest in random variables , where is Lipschitz over some space . In particular, they were interested in bounding the size of the space . They use variable for subspaces such that , which implies that . With these choices, (5) implies that when . If we choose , then the same choices imply , and so (2) gives a better bound when : .
The subgaussian constant for is a value for such that (1) is true for all real . For a probability space , is the supremum of , where is over all Lipschitz functions over . When it is clear, we drop the subscript from . With all of these amazing inequalities, all we have left to do is find distributions and spaces with reasonable values for .
Suppose is any finite set, and our space is equipped with the Hamming distance (). If for Lipschitz function over , then McDiarmid’s inequality implies in (2). Bobkov, Houdré, and Tetali [6] showed that if is even (including the already-known case of ), then . If is odd, then . The symmetric group on elements is a subset of for . Bobkov, Houdré, and Tetali [6] showed that if our space is the symmetric group with a distance inherited from the Hamming distance on in this manner, then .
We are also interested in probability spaces defined as uniform probabilities over the vertex set of a graph equipped with functions that are Lipschitz with respect to the standard graph distance in . Alon, Boppana, and Spencer [2] demonstrated that the subgaussian constant tensors out with respect to the Cartesian product. As this result will be important to us (and because the proof is so short and elegant), we include it here for completeness.
First, let us establish notation that we will use again later. For a graph , let be the set of Lipschitz functions over . For , the log-moment function of is . The log-moment function of is . (Let us quickly explain why we use instead of : the log-moment function is invariant under translation——and therefore we may restrict ourselves to functions where for arbitrary fixed vertex . This restricted subset of is a closed bounded subset of and therefore compact.) We can thus calculate the subgaussian constant as .
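A numerical sketch of this calculation for a single function is given below. It assumes the log-moment function has the form $\log \mathbb{E}\,e^{t(f-\mathbb{E}f)}$ and that the subgaussian constant of $f$ is the supremum over $t \ne 0$ of twice the log-moment divided by $t^2$; the names `log_moment` and `subgaussian_estimate` are hypothetical. The code only scans a finite grid of $t$ values, so it returns a lower estimate of that supremum rather than its exact value.

```python
import math


def log_moment(values, t):
    """log E exp(t (f - E f)) for f uniform over the listed values
    (assumed form of the log-moment function)."""
    mean = sum(values) / len(values)
    return math.log(sum(math.exp(t * (x - mean)) for x in values) / len(values))


def subgaussian_estimate(values, ts):
    """Largest value of 2 L_f(t) / t^2 over a grid of nonzero t.
    This is a lower estimate of the supremum over all real t."""
    return max(2 * log_moment(values, t) / t ** 2 for t in ts if t != 0)


# Grid of positive and negative t, from small to moderately large.
grid = [s * 10 ** (k / 4) for k in range(-8, 5) for s in (-1, 1)]

# Hypothetical example: the function with values 0 and 1 on two adjacent vertices.
two_point = subgaussian_estimate([0, 1], grid)
```

For the two-point example the log-moment function is $\log\cosh(t/2)$, which is at most $t^2/8$, so the estimate approaches $1/4$ from below as the grid reaches small $|t|$.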
Theorem 2.2 ([2]).
For graphs and , we have that .
Proof.
Fix a .
Let be Lipschitz functions over such that respectively. Now define to be the function over , which is Lipschitz by choice of . By translation invariance, let us assume that , which implies that as well. Notice that
[TABLE]
As this holds for each , we conclude that .
Now suppose that is a Lipschitz function over such that . Let , and define a family of functions over as such that . Because is Lipschitz and expectation is linear, we have that is Lipschitz. For a fixed , we have that is constant, and therefore is also Lipschitz. The expectation of over is the expectation of over . For simplicity, we will use translation invariance of the log-moment function to assume has expectation 0, and therefore has expectation 0 as well. By definition of , it follows that has expectation 0 for each . The theorem then follows from the monotonicity of the exponential function and that for all ,
[TABLE]
∎
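The first half of this proof rests on the fact that a sum of functions of independent coordinates has an additive log-moment function. The sketch below checks this numerically on a small product graph, a three-vertex path times a four-cycle. The function names and the two example functions are hypothetical, and the log-moment function is assumed to be $\log \mathbb{E}\,e^{t(f-\mathbb{E}f)}$.

```python
import math
from itertools import product


def log_moment(values, t):
    """log E exp(t (f - E f)) for f uniform over the listed values."""
    mean = sum(values) / len(values)
    return math.log(sum(math.exp(t * (x - mean)) for x in values) / len(values))


def product_function(f1, f2):
    """f(u, v) = f1(u) + f2(v) on the vertex set of the Cartesian product."""
    return {(u, v): f1[u] + f2[v]
            for u, v in product(range(len(f1)), range(len(f2)))}


def product_edges(n1, edges1, n2, edges2):
    """Edges of the Cartesian product: move in exactly one coordinate."""
    e = [((u, v), (w, v)) for (u, w) in edges1 for v in range(n2)]
    e += [((u, v), (u, w)) for u in range(n1) for (v, w) in edges2]
    return e


path_edges = [(0, 1), (1, 2)]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
f1 = [0, 1, 2]        # Lipschitz on the path
f2 = [0, 1, 2, 1]     # Lipschitz on the cycle
f = product_function(f1, f2)

# The sum is Lipschitz on the product because each edge changes one coordinate.
is_lipschitz = all(abs(f[a] - f[b]) <= 1
                   for a, b in product_edges(3, path_edges, 4, cycle_edges))

# Additivity of the log-moment function under the uniform product measure.
gaps = [abs(log_moment(list(f.values()), t)
            - log_moment(f1, t) - log_moment(f2, t))
        for t in (-2.0, -0.5, 0.3, 1.7)]
```

Each entry of `gaps` should be zero up to floating-point error, since the uniform distribution on the product makes the two coordinates independent.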
2.2 Concentration of permutations and levels of the Boolean lattice
We are interested in Ollivier and Villani’s [32] use of concentration inequalities. They use a result by Bobkov, Houdré, and Tetali [6] on the concentration of as a lemma, and hence we wish to establish this result with great care. The proof of this statement is omitted from the published paper, but it appears in an extended manuscript available on a personal website. We include the proof here for self-containment and to establish the context necessary for Remark 2.3.
Theorem 1.6 ([6]). Let be the symmetric group on elements equipped with the Hamming distance . The subgaussian constant for this space satisfies .
Proof.
We prove this by induction on . The base case follows from . Also, except with distances doubled, and therefore . Next, we will show that .
Let , so that partition and each is isomorphic to . Recall the log-moment function and let be a function that is Lipschitz over and . Let denote restricted to the domain of . We introduce the translated log-moment function , so that . It follows that
[TABLE]
where the second-to-last part follows by induction. We will show that , which will prove the theorem.
Let be a variable such that . Note that , and therefore . So we are trying to prove that . We will show that for all we have that , and therefore , which will prove the theorem.
To show that for we establish a bijection such that for all . For a fixed , let . Let , and define , , and otherwise. ∎
Remark 2.3.
Theorem 1.6 is not sharp because whenever is odd. Because the proof is inductive, the bound on could be improved by the sum of over the odd . Unfortunately, , and thus this improvement is finite.
Let . Recall that Bobkov, Houdré, and Tetali [6] gave the exact subgaussian constant for the complete graph, and thus Theorem 2.2 implies that
[TABLE]
The extended manuscript of Bobkov, Houdré, and Tetali [6] includes a proof from Schechtman that by constructing a -Lipschitz function . It can be observed that is -Lipschitz, and therefore .
We know the extremal function for the log-moment function of . By the proof of Theorem 2.2 the extremal function of is the extremal function of tensored out for , and Bobkov, Houdré, and Tetali [6] gave the extremal function for . Up to symmetry, the extremal function for is then .
We can construct close lower bounds on by using similar functions. That is, we consider functions that are sums of indicator variables with balanced probabilities. Bobkov, Houdré, and Tetali [6] show that using the function , which can easily be expressed as the sum of indicator variables. Recall for that . We improve the previous lower bound on by finding indicator variables with positive covariance.
Theorem 2.4.
Let be the symmetric group on elements equipped with the Hamming distance . The subgaussian constant for this space satisfies .
Proof.
For a permutation , let be the indicator variable that . Bobkov, Houdré, and Tetali’s function is . They demonstrated that , and so their result follows from Claim 2.1. Moreover, if is even, then .
Consider the function . If is even, then because is a permutation, . So when is even, , and so .
Now suppose for integer . A direct calculation gives us that and . Moreover, for , and . Therefore
[TABLE]
∎
Let and . Ollivier and Villani [32] established that and , which we wish to generalize.
Theorem 2.5.
For ,
Proof.
Let be a Lipschitz function on . Let , where is two points distance apart. We define a surjection as follows: and . Let such that . Because , the pre-image under of any point in is of fixed size. Therefore .
We claim the following two facts: (A) is Lipschitz over and (B) . The claim implies that , which proves the theorem.
By the definition of Cartesian product, is Lipschitz if and only if it is Lipschitz in each coordinate. Because is Lipschitz and we apply the Hamming distance to , we have that . Because , , and is Lipschitz, we have that . Thus, is Lipschitz. This proves (A).
By Theorem 2.2, . By Remark 2.3, . By the definition of subgaussian constant, . Recall that . This proves (B). ∎
Theorem 2.5 is strongest when . We will need another result for when for small . We restrict our attention to the midpoints of distant sets of vertices, and only establish an extension of (2).
In the following we will need to bound from below the value for and . The standard inequality to use is , which gives when . We need (and produce) the stronger inequality .
Theorem 2.6.
Suppose , , , and for all . Let be a Lipschitz function over , and let be induced on .
[TABLE]
In particular, if , then
[TABLE]
Proof.
Because , every event is also an event . If we wish to count such events, this inequality becomes
[TABLE]
By Remark 2.3, inequality (2) applied to is
[TABLE]
Let . A standard inequality tells us that , and by the assumption we have that . We will show that , which will imply that , which implies the theorem.
Because and we have that . Therefore , and because this implies
[TABLE]
∎
In the end, our concentration of measure will be used in the following manner.
Theorem 2.7.
Let such that and . Under these circumstances,
[TABLE]
If and with fixed coefficients that satisfy , then
[TABLE]
Proof.
Let be a variable over such that . By this construction, is Lipschitz, , and . Let be induced on . If , then apply Theorem 2.5 to inequality (2) on variable to see that . If , then apply Theorem 2.5 to inequality (2) on variable to see that . Because by assumption, the first part of the theorem follows.
The diameter of is , and so we may assume that . If , then . The second part of the theorem follows similarly, with Theorem 2.6 (specifically, the second inequality with ) replacing Theorem 2.5 and inequality (2).
∎
2.3 Concentration of cycles
A variable is called optimal if . Let be the variables in that are not a convex combination of other variables in . The log-moment function is a summation of functions that are strictly convex in each variable composed with a monotone function, and therefore the optimal functions are contained by . Bobkov, Houdré, and Tetali [6] established the following combinatorial fact. If , then
[TABLE]
The log-moment function is also symmetric, and Bobkov, Houdré, and Tetali used this to show a second combinatorial fact. If is a permutation of and , then
[TABLE]
Bobkov, Houdré, and Tetali [6] used (6) and (7) to quickly calculate the subgaussian constant of when is even. However, the proof is just short of calculating for . Up to symmetry, the cycle contains a unique spanning tree, which contains all but one edge (by symmetry, call this edge ). Let , and see that . Because is Lipschitz and , we also know that . If , then , and therefore . Moreover, if there exist three distinct vertices such that , then there exists a permutation such that and does not contain a spanning tree (thus violating (7)). Up to symmetry, there exists a unique integer-valued Lipschitz function on such that no value appears in the image more than twice and the endpoints of each edge differ by exactly ; and therefore is known, and therefore is known.
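The uniqueness claim at the end of the preceding paragraph can be checked by brute force on small even cycles. The following is a minimal sketch, not taken from the paper: it assumes the cycle has vertices 0, ..., n-1 in cyclic order, that the missing edge difference is 1, and that "up to symmetry" means translation of the values together with rotation and reflection of the cycle. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def alternating_functions(n):
    """Yield integer-valued f on the n-cycle with f(0) = 0 and |f(i+1) - f(i)| = 1 on every edge."""
    for steps in product((1, -1), repeat=n):
        if sum(steps) != 0:  # the walk has to close up around the cycle
            continue
        f = [0]
        for s in steps[:-1]:
            f.append(f[-1] + s)
        yield tuple(f)


def canonical(f):
    """Representative of f up to value translation and rotation/reflection of the cycle (assumed symmetries)."""
    low = min(f)
    g = tuple(v - low for v in f)
    images = []
    for r in range(len(g)):
        rot = g[r:] + g[:r]
        images.append(rot)
        images.append(rot[::-1])
    return min(images)


def classes_with_multiplicity_at_most_two(n):
    """Symmetry classes of functions in which no value is taken more than twice."""
    reps = set()
    for f in alternating_functions(n):
        if max(Counter(f).values()) <= 2:
            reps.add(canonical(f))
    return reps


if __name__ == "__main__":
    for n in (6, 8):
        print(n, classes_with_multiplicity_at_most_two(n))
```

For n = 6 and n = 8 the search returns a single class, represented by the distance-from-a-vertex function.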
Conjecture 1.5, made by Bobkov, Houdré, and Tetali [6] and repeated by Sammer and Tetali [35], is that a similar construct is optimal for odd cycles. In order to enhance this argument for odd cycles, we will argue a stronger statement than (6) and (7). We will also use this stronger statement again at a later point in the paper. Let denote the set of permutations on and
[TABLE]
To explain in simpler terms, we have that is minus functions that are convex combinations of other Lipschitz functions, and is minus functions that are convex combinations of permutations of other Lipschitz functions (which might not be Lipschitz after the permutation is applied). We do not know a priori that is non-empty, but this follows from the following lemma.
Lemma 2.8**.**
If is optimal, then .
Proof.
The log-moment function is well defined for all functions , not just those inside . Moreover, the log-moment function is strictly convex over this extended domain. So if is a convex combination of and , then . And by symmetry, and . ∎
Theorem 2.9**.**
Conjecture 1.5 is true.
Proof.
Let , and let the indices be taken modulo . Fix some , and let . We will show that is the set of translations and reflections of for . Suppose .
One method to characterize the translations and reflections of is as the family of Lipschitz functions that satisfy the condition that for any ,
[TABLE]
By translation invariance, let us assume that is integer valued. We will show that satisfies (8).
Let and . There exists an for a permutation of such that
(1) for we have that , and
(2) for we have that .
By construction, . By (7), is optimal, and so by (6), for for some value .
If or or , then we are done. So assume and . Consider the functions and where
(A) when and , and
(B) when and .
By construction, . Moreover, if is the permutation on that transposes with , then . By Lemma 2.8, this implies that is not optimal. ∎
2.4 The structure of the subgaussian constant and spread
Let us use Lemma 2.8 to prove a statement that will be used later. A hair of is a sequence of vertices such that for and . We may use (6) and (7) to state that
[TABLE]
That is, for hair and optimal function , there exists an such that
(A) for all , and
(B) for all .
We will need to find the set of optimum functions of a particular family of trees in a later section; for this purpose we will need to know that “small” hairs are monotone.
Lemma 2.10**.**
Let be an optimal Lipschitz variable on , and let be a hair of . Let and . If has vertices such that and , then for all .
Proof.
By translation invariance, let us assume that takes integral values. By the Lipschitz condition, for any path satisfies . We directly conclude that the sequence must contain each integer in the range . But by partitioning the hair into multiple paths, we also see that each integer in or in appears at least twice.
By symmetry, let us assume that each integer in appears at least twice. There exists a permutation such that
(A) for , and
(B) when .
Let , where is a permutation on that fixes vertices outside of the hair and permutes vertices inside the hair according to . By construction, is Lipschitz, and so (7) implies that is optimal. Any spanning tree of must contain each edge between consecutive vertices of a hair, and (6) implies that .
So each value in the range appears exactly twice, each value in the range appears exactly once, and the value appears exactly once.
Now let us assume that is unimodal but not monotone. By considering a sub-hair and applying symmetry, let us assume that for we have that . Let so that the hair is vertices . We will prove that if there exist vertices such that , then there exist Lipschitz functions and and permutations and such that . By Lemma 2.8, this will prove the second part of the lemma.
Let for vertices not in the hair, , , and for . Let for vertices not in the hair, , , and for . Now we will define bijections and on such that and . This will suffice, as are equal on all other vertices. We define
- for , set and ,
- set and ,
- for , set and ,
- set and , and
- set and .
∎
Remark 2.11**.**
The assumption in the second part of Lemma 2.10 can be relaxed to only assuming that and exist and satisfy and . This is because the rest of the vertices can be found on a shortest path from to .
We will call a function variance-optimal if .
Remark 2.12**.**
Let us first note that the variance function is strictly convex and symmetric. Therefore (6), (7), Lemma 2.8, (9), and Lemma 2.10 all hold when optimal is replaced with variance-optimal. The analogue of Theorem 2.2 also follows from minor modifications to the proof.
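For small graphs, variance-optimal functions can be found by exhaustive search. The sketch below is illustrative only and rests on stated assumptions: vertices are labelled 0, ..., n-1, the graph is connected, the distribution is uniform, and the search is restricted to integer-valued functions with f(0) = 0. That restriction loses nothing for the maximum, because variance is convex and translation invariant, and the extreme points of the Lipschitz polytope are integer-valued after translation. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def bfs_parents(n, edges):
    """Return a BFS order from vertex 0 and the BFS parent of every vertex (graph assumed connected)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order, parent = [0], {0: None}
    for u in order:  # the list grows while we iterate over it
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
    return order, parent


def variance(f):
    """Variance of f under the uniform distribution, computed exactly."""
    n = len(f)
    mean = Fraction(sum(f), n)
    return sum((Fraction(x) - mean) ** 2 for x in f) / n


def variance_optimal(n, edges):
    """Maximum variance and all maximizers among integer 1-Lipschitz functions with f(0) = 0."""
    order, parent = bfs_parents(n, edges)
    best, argbest = Fraction(-1), []
    for offsets in product((-1, 0, 1), repeat=n - 1):
        f = [0] * n
        for v, d in zip(order[1:], offsets):
            f[v] = f[parent[v]] + d
        if any(abs(f[u] - f[v]) > 1 for u, v in edges):
            continue  # not Lipschitz on some non-tree edge
        var = variance(f)
        if var > best:
            best, argbest = var, [tuple(f)]
        elif var == best:
            argbest.append(tuple(f))
    return best, argbest


if __name__ == "__main__":
    path3 = [(0, 1), (1, 2)]
    cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(variance_optimal(3, path3))
    print(variance_optimal(4, cycle4))
```

The search visits 3^(n-1) candidates, so it is only practical for graphs with a dozen or so vertices.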
Variance also satisfies a handful of other properties. For example, variance is even, so is variance-optimal if and only if is variance-optimal. Next, we present a slightly stronger version of Theorem 1.3, where the improvement will be crucial to establishing Theorem 2.19.
Theorem 2.13**.**
If and , then there exists such that . If and , then there exists such that .
Proof.
For , define to be the variable when and . To prove the theorem, we will show that when the assumptions are violated there exists a value of such that and that is Lipschitz. Recall from Theorem 1.3 that is an integer. If and for all we have that , then is Lipschitz. If and for all we have that , then is Lipschitz.
A direct calculation gives us that , where is the probability of . Thus when , and the first part of the theorem follows. If , then , and so integrating from to gives a positive total change. Thus the second part of the theorem holds. ∎
One interpretation of Theorem 1.3 is that the intuition of Conjecture 1.5 is true for variance-optimal functions for all graphs. The intuition of extremal functions defined as the distance from some “origin” is half true for the log-moment function. That is, the analogue of Theorem 2.13 is true to one side of the origin, but because the log-moment function is not even we cannot apply symmetry to the other side.
Theorem 2.14**.**
Let and an optimal Lipschitz function for . If , then there exists a such that .
Proof.
We begin with a similar set-up as Theorem 2.13. Assume that for all we have that . For , define to be the variable when and . By our assumption, is Lipschitz over . To prove the theorem, we will show that when .
Let be the probability of , and so
[TABLE]
The direct calculation gives us
[TABLE]
Because for all , we have that
[TABLE]
Combine the previous two equations with our assumption to see that
[TABLE]
On the domain we have that , so the theorem follows. ∎
Following the same proof as Theorem 1.3 (but without the symmetry), we have half of the analogous result.
Corollary 2.15**.**
If is an optimal function and , then .
To see that “half” of the result is best possible, consider the following example. Let be the graph with vertices and edges . We will compare two extremal Lipschitz functions on . Let , , , and . We have that is variance-optimal, but Mathematica was able to show that when .
We produce one more result, which will be used later. We omit the proof, as it is obvious.
Claim 2.16**.**
If , then and .
2.5 Tightness for isoperimetric inequalities
Recall that is the subset of vertices in defined as for function . When the function is implied, we drop the and set .
The isoperimetric problem for product graphs and a number is to identify a set of at least half of the vertices of such that is minimized. Alon, Boppana, and Spencer [2] proved that decays exponentially as grows when with a rate that relies on . Let us now explore Conjecture 1.2, which is that there exists a stronger relationship between and , namely that the extremal set can be determined by when is variance-optimal.
We will show that the conjecture is not true. The issue is that any variable is forced to map a (potentially complex) graph to a one-dimensional space (see Theorem 2.13 and surrounding discussion). The conjecture has been seen to hold when the underlying graph has a distinctly one-dimensional topology. It even holds for graphs with multi-dimensional topology; for example the Euclidean grid is the standard two-dimensional graph, and since the Euclidean grid is the conjecture holds because it holds for . But the conjecture begins to fail when the graph has some central point, but the rest of the graph cannot be neatly labeled as “up” or “down” from that central point.
To build up an understanding that counters our original intuition that the conjecture is true, let us explore the assumption that is in Theorem 1.1 but missing from Conjecture 1.2. In these early examples we only explore how grows with for . We will do this with two examples: in the first having is the “correct thing to do,” while the opposite is true in the second example.
Our first example is the unbalanced tripod , which is one vertex attached to hairs with vertices respectively. Formally, we define
[TABLE]
and
[TABLE]
[TABLE]
We have built up enough results to determine the variance-optimal functions of this tree.
Example 2.17**.**
For sufficiently large , an extremal function for is , where , , and . In this case, and . There exists a permutation over such that and .
Proof.
Because it is significantly simpler, let us begin by proving the second part and later prove that is variance-optimal. Consider the permutation over such that , , , and . By construction, , , and .
Consider a Lipschitz variable over such that . Note that , and so
[TABLE]
By translation invariance, let us assume that . We will refer to the three hairs as the -hair, the -hair, and the -hair. By (9), each hair can be broken into one or two monotone sequences. The example can thus be analyzed by exhausting a handful of cases. The analysis is simpler if we first show that at most and at least hair is monotone.
By definition, if a -hair is monotone, then there exists a such that . We have that when all three hairs are monotone and . A simple case analysis shows that these values for maximize the variance when all three hairs are monotonic. Some hair must have a minimal element of , a second hair must have a maximal element of , and the third hair satisfies the assumptions of Lemma 2.10. The conclusion of Lemma 2.10 is that the third hair is monotonic.
Case 1: the -hair is monotone. By symmetry, assume and that the minimum element of over the -hair is at most the minimum element of over the -hair. By Lemma 2.10, we have that the -hair is monotone. If , then Lemma 2.10 applies to the -hair, and all hairs are monotonic. This is a contradiction, so and the -hair is not monotonic. By considering monotone sequences as the extreme values of , we see that .
If the -hair is not monotone, then Lemma 2.10 does not apply and for some we have . Also, by Theorem 2.13, there exists a with . For both of these facts to be true, it must be that there exists an such that for and for . Moreover, . But then for all we have that . So we see that
[TABLE]
[TABLE]
Case 2: the -hair is not monotone. Lemma 2.10 does not apply to the -hair, so the -hair contains a maximum or minimum element of . By symmetry, let us assume that is the maximum value of for some value of . Theorem 2.13 implies that there exists an such that . Note that for any Lipschitz variable such that , and therefore .
Some hair is monotone, so by symmetry the -hair is monotone.
Case 2.A: . Lemma 2.10 implies that the -hair is monotone. If , then the constraint that implies that and . On the other hand, implies . But then for all we have that . Using a calculation similar to the end of case 1, we see that .
So assume . The constraint that implies that and . But then
[TABLE]
Case 2.B: . It follows that . So for all , we have that . If , then for all we have that . If , then for all we have that . So we arrive at a contradiction similar to the end of case 1. ∎
Our second example is a slight modification to the unbalanced tripod.
Example 2.18**.**
Fix a large odd . Let be a single vertex with hairs: three of them have vertices and of them have vertex. As before, let be the unique vertex with degree greater than and let the three hairs have vertices ; and now denote the vertices in the short hairs as . Let be plus edges .
The variable such that satisfies , , , and is variance-optimal. As before, and . We have that . However, there exists a permutation of the vertex set such that .
Proof.
First, let us prove that is variance-optimal over . By translation-invariance, let us assume . As in Example 2.17, one hair may have a maximal element of , a second hair may have a minimal element of , and Lemma 2.10 will apply to the third. Because the three long hairs have equal length, if they are monotone then they contain an extremal value. So Lemma 2.10 will apply to the , , and -hairs.
By symmetry, we can assume (by notation from Example 2.17), and so . Theorem 2.13 says that for all . We then have only two cases to check: when and when , and a direct calculation gives that is variance-optimal when .
Now notice that , so by Claim 2.16 we have that . But is Lipschitz over , so is variance-optimal over .
Consider the permutation over such that , ,
- for (specifically note that ),
- for ,
- ,
- , ,
- for , and
- for .
By construction and . Both sets have vertices. Moreover, we see that
- ,
- ,
- for , we have ,
- for we have ,
- for we have , and
- for we have .
So we have that for all , and strict containment for . ∎
Our third example is similar to Examples 2.17 and 2.18; it is a tree whose main topology is two long paths whose endpoints are attached to a central vertex. Similar to before, we will establish a permutation on our counterexample graph with variance-optimal function such that . The improvement of this example over Examples 2.17 and 2.18 is that this relation will hold for all and , which implies that this relation holds when is tensored into a higher dimension as a permutation over .
One distinction between and Examples 2.17 and 2.18 is that the two long paths are not hairs, but instead have many hairs of length attached to them. In Theorem 2.19 we only present , but the result holds with greater discrepancy between and for , where is the length of the paths.
Theorem 2.19**.**
Conjecture 1.2 is not true.
Proof.
We consider the even-length caterpillar with one leg per body segment. Formally, this graph is a path and a set of leaves such that . When drawn, this graph resembles a hair comb. Let denote this graph.
Suppose we have some Lipschitz variable on . Let be a permutation on such that . Let be such that and . Note that is also Lipschitz. Moreover, if is variance-optimal, then for every we have that . Because is variance-optimal and translation invariant, we may then assume that . Theorem 2.13 implies that if and otherwise.
Let us write out in full detail the ordering on imposed by . We do this by levels, where level of function is the vertex set . The levels of are from [math] to and they are composed of
- level [math] is ,
- level for is ,
- level is ,
- level is ,
- level for is , and
- level is .
Let us propose a different ordering of . Let us call this ordering and the levels of are composed of
- levels [math] to are the same as for ,
- level is ,
- level for is vertices , and
- level is .
The levels have the same number of vertices for and for .
We claim that is a better ordering than . Unfortunately is not a -Lipschitz function, as and . However, the set is still a well-defined object. We also have that .
If , then as the levels from [math] to are defined to be the same. If , then . If and , then . If and , then .
There exists a permutation over such that and with strict containment when and . So if is a permutation over such that is applied to each coordinate, it follows that , also with strict containment when and (recall that ). ∎
3 Discrete Positive Curvature
3.1 Convex sets and iterated midpoints
The notion of “convex” is undefined for discrete spaces, but Ollivier and Villani define it for this context to be the property that . We will first give an example of convex sets in the hypercube where is much larger than . It will be clear how this example generalizes to larger dimensions. Then we will prove that our examples of are typical of all convex sets in the hypercube.
This section will be easier if we use the notation of the Boolean lattice, which is equivalent to the hypercube. That is, the points of are represented as the subsets of and the distance between two points is the order of their symmetric difference.
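The equivalence between the two notations can be made concrete. The following sketch, with hypothetical function names and the ground set labelled 1, ..., n, checks on a small cube that the Hamming distance between 0/1 vectors agrees with the size of the symmetric difference of the corresponding subsets.

```python
from itertools import product


def to_subset(bits):
    """Map a 0/1 vector to the subset of {1, ..., n} it indicates."""
    return frozenset(i + 1 for i, b in enumerate(bits) if b)


def hamming(u, v):
    """Graph distance in the hypercube between two 0/1 vectors."""
    return sum(a != b for a, b in zip(u, v))


def lattice_distance(x, y):
    """Distance in the Boolean lattice: order of the symmetric difference."""
    return len(frozenset(x) ^ frozenset(y))


def check_isometry(n):
    """Verify that to_subset preserves distances on the n-dimensional cube."""
    cube = list(product((0, 1), repeat=n))
    return all(
        hamming(u, v) == lattice_distance(to_subset(u), to_subset(v))
        for u in cube
        for v in cube
    )


if __name__ == "__main__":
    print(check_isometry(4))
    print(lattice_distance({1, 2}, {2, 3}))
```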
Example 3.1**.**
Let , where and . Each set and is a subspace of that is isometric to and therefore is convex. We can directly calculate that if , then . Also, if , then . Now let us consider . First off, the set is a midpoint of and and thus is in . Now notice that is a midpoint of and and thus is in . But , which is too large to be in , much less !
To fully refute Ollivier and Villani’s suggestion for approximating , we need to show that this example is typical, since they only suggest that this method will “probably” work. For this purpose, we show that every convex closure of a set of points in the hypercube is the embedding of a smaller dimensional hypercube.
Theorem 3.2**.**
If is a subset of the Boolean lattice such that , then there exist sets such that
[TABLE]
Proof.
First, it should be clear that has a unique maximal element, as otherwise a midpoint between the sets will have more elements. A symmetric argument gives a unique minimal element. Iterated applications of the midpoint argument give every set in between the maximal and minimal element. ∎
We found the following consequences of this result to be interesting, as they illustrate how unusual the behavior can be for discrete spaces that appear nice and simple.
Corollary 3.3**.**
The convex closure of any non-trivial ball in a hypercube is the whole space.
Proof.
By symmetry, let us assume that the center of the ball is . Because the ball is non-trivial (in other words, the radius is positive), the ball contains the element for all as it is the minimum distance from . By Theorem 3.2, the convex closure of the ball contains and and everything in-between. ∎
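Corollary 3.3 can be observed directly in small dimensions. The sketch below is not the paper's construction: it assumes that a midpoint of x and y is any point on a geodesic between them whose distances to x and y are the floor and ceiling of half the distance, and that the convex closure is the smallest superset closed under taking midpoints. The names `midpoints`, `convex_closure`, and `ball` are hypothetical.

```python
from itertools import combinations


def midpoints(x, y):
    """Midpoints of x and y in the Boolean lattice (assumed floor/ceiling convention)."""
    x, y = frozenset(x), frozenset(y)
    diff = sorted(x ^ y)
    d = len(diff)
    sizes = {d // 2, (d + 1) // 2}
    # Flipping k coordinates of the symmetric difference moves distance k along a geodesic.
    return {x ^ frozenset(s) for k in sizes for s in combinations(diff, k)}


def convex_closure(points):
    """Smallest superset of points that contains the midpoints of all of its pairs."""
    closure = {frozenset(p) for p in points}
    frontier = list(closure)
    while frontier:
        x = frontier.pop()
        for y in list(closure):
            for m in midpoints(x, y):
                if m not in closure:
                    closure.add(m)
                    frontier.append(m)
    return closure


def ball(n, r):
    """Ball of radius r around the empty set in the n-dimensional Boolean lattice."""
    ground = range(1, n + 1)
    return [frozenset(s) for k in range(r + 1) for s in combinations(ground, k)]


if __name__ == "__main__":
    for n in (3, 4):
        print(n, len(convex_closure(ball(n, 1))), 2 ** n)
```

Under these assumptions the closure of a radius-1 ball has all 2^n points for n = 3 and n = 4, and the closure of two singletons is the interval between the empty set and their union, in line with Theorem 3.2.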
Corollary 3.4**.**
If are nonempty sets of vertices in the hypercube and is the convex closure of , then .
Proof.
For each , there exists an automorphism of such that when and otherwise. Let be an arbitrary fixed element of , and pick some . Let , and let us consider the space after has been applied. By construction, . For each , there exists a such that . So by Theorem 3.2, the convex closure of contains . The corollary follows from symmetry. ∎
3.2 The , , and metric
Let us note that there are natural constructions of the , , and metrics in graph theory. Suppose is the product of spaces . The metric is denoted , as mentioned above. The metric is isometric to . Finally, the metric is denoted by the tensor product, which is .
Theorem 3.5**.**
Suppose such that . Let be such that and . Under these conditions, for large we have that
[TABLE]
Proof.
We use shorthand . We claim that for each that
[TABLE]
The claim will prove the theorem, as
[TABLE]
for by the definition of .
We define as before. We will define a map
[TABLE]
Moreover, we will demonstrate that this map satisfies for any with , the property
[TABLE]
which proves the claim.
For , we will use notation to refer to individual coordinates. For a pair , let be the coordinates where . Let and . We define
[TABLE]
It is clear that
- and are weighted midpoints between and ,
- that , and
- .
Thus, . Moreover, , and so .
All that remains is to prove that for any fixed , we have that is not too large. Let be defined in the same way as , but on the extended domain . By the definition of , it is a simple calculation to see that if , then . So for any fixed , we have that , and it equals if and only if . The claim will then follow when we demonstrate next that for
[TABLE]
values of , we have that .
Recall that we are assuming is fixed. We define sets
[TABLE]
and
[TABLE]
Suppose and . By definition of , it should be clear that . Therefore . The proof then follows from applying the second part of Theorem 2.7. ∎
The , , and metrics are special in that is not unique in standard space. The positive curvature in metrics can entirely be attributed to the fact that grows exponentially with . But what about and ? The problem with these metrics is that might be unique. Suppose our space is the -dimensional product of subspace , and let points in be denoted by tuples , . If is equipped with metric, then may be unique (depending on ) if for all . If is equipped with metric, then may be unique (depending on ) if and for all .
Essentially the issue with the and metrics is that there exists a convex embedding of in . This is similar to Ruzsa [34] and Gardner and Gronchi’s [21] problems for establishing a discrete analogue of the Brunn-Minkowski inequality: there is a degenerate case where the sets lie in a smaller dimension. This leads to several natural open questions.
Open Question 3.6**.**
Suppose is the -dimensional product of space equipped with the metric.
1. If we force the sets and to have dimension in a manner similar to Gardner and Gronchi’s work, how do the sets of midpoints grow with the distance between and ? (The same question can be asked in the metric.)
2. If the distance between and is asymptotically bigger than the diameter of , do we see an exponential growth in the number of midpoints between sets similar to the growth when equipped with ?
There are several initial statements that can be said of the questions in 3.6. First, the -dimensional hypercube embeds into for any when equipped with the metric, and this establishes a best-case-possible because Ollivier and Villani’s result is known to be asymptotically tight. Second, if we are equipped with the metric, then a proof similar to that of Theorem 3.5 will show that the set of points within distance of the midpoints of and is at least , where is the diameter of . This is because has, with error term , a “rough geometry” equivalent to that of metric . However, if has strong negative curvature, then it is not clear that the midpoints themselves will be large. Finally, note that the second question is similar to the first—with the change that we are requiring the set to have large dimension instead of and themselves, which may be sufficient due to the non-uniqueness of midpoints. A similar question involving the metric will require the opposite assumption: that coexist in a small dimensional subspace.
3.3 Catalog of displacement convexity definitions
There are several different versions of displacement convexity. The general intuition is that when supply is optimally routed to demand across a time span , then the mid-transit goods in a positively curved space at time should be more “spread out” than the convex combination of of the spread of and of the spread of .
Formally, the functions are normalized to probability spaces . A function is a transportation of the goods from the supply to the demand if for all and for all . The Wasserstein cost of a transportation is , and the Wasserstein cost of order two is . The Wasserstein distance between and is . The generalization where are not normalized is also known as the Earth Mover’s Distance. An optimal transport minimizes . For finite spaces such a minimum clearly exists.
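In the special case of uniform measures on two supports of equal size, an optimal transportation can always be chosen to be a bijection, since the extreme points of the corresponding transport polytope are permutation matrices. The following sketch uses that fact to compute the order-one cost by enumerating bijections. It is a toy illustration, not the general linear program: the function names are hypothetical, the default cost is the Boolean-lattice distance, and the enumeration is only feasible for very small supports.

```python
from fractions import Fraction
from itertools import permutations


def lattice_distance(x, y):
    """Distance in the Boolean lattice: order of the symmetric difference."""
    return len(frozenset(x) ^ frozenset(y))


def optimal_bijections(supply, demand, cost=lattice_distance):
    """Minimum average cost over bijections supply -> demand, plus all optimal bijections.

    Assumes uniform measures on two lists of distinct points of the same length.
    A bijection is encoded as a tuple perm with supply[i] sent to demand[perm[i]].
    """
    if len(supply) != len(demand):
        raise ValueError("supports must have equal size")
    best, argbest = None, []
    for perm in permutations(range(len(demand))):
        total = sum(cost(a, demand[j]) for a, j in zip(supply, perm))
        if best is None or total < best:
            best, argbest = total, [perm]
        elif total == best:
            argbest.append(perm)
    return Fraction(best, len(supply)), argbest


if __name__ == "__main__":
    supply = [frozenset({1}), frozenset({2})]
    demand = [frozenset({1, 3}), frozenset({2, 3})]
    print(optimal_bijections(supply, demand))
```

Passing `cost=lambda x, y: lattice_distance(x, y) ** 2` gives the analogous computation for the cost of order two.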
For each , we place a probability distribution on the geodesics from to . For a geodesic that starts at and ends at , let for be the point on whose distance from is . For a fixed optimal transport and probability distributions , we define probability distribution for to be
[TABLE]
Distance interpolation is an optimal transportation combined with a uniform distribution applied to each . For the discrete analogue, a point would be and .
For probability space , let denote its entropy. The formal notion of mid-transit goods being more spread out than the supply and demand is that
[TABLE]
where is a nonnegative number that represents the strength of the curvature of the space . The distinction between the different versions of displacement convexity come from the distinction between the words “any” and “every” when it comes to multiple optimal transportations or multiple geodesics between points (events that occur rarely in geometric geodesic spaces, but which played a large role in Section 3.2). We outline several definitions below.
- (Strong displacement convexity) For any optimal transportation and geodesic choices , (10) holds.
- (Sort-of-strong displacement convexity) (10) holds for distance interpolation on any optimal transportation.
- (Sort-of-weak displacement convexity) (10) holds for distance interpolation on some optimal transportation.
- (Weak displacement convexity) There exists optimal transportation and geodesic choices , possibly dependent on , such that (10) holds.
Strong displacement convexity is presented as the “normal” version in [32]. That the inequality should hold for any transportation or geodesic is also the condition in [20], although they are not working with displacement convexity. The conditions of sort-of-strong curvature are presented as the definition of distance interpolation in Theorem 7.21 of [37]; conditions that imply uniqueness of geodesics and optimal transportations are used later (such as Definition 16.5, which is set up by chapters 9 and 10). Gozlan, Roberto, Samson, and Tetali [22] work with sort-of-weak displacement convexity, but they use transportations that minimize instead of and “midpoints” take mass Gaussian distributed across a geodesic rather than being a point mass. Weak displacement convexity is presented in chapter 29 of [37], and in [28], although it is just called “displacement convexity” except in the bottom of page 906. Weak displacement convexity is also used by Bonciocat and Sturm (see equation (2.1) of [9]) in their work on approximate midpoints.
If we do not specify a type of displacement convexity, then the statement holds for all four forms of displacement convexity. In the following sections we will prove statements about specific types of displacement convexity, but first we will prove several statements that hold for general displacement convexity. To do so, we modify our definition of midpoints. For vertices , let the midpoints between them be as follows:
(A) if is even, then , and
(B) if is odd, then .
We use several ideas from continuous transportation theory in our work. Some of those ideas translate into similar techniques in the discrete setting; while other ideas translate into performing the opposite technique. Specifically, we will use the relationship between cyclical monotonicity (Definition 5.1 of [37]) and optimal transportations as before. On the other hand, we reverse the Monge-Mather shortening principle (chapter 8 of [37]), because after Lemma 3.10 we will only be interested in transportations whose entire support is one class in the transitive closure of the relation when some -geodesic intersects some -geodesic.
A transportation is cyclically monotone if for every sequence of pairs we have that . It is easy to confirm that a transportation is optimal if and only if it is cyclically monotone, as this is the same technique as finding alternating cycles when looking for maximum weight matchings in a bipartite graph.
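Cyclical monotonicity is easy to test on a small support. The sketch below assumes the standard form of the condition: for every cyclic sequence of support pairs, the total cost of the pairs as matched does not exceed the total cost after shifting the second coordinates by one position. It only examines sequences of distinct pairs, and the function names and default cost are hypothetical choices for illustration.

```python
from itertools import permutations


def lattice_distance(x, y):
    """Distance in the Boolean lattice: order of the symmetric difference."""
    return len(frozenset(x) ^ frozenset(y))


def is_cyclically_monotone(support, cost=lattice_distance):
    """Check the cyclic inequality on every sequence of distinct support pairs."""
    pairs = list(support)
    for k in range(2, len(pairs) + 1):
        for seq in permutations(pairs, k):
            matched = sum(cost(x, y) for x, y in seq)
            # Re-route each source to the target of the next pair in the cycle.
            shifted = sum(cost(seq[i][0], seq[(i + 1) % k][1]) for i in range(k))
            if matched > shifted:
                return False  # this cycle shows how to lower the cost
    return True


if __name__ == "__main__":
    a1, a2 = frozenset({1}), frozenset({2})
    b1, b2 = frozenset({1, 3}), frozenset({2, 3})
    print(is_cyclically_monotone([(a1, b1), (a2, b2)]))
    print(is_cyclically_monotone([(a1, b2), (a2, b1)]))
```

In the second call the swap of targets lowers the total cost from 6 to 2, so the support fails the test, mirroring the alternating-cycle argument for matchings mentioned above.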
Recall that in the proof of Theorem 3.5 we split into classes . We will do this again using the following lemmas.
Lemma 3.7**.**
If is an optimal transportation for probability functions over graph , , and , then .
Proof.
By contradiction, assume that . The proof of the lemma is similar when is odd or even, so we will assume is odd and allow the reader to handle the even case. Let and . Let , and let , . Repeated use of the triangle inequality implies . We can directly calculate that , which contradicts that is cyclically monotone. ∎
We now give a formal definition for a partition of a transportation. Lemma 3.7 implies that each distinct set of distances involved in a transportation can be used to construct a partition.
Definition 3.8**.**
Let be a transportation from to . Let be nonempty sets such that and if , , then . Then form a partition of . Let , define probability functions and , and let us define transportation as if and otherwise. Define similarly.
We remark that , and so for all we have that . The following claim is then obvious, and we omit the proof.
Claim 3.9**.**
We use the notation of Definition 3.8. If is an optimal transportation, then so are and . If is an optimal transportation and , are optimal transportations, then is also an optimal transportation from to .
In the next lemma we show that it suffices to prove curvature exists among transportations without partitions in order to prove that curvature exists in general. We follow this with another lemma, which describes transportations without partitions.
Lemma 3.10**.**
Let be an optimal transportation from to with a partition as defined in Definition 3.8. Let be the probability measures for the midpoints using transportations , respectively. If for there exists fixed values and convex function such that
[TABLE]
then
[TABLE]
Proof.
By the definition of a partition, the support of and the support of are disjoint, and therefore . By the formula for entropy, this implies that . By the convexity of entropy, the equality for is an upper bound: and . So we are left with
[TABLE]
and thus the lemma follows from the convexity of and Claim 3.9. ∎
Lemma 3.11**.**
Let be an optimal transportation from to that does not have a partition. Let be the support of and be the support of .
(1) There exists a constant such that if then .
(2) For the in part (1) and any and , we have .
(3) For the in part (1) and for all , .
Proof.
(1) This is a restatement of Lemma 3.7.
(2) For a fixed transportation , we define a graph , with vertex set and edge set . Because has no partition, is a connected graph.
By way of contradiction, assume that for some fixed pair . Let be such that and and let be such that and . Because is connected, there exists a path . By (1), we have that for all . By the triangle inequality and the assumption , we have that . This is a contradiction, as is not cyclically monotone.
(3) By (2), we see that . By (1), we see that . ∎
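The graph used in part (2) also gives a practical way to detect partitions. The sketch below is an illustration under assumptions rather than the paper's construction: it takes the graph to be bipartite, with one copy of each supply point and one copy of each demand point, and joins a supply point to a demand point whenever the transportation moves positive mass between them. Connected components then give the finest blocks, and the transportation has no partition exactly when there is a single block. The function name and the dictionary encoding of a transportation are hypothetical.

```python
from collections import defaultdict


def transport_blocks(transport):
    """Split the support of a transportation into connected blocks.

    transport maps (supply_point, demand_point) -> mass. Returns a list of
    (supply_block, demand_block) pairs, one per connected component.
    """
    adj = defaultdict(set)
    for (a, b), mass in transport.items():
        if mass > 0:
            # Tag the two sides so a point in both supports gets two copies.
            adj[("supply", a)].add(("demand", b))
            adj[("demand", b)].add(("supply", a))
    seen, blocks = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            component.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        supply_block = {x for side, x in component if side == "supply"}
        demand_block = {x for side, x in component if side == "demand"}
        blocks.append((supply_block, demand_block))
    return blocks


if __name__ == "__main__":
    split = {("a1", "b1"): 0.5, ("a2", "b2"): 0.5}
    linked = {("a1", "b1"): 0.25, ("a1", "b2"): 0.25, ("a2", "b2"): 0.5}
    print(transport_blocks(split))
    print(transport_blocks(linked))
```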
Let us close this subsection by remarking that finding an optimal transportation for a given with a fixed metric is a problem in linear programming, where the support of each of and appear as a unique constraint (elements in the intersection of the supports of and thus show up as two constraints). The solution to the dual problem is , where is a function over the support of and is a function over the support of . The dual problem is named as the “dual Kantorovich problem” by Villani in chapter 5 of [37].
Remark 3.12**.**
A consequence of Lemma 3.11 applied to Remark 5.13 of [37] is that the optimal transportations between and will have no partitions if and only if the solutions to the dual problem only have constant valued functions .
3.4 Displacement convexity and the hypercube
Our first result is that the hypercube does not have positive or nonnegative sort-of-strong displacement convexity.
Example 3.13**.**
The -dimensional hypercube for contains vertex sets and such that when and are uniform probability measures with support on and there exists an optimal transport such that the entropy of the probability space
[TABLE]
is less than the average of the entropies of and .
Proof.
We return to the notation of the Boolean lattice introduced in Section 3.1. Let and for . Every vertex in is distance from any vertex in , so any transportation function is optimal. We choose the transportation function such that for each . So , , and is [math] otherwise. The entropy of and are each , and the entropy of is . The example thus holds when . ∎
Now we will present a result that is progress towards showing that the hypercube does have positive sort-of-weak displacement convexity. Let us assume for the rest of the section that all transportations do not have a partition as in Definition 3.8. Let be the support of and be the support of ; the lack of a partition implies that .
Recall the map defined in the proof of Theorem 3.5. We generalize it as follows: let using the notation of the Boolean lattice, and let . If is even, then and . If is odd, then elements of are edges , where . We then define
[TABLE]
We have one advantage on the hypercube when working with an optimal transportation rather than the transportation , which is used for Brunn-Minkowski curvature: in the following Lemma we show that is injective.
Lemma 3.14**.**
If and for optimal transport , then for any .
Proof.
First, let us assume that is even, and so is a collection of vertices. Recall that . Recall also that if is fixed, then is injective. Therefore . Notice that . This contradicts Lemma 3.11.
Now suppose that is odd, so that and . We have that , and the same argument follows. ∎
As we saw in Example 3.13, the midpoints will not spread out for every transportation between sets. Each transportation function is a probability measure over the space , and we will work with the probability measure that maximizes (while still satisfying the conditions of being an optimal transportation).
Lemma 3.15**.**
Fix probability distributions , and let maximize the entropy among optimal transportations from to . Assume that has no partition. For a fixed element , let be the pairs of points such that and . Possibly the list has repeats, and so may . Then there exists a such that for all , and for all . Moreover, there exists a function such that for all .
Proof.
That there exists a such that for all is a restatement of Lemma 3.7. By the triangle inequality, we have that for all . By Lemma 3.11 . Moreover, the sharpness of the triangle inequality implies that for all .
Let , , and . Let and otherwise. By construction, is a transportation from to , and , so is also optimal. We see that can be thought of as the product of distributions , which is known to maximize entropy over product spaces by the independence inequality. Therefore , with equality only when . By construction, satisfies the conclusion of the lemma (), and so the proof concludes by the condition that had maximum entropy among optimal transportations. ∎
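The independence inequality used in this proof can be checked numerically on a small case. The sketch below assumes the Shannon convention with natural logarithm, which may differ from the normalization used in the text, and the joint distribution is a made-up example. It compares a correlated joint distribution with the product of its own marginals.

```python
from math import log


def entropy(p):
    """Shannon entropy (natural log) of a distribution given as a dict."""
    return -sum(q * log(q) for q in p.values() if q > 0)


def marginals(joint):
    """Marginal distributions of a joint distribution on pairs (x, y)."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return px, py


def product(px, py):
    """Product distribution with the given marginals."""
    return {(x, y): px[x] * py[y] for x in px for y in py}


# Hypothetical correlated distribution on {0, 1} x {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px, py = marginals(joint)
indep = product(px, py)

h_joint = entropy(joint)
h_indep = entropy(indep)
```

Here `h_indep` equals `entropy(px) + entropy(py)` and strictly exceeds `h_joint`, since the joint distribution is not a product.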
Before we prove our theorem about the entropy of midpoints, let us recall a few facts about entropy. Let be a probability measure over the product space . The entropy of is defined as . We are interested in how the entropy acts when we restrict some coordinates of . For , let . For , , and , we use the notation to denote the situation where for all . Let be the projection of into , which is equivalently the probability distribution over such that (because ). For , the conditional entropy is
[TABLE]
[TABLE]
From the definitions, when we have , and so the conditional entropy is always nonnegative. The name comes from the fact that if is fixed and is a random variable conditioned on , then . From the formula, we see that if , then .
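For readers who want to experiment with these quantities, here is a minimal sketch of projection and conditional entropy on a product space. It uses the standard chain-rule form, conditional entropy equals the entropy of the joint projection minus the entropy of the conditioning projection, which is assumed to agree with the definition above up to notation. The names `project` and `conditional_entropy` and the sample measure are hypothetical.

```python
from math import log


def entropy(p):
    """Shannon entropy (natural log) of a distribution given as a dict."""
    return -sum(q * log(q) for q in p.values() if q > 0)


def project(mu, coords):
    """Project a measure on tuples onto the listed coordinate indices."""
    out = {}
    for point, q in mu.items():
        key = tuple(point[i] for i in coords)
        out[key] = out.get(key, 0.0) + q
    return out


def conditional_entropy(mu, S, T):
    """Entropy of the coordinates S given the coordinates T (chain-rule form)."""
    both = sorted(set(S) | set(T))
    return entropy(project(mu, both)) - entropy(project(mu, T))


# Hypothetical measure on {0,1}^3: uniform on triples whose third
# coordinate is the XOR of the first two.
mu = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}

determined = conditional_entropy(mu, [2], [0, 1])   # third coordinate is a function of the first two
free = conditional_entropy(mu, [0], [1])            # first coordinate is independent of the second
```

In this example `determined` is zero because the conditioning coordinates fix the remaining one, while `free` equals log 2.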
We define a probability distribution over , where if , and otherwise. We will study how the entropy of changes as we project onto specific coordinates. This will be clearer if we use the following abuse of notation: let represent the first coordinate, represent the second coordinate, represent the third coordinate, represent the fourth coordinate, and represent the fifth coordinate. So is projected onto the first, third, and fourth coordinates. By construction, , , , , and is a uniform distribution. By symmetry, for any we have that is isomorphic to . Because is injective, each of is isomorphic to .
We will use the following technical statement.
Claim 3.16**.**
Let be defined as above for optimal transportation with maximum entropy that has no partition. We have
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Proof.
By the injectivity of , the sets for a fixed are in bijection with the sets such that and . By Lemma 3.15, we have and
[TABLE]
By the definition of conditional entropy and Lemma 3.15, we have that
[TABLE]
and
[TABLE]
So
[TABLE]
The second equality follows from a symmetric argument. The third equality follows a similar argument, with
[TABLE]
and
[TABLE]
For the final inequality, we first remark that for any fixed . So
[TABLE]
The result for follows symmetrically. ∎
Now we are prepared to prove the theorem.
Theorem 3.17**.**
Let be probability distributions over the discrete hypercube. Let be an optimal transportation from to that maximizes entropy, and let be the probability distribution over as calculated by with distance interpolation. If for some constant whenever , then
[TABLE]
Proof.
Summing the first two equalities from Claim 3.16 and simplifying by the definition of conditional entropy, we have that
[TABLE]
The last inequality of Claim 3.16 implies that , so
[TABLE]
Now so
[TABLE]
When we combine (11) and (12), we see that
[TABLE]
The right hand side is minimized when , and the resulting value is the statement of the theorem. ∎
Theorem 3.18**.**
Let be probability distributions over the -dimensional discrete hypercube. There exists a transportation that when combined with distance interpolation produces a probability distribution over the midpoints such that
[TABLE]
Proof.
It is easy to calculate that and that . We used computer software to calculate and to confirm that . So Lemma 3.10 applied to Theorem 3.17 provides that
[TABLE]
For any integers , , space and probability measures we have that and . ∎
Let us finish this section by returning to the Brunn-Minkowski curvature of the hypercube.
Proposition 3.19**.**
There exists and such that the -dimensional hypercube for has Brunn-Minkowski curvature at least .
Proof.
First, let us consider the situation where for some . Note that for any , we have that
[TABLE]
On the other hand, because , we have that . So if , then , and for sufficiently large and sufficiently small we get that
[TABLE]
So now assume that . The version of the claim in the proof of Theorem 3.5 that appears in [32] is that . This inequality is weakest when is largest, so an improvement on this inequality for for some fixed is sufficient for an improvement on the final result. Fix a , and define a function such that the first coordinate of is and the second coordinate is a vector over such that the set of coordinates with nonzero entries is the symmetric difference between as elements of the Boolean lattice. By the binary nature of the discrete hypercube, is invertible. But when we have that and for any we have . So the stronger bound comes from
[TABLE]
By setting as a function of , we have that . ∎
3.5 Strong displacement convexity
Theorem 3.20**.**
If is a graph with nonnegative strong displacement convexity, then is a path, a cycle, a complete graph, or a complete graph minus an edge, and has strong displacement convexity [math].
First we will discuss the curvature of paths and cycles, and then we will show that no other graph can have non-negative curvature.
Lemma 3.21**.**
The path and the cycle have strong displacement convexity [math].
Proof.
We assume that our optimal transportation has no partition, and then use Lemma 3.10 to handle the other case. But Lemma 3.11 can only be satisfied on the path or cycle for point masses for . But then . ∎
Lemma 3.22**.**
If is a graph with nonnegative strong displacement convexity, then for any and such that we have that or .
Proof.
By way of contradiction, suppose that , , and . Then let be the uniform distribution over , and let be the uniform distribution over (which may be one or two points). There exists an optimal transportation from to such that the only midpoint is . ∎
Proof of Theorem 3.20. If is connected and every vertex has degree or , then is a path or a cycle. So suppose is incident with at least edges. If , then by Lemma 3.22, is either the complete graph or the complete graph minus an edge. So assume there exists a such that ; equivalently and .
If , then because and by Lemma 3.22 there exists an such that and . But this contradicts Lemma 3.22, because is missing edges and . So assume that , and by symmetry this implies .
Now we claim that . If this is not true, then there exists a and a such that . But this contradicts Lemma 3.22, because is missing edges and .
So and . If is not the complete graph minus an edge, then there exists a . By Lemma 3.22, is all pairs of points except and . Because , there exists a . But this contradicts Lemma 3.22, because is missing edges and .
3.6 Other graphs
A typical graph will not have curvature for the same reasons that the hypercube does not: by examining the neighborhood of a single vertex. The in Theorem 3.18 is necessary for transportations with due to the fact that . However, we may be able to prove a “rough geometry” version of curvature. That is, the curvature equation may exist with a small error term that accounts for transportations with small Wasserstein distances. We establish a first result towards this goal by showing that expander graphs have some flavor of positive curvature.
Theorem 3.23**.**
Let be a graph such that every vertex has degree and is the second largest eigenvalue of the normalized adjacency matrix. Let be vertex sets such that . If , then .
Proof.
We define . For disjoint vertex sets , denote the number of edges with an endpoint in each of and . We will make use of two theorems from spectral graph theory. The first is Corollary 5.5 of [16], which states that . The second is the Expander Mixing Lemma (see Theorem 5.1 of [16]), which states that
[TABLE]
Each vertex is adjacent to edges, and therefore . So if , then (note that ). Let and , and thus
[TABLE]
[TABLE]
Next, each vertex in with a neighbor in is a midpoint in . Each vertex is adjacent to edges, and so . Therefore . The theorem then follows from the bound on . ∎
We are interested in determining whether the assumption on a fixed degree is necessary. Using the normalized Laplacian, there exist generalizations of expander graphs for arbitrary degree distributions [17]. However, it may be that even rough positive curvature will only exist when the degree distribution falls into a tight range. A power-law graph is a graph whose degree distribution approximately follows the distribution of the inverse of a polynomial (and hence is highly skewed). Such graphs have become popular recently for their ability to model social, technical, and biological networks. Models for such graphs include Kronecker graphs (used for Graph500).
Conjecture 3.24**.**
It is impossible for a power-law graph to have “big picture” curvature.
The conjecture would imply that positively curved networks will not be useful for social networks, as negative curvature has been. On the other hand, it may still be useful to study engineered networks (for example, supercomputing clusters frequently use product topologies, such as the discrete torus). Chung, Lu, and Vu [18] studied the spectral properties of a random power-law graph. We provide some evidence for our conjecture by studying random walks on arbitrary power-law graphs.
Theorem 3.25**.**
Suppose we weight a path as . If we pick a random shortest path , where is picked proportional to across all pairs , then the probability that is the midpoint of is proportional to .
Proof.
We consider a random process that is a random walk with edge teleportation. Fix some arbitrary . We place some token on a vertex of the graph and will move that token according to a random walk with probability and according to edge teleportation with probability .
We start the process with a stationary distribution on the vertices for a random walk; let the probability that our token is on be proportional to . Suppose the token is on vertex , and let us discuss how the token will move at the next step in the process. With probability we move the token to a random neighbor of . Note that this step preserves the probability distribution; after a random walk step the probability that the token is on is still proportional to . With probability we choose an edge uniformly at random and teleport the token to one of its endpoints chosen uniformly at random. This too preserves the probability distribution; and so the probability that the token is on will be proportional to as we iterate this process.
Let be an infinite sequence of vertices such that the token is on vertex after iterations of our process. Let be the iterations such that the transition involves edge teleportation for each and involves a step in a random walk when . We choose a random geodesic as follows: repeatedly pick some until the path is a geodesic.
The theorem will follow when we establish two facts: (1) the probability that the midpoint of is is proportional to and (2) the probability that a fixed geodesic is chosen is proportional to . Part (1) is easy, as the probability of any vertex being is proportional to . All that remains is (2).
Let be some fixed geodesic, and let us calculate the probability that for the first value of (in other words, without accounting for the fact that we throw out non-geodesic paths). The probability that is . The probability that is . For shorthand, let us denote and . Given that and , the probability that is . Thus
[TABLE]
Now let us adjust the calculation of the probability of to account for the fact that we will throw out non-geodesic paths. The probability is clearly proportional to , when taken in comparison to all other paths. When we consider the proportional values, the term cancels as it is a uniform constant. Therefore the probability that for the final value of is proportional to . The theorem follows by considering the limit . ∎
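The invariance claim at the start of this proof, that the degree-proportional distribution is preserved both by a random walk step and by edge teleportation, can be verified exactly on a small graph. The sketch below assumes a simple undirected graph given as an adjacency dictionary with integer vertex labels. The parameter name `eps` for the teleportation probability and the sample graph are assumptions made for the example.

```python
from fractions import Fraction


def degree_distribution(adj):
    """Distribution with probability proportional to vertex degree."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), total) for v, nbrs in adj.items()}


def walk_step(adj, dist):
    """One simple random walk step: move to a uniformly random neighbor."""
    out = {v: Fraction(0) for v in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            out[v] += dist[u] / len(nbrs)
    return out


def teleport_distribution(adj):
    """Pick an edge uniformly, then one of its endpoints uniformly."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    share = Fraction(1, 2 * len(edges))
    out = {v: Fraction(0) for v in adj}
    for u, v in edges:
        out[u] += share
        out[v] += share
    return out


def process_step(adj, dist, eps):
    """Walk with probability 1 - eps, teleport with probability eps."""
    walked = walk_step(adj, dist)
    teleported = teleport_distribution(adj)
    return {v: (1 - eps) * walked[v] + eps * teleported[v] for v in adj}


# Hypothetical graph: a triangle 0-1-2 with a pendant vertex 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
pi = degree_distribution(adj)
after = process_step(adj, pi, Fraction(1, 3))
```

Exact rational arithmetic is used so that `after == pi` can be tested without floating-point tolerance.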
Now suppose we perform the same proof, but this time the probability of an edge teleportation from to is rather than . The following theorem is the result of this modification, plus averaging the probability of path with its reverse.
Theorem 3.26**.**
Suppose we weight a path as . If we pick a random shortest path , where is picked proportional to across all pairs , then the probability that is the midpoint of is proportional to .
Appendix A Statements that were not used
There does not seem to be a significant difference between weak displacement convexity and Brunn-Minkowski curvature. We make this statement based on the fact that Lemmas 3.7 and 3.15 hold for any graph. In our work towards Theorem 3.17, we proved several lemmas that were not part of the final version of the proof. However, the statements are interesting in their own right, and may be useful towards establishing a bound on curvature for other spaces.
The following claim is from when we were still working with instead of .
Claim A.1**.**
For probability distributions , let be an optimal transportation. Suppose and . Under these circumstances, .
Proof.
The proof by contradiction of Lemma 3.7 will hold when . ∎
The goal of the following lemma was to prove an analogue of Hall’s Marriage theorem from the set of such that to the set of midpoints. The outline was to prove that any collection had large sets and , and therefore many midpoints by the Brunn-Minkowski inequality.
Claim A.2**.**
For any graph and probability measures , there exists an optimal transportation from to such that for all , we have that
[TABLE]
Proof.
For a fixed transportation , we define a bipartite graph with disjoint vertex sets , where and . If and for some vertex , then and both exist and are different. We define the edges to be . Among all optimal transportations, let be the one that minimizes .
We claim that is a forest. By way of contradiction, suppose that a cycle is in . Let and let , where the indices are taken modulo . We consider two transportations , that equal on all pairs over vertices except
- •
,
- •
,
- •
, and
- •
.
By cyclic monotonicity, we have that are each optimal. This is a contradiction, because by construction, .
Now consider vertex sets and edge set . Each vertex in induces at most two vertices in ; let us call this induced subgraph . Each edge in is in . Because is a forest, there are at most such edges. ∎
The problem with Claim A.2 is that the Brunn-Minkowski inequality would collect many midpoints between where . Moreover, this seems like a fundamental flaw, given Example 3.13. If anything, this is the complete opposite of the transportation that we did use. By a proof similar to Lemma 3.11(2), it can be shown that if is optimal, maximizes entropy, and has no partition, then for all pairs with , , . Regardless, this claim is an interesting statement in its own right.
The following statement turned out to be unnecessary for proving Theorem 3.20. But it is true for general finite discrete spaces, and so it may be applicable towards a more general statement.
Claim A.3**.**
If has strong displacement convexity for probability measures that are uniform over vertex sets , respectively, then has strong displacement convexity.
Proof.
Let be arbitrary probability distributions with optimal transport with geodesic choices which produces midpoint distribution . Suppose there exists some vertex such that for , , and (we allow the possibility of ). Let , and define , , and otherwise. By construction, there exists a transportation from to and geodesic choices such that the midpoints have distribution . We propose that this is an optimal transportation from to .
Before proving the proposition, let us show how the proposition implies the claim. By convexity we know that , so if to has curvature, then so does to . Repeat this process until whenever the geodesics of and involving and share a midpoint. We define an equivalence class for , and let and be the probability distribution projected onto the set and scaled by . If each transportation using and has curvature, then the whole will have curvature as the midpoints do not overlap. But is a uniform measure over , which is the conclusion of the claim. Symmetrically we do this to as well, and so the claim holds.
So now we prove the proposition. By Lemma 3.7 , and so . Suppose there exists a transportation from to such that . Among all such transportations, let minimize . If this number is [math], then , which is a contradiction.
We consider the bipartite graph and as constructed in Claim A.2. There must exist some where . But both and move mass into the same distribution , so there must exist a such that . We continue this process until we have found a cycle such that for each , and . We use this cycle to update to be or to be as in Claim A.2, and one of two things happens: (1) decreases and , which is a contradiction, or (2) . But notice that this update can also be done to into transportation with , which contradicts the minimality of . ∎
References
- [1] R. Albert, B. Das Gupta, and N. Mobasheri, “Topological Implications of Negative Curvature for Biological and Social Networks.” Physical Review E 89 (2014) 032811.
- [2] N. Alon, B. Boppana, and J. Spencer, “An Asymptotic Isoperimetric Inequality.” Geometric and Functional Analysis 8 (1996) 411–436.
- [3] N. Alon and V. Milman, “$\lambda_{1}$, Isoperimetric Inequalities for Graphs, and Superconcentrators.” JCTB 38 (1985) 73–88.
- [4] J. Alonso, T. Brady, D. Cooper, V. Ferlini, M. Lustig, M. Mihalik, M. Shapiro, and H. Short, “Notes on Word Hyperbolic Groups.” Group theory from a geometrical viewpoint, Proceedings of the ICTP Trieste 1990, World Scientific (1991) 543–617.
- [5] F. Bauer, F. Chung, Y. Lin, and Y. Liu, “Curvature aspects of graphs.” Proc. of the AMS 145 (5) (2017) 2033–2042.
- [6] S. Bobkov, C. Houdré, and P. Tetali, “The subgaussian constant and concentration inequalities.” Israel Journal of Mathematics 156 (2006) 255–283.
- [7] B. Bollobás and I. Leader, “Compressions and isoperimetric inequalities.” JCTA 56 (1991) 47–62.
- [8] B. Bollobás and I. Leader, “An isoperimetric inequality on the discrete torus.” SIAM J. Discrete Math. 3 (1990) 32–37.
