Probabilistic and Geometrical Applications to Graph Theory

Matthew Yancey

arXiv:1705.09725·math.CO·May 30, 2017


TL;DR

This paper explores the relationship between graph isoperimetric properties and Lipschitz functions, resolving conjectures, and extends discrete geometric inequalities by analyzing extremal functions and measure concentration in hypercubes.

Contribution

It characterizes extremal Lipschitz functions related to isoperimetric inequalities, resolves key conjectures, and advances the understanding of discrete curvature and midpoint bounds in hypercubes.

Findings

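1. Proved the conjecture of Bobkov, Houdré, and Tetali characterizing the extremal functions of the subgaussian inequality on odd cycles.

2. Disproved the conjecture of Alon, Boppana, and Spencer linking maximum-variance functions to the isoperimetric function of product graphs.

3. Showed by counterexample that a proposed method for bounding $t$-midpoints in the discrete hypercube fails.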

Abstract

This paper consists of two halves. In the first half of the paper, we consider real-valued functions $f$ whose domain is the vertex set of a graph $G$ and that are Lipschitz with respect to the graph distance. By placing a uniform distribution on the vertex set, we treat $f$ as a random variable. We investigate the link between the isoperimetric function of $G$ and the functions $f$ that have maximum variance or meet the bound established by the subgaussian inequality. We present several results describing the extremal functions, and use those results to resolve: (A) a conjecture by Bobkov, Houdré, and Tetali characterizing the extremal functions of the subgaussian inequality of the odd cycle, and (B) a conjecture by Alon, Boppana, and Spencer on the relationship between maximum variance functions and the isoperimetric function of product graphs. While establishing a discrete…

Equations

$$\mathbb{E}\,e^{t(X-\mathbb{E}X)}\leq e^{\sigma^{2}t^{2}/2}.$$

$$\mathbb{P}(X-\mathbb{E}(X_{*})\geq h\geq 0)<e^{(R+3)\ln(c)-\ln(k)-\frac{h^{2}}{2(n-1)}}.$$

$$\widehat{m}_{\rho}(a,b)=\{u:d(a,u)=\lfloor\rho\,d(a,b)\rfloor,\ d(a,u)+d(u,b)=d(a,b)\}\cup\{u:d(a,u)=\lceil(1-\rho)\,d(a,b)\rceil,\ d(a,u)+d(u,b)=d(a,b)\},$$

$$|\widehat{m}(S,T)|\geq\sqrt{|S||T|}\,e^{\frac{K}{8}d_{*}(S,T)^{2}}.$$

$$S(\mu_{C})\geq\frac{1}{3}S(\mu_{A})+\frac{1}{3}S(\mu_{B})+\frac{2}{5d^{3}}\left(W^{2}(\mu_{A},\mu_{B})\right)^{2}-\frac{2}{3}.$$

$$|\widehat{m}_{\rho}(S,T)|>\sqrt{|S||T|}\,e^{Cd(1/2+o(1))}.$$

$$\mathbb{P}(X-\mathbb{E}X\geq h)\leq e^{-h^{2}/(2\sigma^{2})},\qquad h\geq 0.$$

$$|\mathbb{E}(X)-m(X)|\leq\mathbb{E}|X-m(X)|\leq\mathbb{E}|X-\mathbb{E}(X)|\leq\sqrt{\operatorname{Var}(X)}.$$

$$\mathbb{P}(X-m(X)\geq h^{\prime})\leq e^{-\left(h^{\prime}-\sqrt{\operatorname{Var}(X)}\right)^{2}/(2\sigma^{2})},\qquad h^{\prime}\geq\sqrt{\operatorname{Var}(X)}.$$

$$\mathbb{P}(X-m(X)\geq h^{\prime})\leq e^{-(h^{\prime}/\sigma-1)^{2}/2},\qquad h^{\prime}\geq\sigma.$$

$$\mathbb{E}\left(2\left(e^{t(X-\mathbb{E}X)}-1\right)t^{-2}\right)=\mathbb{E}\left(2t^{-1}(X-\mathbb{E}(X))+(X-\mathbb{E}(X))^{2}+O(t)\right).$$

$$L_{G\square H}(t)\geq L_{\ell}(t)=\ln\left(\mathbb{E}_{u,v}\left(e^{t(f(u)+f^{\prime}(v))}\right)\right)=\ln\left(\mathbb{E}_{u}\left(e^{tf(u)}\right)\mathbb{E}_{v}\left(e^{tf^{\prime}(v)}\right)\right)=L_{f}(t)+L_{f^{\prime}}(t).$$

$$e^{L_{G\square H}(t)}$$

$$e^{L_{f}(t)}$$

$$\sigma_{J_{n}}^{2}=\left(n-1-\sum_{3\leq 2r+1\leq n}1-\frac{2}{(2r+1)\log\left(\frac{r+1}{r}\right)}\right)/4<\frac{n-1}{4}.$$

$$\operatorname{Var}(X)$$

$$\mathbb{P}(X-\mathbb{E}X_{*}\geq h\geq 0)<e^{(R+3)\ln(c)-\ln(k)-\frac{h^{2}}{2(n-1)}}.$$

$$\mathbb{P}(X-\mathbb{E}X_{*}\geq h\geq 0)<e^{(R+3)\ln(2)-\ln(k)-\frac{h^{2}}{2(n-1)}}.$$

$$\mathbb{P}(X-\mathbb{E}X_{*}\geq h\geq 0)\,|C_{r_{1},\dots,r_{k}}|\leq\mathbb{P}(X_{*}-\mathbb{E}X_{*}\geq h\geq 0)\,|\{0,1\}^{n}|.$$

$$\mathbb{P}(X_{*}-\mathbb{E}X_{*}\geq h\geq 0)<e^{-\frac{h^{2}}{2(n-1)}}.$$

$$|C|<2\,|C_{n/2-R}|\sum_{i=0}^{R}c^{i}<|C_{n/2-R}|\,c^{R+2}.$$

$$|A|\leq|C_{(k-r)/2,(k+r)/2}|\,e^{\frac{-t^{2}}{8k-8+2r^{2}}}.$$

$$|A|\leq|C_{(k-r)/2,(k+r)/2}|\,e^{-Ck(1+o(1))}.$$

$$X\in\Omega^{*}$$

$$X\text{ is optimal if and only if }X^{\prime}\text{ is optimal}.$$

$$\Omega^{\circ}=\Omega^{*}-\{\lambda Z\circ\pi_{Z}+(1-\lambda)Y\circ\pi_{Y}:\lambda\in(0,1),\ Z,Y\in\Omega,\ Z\neq Y,\ \pi_{Z},\pi_{Y}\in S_{V(G)}\}.$$

$$|\{j:Y_{k}(u_{j})=\ell\}|\leq 2.$$

$$X$$

$$e^{L_{X_{\epsilon}}(t)}=e^{-pt\epsilon}e^{L_{X}(t)}+\left(e^{t\epsilon(1-p)}-e^{-pt\epsilon}\right)p\,e^{t(X(u)-\mathbb{E}(X))}.$$


Taxonomy

Topics: Point processes and geometric inequalities · Geometric Analysis and Curvature Flows · Topological and Geometric Data Analysis

Full text

Probabilistic and Geometrical Applications to Graph Theory

Matthew P. Yancey Institute for Defense Analyses / Center for Computing Sciences (IDA / CCS), [email protected]

Abstract

This paper consists of two halves.

In the first half of the paper, we consider real-valued functions $f$ whose domain is the vertex set of a graph $G$ and that are Lipschitz with respect to the graph distance. By placing a uniform distribution on the vertex set, we treat $f$ as a random variable. We investigate the link between the isoperimetric function of $G$ and the functions $f$ that have maximum variance or meet the bound established by the subgaussian inequality. We present several results describing the extremal functions, and use those results to resolve: (A) a conjecture by Bobkov, Houdré, and Tetali characterizing the extremal functions of the subgaussian inequality of the odd cycle, and (B) a conjecture by Alon, Boppana, and Spencer on the relationship between maximum variance functions and the isoperimetric function of product graphs.

While establishing a discrete analogue of the curved Brunn-Minkowski inequality for the discrete hypercube, Ollivier and Villani suggested several avenues for research. We resolve them in the second half of the paper as follows.

• They propose that a bound on $t$-midpoints can be obtained by repeated application of the bound on midpoints, if the original sets are convex. We construct a specific example where this reasoning fails, and then prove our construction is general by characterizing the convex sets in the discrete hypercube.

• A second proposed technique to bound $t$-midpoints involves new results in concentration of measure. We follow through on this proposal, with heavy use of results from the first half of the paper.

• We show that the curvature of the discrete hypercube is not positive or zero.

1 Motivation

This manuscript deals with graphs, and the attempts to apply alternative areas of mathematics to them.

The first half is motivated by concentration of measure. There is a canonical method to construct a martingale by iteratively selecting a random variable $X$ that is defined as a function that is Lipschitz over a graph. The isoperimetric function of the graph has been linked to the extremal variables. We will present results describing the extremal variables, which will allow us to refine our knowledge about this link.

The second half is motivated by geometry. We recently investigated the relationship between negative curvature and congestion in transportation networks [38]. We currently are interested in showing that networks exhibiting qualities associated with positively curved spaces will consequently have many routing options for transportation between locations. In this paper, we use an abundance of midpoints as a proxy for an abundance of routing options. Our previous work had the advantage of a discrete analogue of negative curvature (Gromov’s $4$ -points hyperbolicity) that is well-studied [11, 4, 23], practical [19, 12, 13, 14], and consequential [36, 27, 26, 1]. Multiple discrete analogues of positive curvature have been proposed [20, 30, 31, 10, 9, 22], and some consequences of those notions are known [33, 15, 20, 5]. We will present results about the discrete analogue of the curved Brunn-Minkowski inequality.

1.1 Background: concentration of measure

For graphs $G_{1},\ldots,G_{k}$, the Cartesian product $G_{1}\square\cdots\square G_{k}$ is the graph with vertex set $V_{1}\times\cdots\times V_{k}$ such that the distance between vertices $v=(v_{1},\ldots,v_{k})$ and $u=(u_{1},\ldots,u_{k})$ is $\sum_{i}d_{G_{i}}(v_{i},u_{i})$. The edges of $G_{1}\square\cdots\square G_{k}$ are the pairs of vertices that are at distance $1$ from each other. We denote the Cartesian product with $G=G_{1}=\cdots=G_{k}$ as $G^{k}$.
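As an illustration of this definition, the following Python sketch builds $G\square H$ from the usual edge rule (move along an edge in exactly one coordinate) and checks by breadth-first search that distances add across coordinates. The helper names `path`, `cycle`, `cartesian_product`, and `distances_add` are hypothetical and chosen only for this example.

```python
from collections import deque
from itertools import product


def bfs_distances(adj, source):
    """Graph distances from `source` in a graph given as an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path(n):
    """Path on vertices 0..n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def cycle(n):
    """Cycle on vertices 0..n-1 (n >= 3)."""
    return {i: sorted({(i - 1) % n, (i + 1) % n}) for i in range(n)}


def cartesian_product(adj_g, adj_h):
    """Adjacency dict of G □ H: step along an edge in exactly one coordinate."""
    adj = {}
    for u, v in product(adj_g, adj_h):
        adj[(u, v)] = [(u2, v) for u2 in adj_g[u]] + [(u, v2) for v2 in adj_h[v]]
    return adj


def distances_add(adj_g, adj_h):
    """Check d_{G□H}((u,v),(u2,v2)) = d_G(u,u2) + d_H(v,v2) for all pairs."""
    prod = cartesian_product(adj_g, adj_h)
    dg = {u: bfs_distances(adj_g, u) for u in adj_g}
    dh = {v: bfs_distances(adj_h, v) for v in adj_h}
    for (u, v) in prod:
        for (u2, v2), d in bfs_distances(prod, (u, v)).items():
            if d != dg[u][u2] + dh[v][v2]:
                return False
    return True


if __name__ == "__main__":
    print(distances_add(path(3), cycle(5)))
```

The check is exhaustive, so it is only practical for small factors; it is meant to confirm that the distance-based and edge-based descriptions of the product agree on examples.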

For a fixed graph $G$ and vertex set $S$ , let $B_{d}(S)=\{u:d_{G}(u,S)\leq d\}$ . The isoperimetric function is $i_{G,d}=\min_{|S|\geq|V(G)|/2}|B_{d}(S)|$ . A problem considered by several authors [2, 6, 35] is the isoperimetric function of product spaces. That is, we generalize the isoperimetric function from $G$ to $G^{n}$ as $i_{G,d,n}=\min_{|S|\geq|V(G^{n})|/2}|B_{d}(S)|$ .
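For very small graphs the isoperimetric function can be evaluated directly from this definition. The sketch below does so by exhaustive search over vertex sets; the function names are hypothetical, and the brute-force approach is only an illustration, not a method used in the paper.

```python
from collections import deque
from itertools import combinations


def cycle(n):
    """Cycle on vertices 0..n-1 (n >= 3)."""
    return {i: sorted({(i - 1) % n, (i + 1) % n}) for i in range(n)}


def ball(adj, S, d):
    """B_d(S): all vertices within distance d of the set S (multi-source BFS)."""
    dist = {s: 0 for s in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand past radius d
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def isoperimetric_function(adj, d):
    """min |B_d(S)| over all S with |S| >= |V|/2, by exhaustive search."""
    vertices = list(adj)
    n = len(vertices)
    smallest = (n + 1) // 2  # least integer size satisfying |S| >= n/2
    best = n
    for size in range(smallest, n + 1):
        for S in combinations(vertices, size):
            best = min(best, len(ball(adj, S, d)))
    return best


if __name__ == "__main__":
    c6 = cycle(6)
    print([isoperimetric_function(c6, d) for d in range(3)])
```

On the 6-cycle the minimizing sets are arcs of three consecutive vertices, which gives the values 3, 5, and 6 for $d=0,1,2$.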

One method to analyze the isoperimetric function of product spaces is to study probability spaces defined as uniform probabilities over the vertex set of a graph $G$ equipped with functions $X:V(G)\rightarrow\mathbb{R}$ that are Lipschitz with respect to the standard graph distance in $G$. By placing a uniform distribution on $V(G)$, we abuse notation and treat $X$ as a random variable. We use the notation from [6] that $c(G)=\max_{X}\sqrt{\operatorname{Var}(X)}$, where $X$ is taken over Lipschitz functions on $V(G)$. This notation is not consistent with [2, 35]. We will call a function $X$ variance-optimal if $X$ is Lipschitz and $\operatorname{Var}(X)=c^{2}(G)$. Alon, Boppana, and Spencer [2] proved that $|V(G^{n})|-i_{G,d,n}$ decays exponentially as $d$ grows when $\sqrt{n}\ll d\ll n$ with a rate that relies on $c^{2}(G)$. Let $m(X)$ be the median value of $X$, and note that if $X$ is variance-optimal, then so are $-X$ and $X+a$ for all $a\in\mathbb{R}$.
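To make the quantity $c^{2}(G)$ concrete, the following Python sketch searches over integer-valued Lipschitz functions on a small connected graph and reports the largest variance found. Restricting to integer values is an assumption of this sketch, so the result should be read as a lower bound on $c^{2}(G)$ rather than its exact value; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Variance of a function on V(G) under the uniform distribution."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n


def best_integer_lipschitz_variance(n_vertices, edges):
    """Largest variance over integer-valued Lipschitz f with f(0) = 0.

    Vertices are 0..n_vertices-1 and the graph is assumed connected.
    Fixing f(0) = 0 loses nothing because shifts do not change variance.
    """
    span = n_vertices - 1  # |f(v) - f(0)| <= dist(0, v) <= n - 1
    best = Fraction(0)
    for rest in product(range(-span, span + 1), repeat=n_vertices - 1):
        f = (0,) + rest
        if all(abs(f[u] - f[v]) <= 1 for u, v in edges):
            best = max(best, variance(f))
    return best


if __name__ == "__main__":
    path4 = [(0, 1), (1, 2), (2, 3)]
    cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(best_integer_lipschitz_variance(4, path4))   # 5/4
    print(best_integer_lipschitz_variance(4, cycle4))  # 1/2
```

On the path with four vertices the search returns $5/4$, attained by $f=(0,1,2,3)$, and on the 4-cycle it returns $1/2$, attained by $f=(0,1,2,1)$.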

Theorem 1.1 ([2]).

Let $\sqrt{n}\ll d\ll n$ . We have that $i_{G,d,n}/|V(G^{n})|>1-e^{-\frac{d^{2}}{2c^{2}(G)n}(1+o(1))}$ . Let $S_{r,X}=\{a=(a_{1},a_{2},\ldots,a_{n})\in V(G^{n}):X^{n}(a)=\sum_{i=1}^{n}X(a_{i})\leq r\}$ . If $X$ is variance-optimal and $m(X^{n})\leq r\leq\mathbb{E}(X)=0$ , then $|S_{r,X}|\geq\frac{1}{2}|V(G^{n})|$ and $|B_{d}(S_{r,X})|/|V(G^{n})|\leq 1-e^{-\frac{d^{2}}{2c^{2}(G)n}(1+o(1))}$ .

They also conjectured a stronger relationship between $i_{G,d,n}$ and $c^{2}(G)$ —that the extremal set is determined by variance-optimal functions.

Conjecture 1.2 ([2], page 416).

Let $X$ be a variance-optimal function over $G$ . Is it true for $n$ sufficiently large and $d,r$ in appropriate ranges that $|B_{d}(S_{r})|\leq|B_{d}(S^{\prime})|$ for all $S^{\prime}\subset V(G^{n})$ with $|S_{r}|\leq|S^{\prime}|$ ?

Our initial intuition was that the conjecture may be true. A variable that maximizes variance will attempt to spread its values as evenly as possible towards $\infty$ and $-\infty$. Because $X$ is Lipschitz we have that $B_{d}(S_{r})\subseteq S_{r+d}$. The hope is that $V(G^{n})-S_{r+d,X}=S_{-(r+d),-X}$ is relatively large.

If true, the conjecture would have significance. Let $X_{1},X_{2}$ be variance-optimal over $G_{1},G_{2}$ respectively. Let $H=G_{1}\square G_{2}$ , and define variable $Y$ over $H$ as $Y(u_{1},u_{2})=X_{1}(u_{1})+X_{2}(u_{2})$ . Under these conditions, $Y$ is variance-optimal and $c^{2}(H)=c^{2}(G_{1})+c^{2}(G_{2})$ . If true, the conjecture would thus imply that the isoperimetric function of $G^{n}$ and the associated extremal vertex sets can be determined by the isoperimetric function of $G$ and the associated extremal vertex sets.
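The additivity $c^{2}(H)=c^{2}(G_{1})+c^{2}(G_{2})$ rests on the fact that the two coordinates are independent under the uniform distribution on $V(H)$. The sketch below checks only that elementary step, namely $\operatorname{Var}(Y)=\operatorname{Var}(X_{1})+\operatorname{Var}(X_{2})$ for $Y(u_{1},u_{2})=X_{1}(u_{1})+X_{2}(u_{2})$; it does not verify that $Y$ is variance-optimal. The example values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Variance of a list of values under the uniform distribution."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n


def product_sum(x1, x2):
    """Values of Y(u1, u2) = X1(u1) + X2(u2) over all pairs (u1, u2)."""
    return [a + b for a, b in product(x1, x2)]


if __name__ == "__main__":
    x1 = [0, 1, 2, 3]  # a Lipschitz function on the path with four vertices
    x2 = [0, 1, 2, 1]  # a Lipschitz function on the 4-cycle
    y = product_sum(x1, x2)
    print(variance(y), variance(x1) + variance(x2))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than a floating-point tolerance.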

The conjecture has been proven true for the discrete hypercube by Harper [25], the Euclidean lattice by Bollobás and Leader [7], and the discrete torus by Bollobás and Leader [8].

Our initial set of results is a series of statements that describe the variance-optimal functions of a graph. In Section 2.5 we characterize the set of variance-optimal functions for three families of graphs. Each of those families consists of trees with long paths, and our most useful statement involves hairs. A hair of $G$ is a sequence of vertices $w_{0},w_{1},\ldots,w_{k}$ such that $N(w_{i})=\{w_{i-1},w_{i+1}\}$ for $1\leq i\leq k-1$ and $N(w_{k})=\{w_{k-1}\}$.

Lemma 2.10, (9), Remark 2.11, and Remark 2.12.

Let $X$ be a variance-optimal function over $G$ with hair $w_{0},w_{1},\ldots,w_{k}$. The sequence of values $X(w_{0}),X(w_{1}),\ldots,X(w_{k})$ is unimodal. Let $m=\min_{i}X(w_{i})$ and $M=\max_{i}X(w_{i})$. If $G$ has vertices $u^{\prime},u^{\prime\prime}$ such that $X(u^{\prime})\leq m+1$, $X(u^{\prime\prime})\geq M-1$, and $u^{\prime},u^{\prime\prime}\notin\{w_{1},\ldots,w_{k}\}$, then the sequence $X(w_{0}),X(w_{1}),\ldots,X(w_{k})$ is monotone.

We compare this to the structure result of Alon, Boppana, and Spencer.

Theorem 1.3 ([2]).

Let $X$ be a variance-optimal function over $G$, and define $\mathbb{O}(X)=\{v\in G:-0.5<X(v)-\mathbb{E}(X)\leq 0.5\}$. Let $\nu_{X}=X(u)-\mathbb{E}(X)$ for some $u\in\mathbb{O}(X)$, and let $C_{1},\ldots,C_{k}$ be the connected components of $G-\mathbb{O}(X)$. Under these conditions, there exist variables $(\delta_{1},\ldots,\delta_{k})\in\{-1,1\}^{k}$ such that for $u\in C_{i}$ we have that $X(u)-\nu_{X}=\delta_{i}d(u,\mathbb{O}(X))$.

The first two examples in Section 2.5 are an exploration of the assumption $m(X)\leq\mathbb{E}(X)$ in Theorem 1.1 that is absent from Conjecture 1.2. We determine a bound on the isoperimetric number of the third example, which leads to the following result.

Theorem 2.19.

Conjecture 1.2 is not true.

We conjecture a simple characterization of variance-optimal functions over trees; we are proposing that $\mathbb{O}(X)$ is not the correct center of the tree conceptually. If true, the statement would strengthen Lemma 2.10 significantly.

Conjecture 1.4.

Let $X$ be a variance-optimal function over a tree $T$ . There exists a vertex $r$ such that for all vertices $u\in V(T)$ , we have that $|X(u)-X(r)|=d(u,r)$ . Moreover, if $T$ is not a path, then we may choose $r$ such that $d(r)\geq 3$ .

The range $\sqrt{n}\ll d\ll n$ is necessary in Theorem 1.1, as Alon, Boppana, and Spencer work with a different tool from probability when $d\approx\epsilon n$ for some constant $\epsilon$ . Specifically, they work with the subgaussian inequality, which states that

$$\mathbb{E}\,e^{t(X-\mathbb{E}X)}\leq e^{\sigma^{2}t^{2}/2}. \tag{1}$$

The subgaussian constant for $X$ is a value for $\sigma_{X}^{2}$ such that (1) is true for all real $t$ . For vertex set $S$ , we define Lipschitz function $X_{S}(u)=d_{G}(u,S)$ . In particular, they showed that $|B_{d}(S)|/|V(G)|\geq 1-e^{-(d/\sigma-1)^{2}/2}$ when $d\geq\sigma$ .

For graph $G$ , let $\sigma_{G}$ be the supremum of $\sigma_{X}$ for functions $X$ that are Lipschitz over $G$ . We call a function $X$ over graph $G$ optimal if $\sigma_{X}=\sigma_{G}$ . When it is clear, we drop the subscript from $\sigma$ .

Using the same construction as above, Alon, Boppana, and Spencer [2] showed that $\sigma^{2}_{G_{1}\square G_{2}}=\sigma^{2}_{G_{1}}+\sigma^{2}_{G_{2}}$ . Bobkov, Houdré, and Tetali [6] showed that if $G=K_{\ell}$ and $\ell$ is even, then $\sigma_{G}^{2}=1/4$ . If $\ell$ is odd, then $\sigma_{G}^{-2}=2\ell\log\left(\frac{\ell+1}{\ell-1}\right)$ . Exact values for $\sigma$ are also known for paths and cycles of even length [35]. Bobkov, Houdré, and Tetali conjectured a characterization of the optimal functions for odd cycles [6], which was repeated by Sammer and Tetali [35].
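The value $\sigma^{2}=1/4$ for the smallest even complete graph $K_{2}$ can be checked numerically. Writing $L_{X}(t)=\ln\mathbb{E}\,e^{t(X-\mathbb{E}X)}$, the smallest $\sigma^{2}$ for which the subgaussian inequality holds for every real $t$ is $\sup_{t\neq 0}2L_{X}(t)/t^{2}$. The sketch below estimates that supremum on a finite grid, so it yields a lower estimate only; the grid parameters and function names are assumptions of this example.

```python
import math


def log_mgf(values, t):
    """L_X(t) = ln E exp(t (X - E X)) under the uniform distribution."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum(math.exp(t * (v - mean)) for v in values) / n)


def subgaussian_constant_estimate(values, t_max=10.0, steps=2000):
    """Grid estimate of sup over t != 0 of 2 L_X(t) / t^2."""
    best = 0.0
    for k in range(1, steps + 1):
        t = t_max * k / steps
        for s in (t, -t):
            best = max(best, 2.0 * log_mgf(values, s) / (s * s))
    return best


if __name__ == "__main__":
    # X takes the values 0 and 1 on the two vertices of K_2.
    print(subgaussian_constant_estimate([0.0, 1.0]))
```

For this function $L_{X}(t)=\ln\cosh(t/2)$, and the ratio $2L_{X}(t)/t^{2}$ increases to $1/4$ as $t\to 0$, so the grid estimate sits just below $0.25$.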

Conjecture 1.5 ([6]).

If $X$ is an optimal variable over the cycle $C_{n}$, then there exists an $x_{0}\in V(C_{n})$ such that $|X(v)-X(x_{0})|=d(x_{0},v)$ for all $v\in V(C_{n})$.

We present several structural statements about optimal variables, such as the analogue of Lemma 2.10. Theorem 1.3 does not hold for optimal variables, but it does hold for “half” of the graph.

Corollary 2.15.

If $X$ is an optimal function and $X(u)<\mathbb{E}(X)$, then $\nu_{X}-X(u)=d(u,\mathbb{O}(X))$.

Once again we are able to use our statements describing optimal functions to characterize the optimal functions of specific examples.

Theorem 2.9.

Conjecture 1.5 is true.

All of our discussion so far generalizes in the obvious way to metric spaces with a finite number of elements. An example is the symmetric group $S_{n}$ on $n$ elements equipped with the Hamming distance, which is $d(\sigma,\sigma^{\prime})=|\{i:\sigma(i)\neq\sigma^{\prime}(i)\}|$. Bobkov, Houdré, and Tetali [6] bounded $\sigma_{S_{n}}$.
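As a small sanity check of this example, the following Python sketch computes the Hamming distance between permutations and verifies the metric axioms on $S_{4}$ by exhaustive enumeration. The helper `is_metric` is a hypothetical name introduced for this illustration.

```python
from itertools import permutations


def hamming(sigma, tau):
    """Number of positions i with sigma(i) != tau(i)."""
    return sum(1 for a, b in zip(sigma, tau) if a != b)


def is_metric(points, dist):
    """Exhaustively check identity, symmetry, and the triangle inequality."""
    pts = list(points)
    for x in pts:
        for y in pts:
            if (dist(x, y) == 0) != (x == y) or dist(x, y) != dist(y, x):
                return False
            for z in pts:
                if dist(x, z) > dist(x, y) + dist(y, z):
                    return False
    return True


if __name__ == "__main__":
    s4 = list(permutations(range(4)))
    print(is_metric(s4, hamming))
    print(sorted({hamming(s4[0], p) for p in s4}))
```

Two distinct permutations always differ in at least two positions, so the distance $1$ never occurs in this space.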

Theorem 1.6 ([6]).

Let $S_{n}$ be the symmetric group on $n$ elements equipped with the Hamming distance $d_{H}$ . The subgaussian constant for this space satisfies $\frac{1}{16}(n-1)\leq\sigma_{S_{n}}^{2}\leq n-1$ .

We are interested in Ollivier and Villani’s [32] use of concentration inequalities. Their concentration inequalities are applied to functions whose domain is a subset of the hypercube $\{0,1\}^{n}$ with $\ell_{1}$ distance, and thus standard statements about the hypercube will not apply. Fortunately, these subsets are closed under the group action by the symmetric group. Using this property, concentration inequalities for this unusual domain can be established (with a small, but non-trivial, amount of work) from concentration inequalities on functions of the symmetric group equipped with Hamming distance.

Thus, we are interested in the bounds in Theorem 1.6. In Remark 2.3 we show that the upper bound is not sharp, although our improvement is negligible. In Theorem 2.4 we show that $\sigma_{S_{n}}^{2}>n/4$ .

Let $C_{r}=\{A\in\{0,1\}^{n}:|A|=r\}$ and $C_{r_{1},r_{2},\ldots,r_{k}}=\cup_{i}C_{r_{i}}$ , equipped with a distance metric equal to the order of the symmetric difference. Ollivier and Villani [32] established that $\sigma_{C_{n/2}}^{2}\leq n-1$ and $\sigma_{C_{(n-1)/2,(n+1)/2}}^{2}\leq n$ , which we wish to generalize.

Theorem 2.5 and Theorem 2.6.

For $n\geq 3$, we have that $\sigma_{C_{(n-r)/2,(n+r)/2}}^{2}<n-1+r^{2}/4$. Additionally suppose $c\geq 2$, $\frac{c-1}{2(c+1)}n>R>\sqrt{n\ln(\frac{c}{c-1})}$, and $|n/2-r_{i}|\leq R$ for all $i$. Let $X_{*}$ be a Lipschitz function over $\{0,1\}^{n}$, and let $X$ be $X_{*}$ induced on $C_{r_{1},\ldots,r_{k}}$. Then

$$\mathbb{P}(X-\mathbb{E}(X_{*})\geq h\geq 0)<e^{(R+3)\ln(c)-\ln(k)-\frac{h^{2}}{2(n-1)}}.$$

1.2 Background: graph curvature

For sets $S,T\subset\mathbb{R}^{d}$ and $c\in\mathbb{R}$, let $S+T=\{s+t:s\in S,t\in T\}$ and $cS=\{cs:s\in S\}$. Understanding the extremal properties of $S+T$ is a central goal in algebraic combinatorics. For example, Roth’s theorem states that when $A\subset\mathbb{Z}$ and $A$ is disjoint from $(A+A)/2$, then $A$ has density [math]. Let $m(S,T)=(S+T)/2$.

One method to bound the size of $S+T$ comes from the Brunn-Minkowski inequality, which states that $V(S+T)^{1/d}\geq V(S)^{1/d}+V(T)^{1/d},$ where $V(S)$ denotes the volume of measurable set $S$ . Using that $V(cS)=c^{d}V(S)$ and that $(a+b)/2\geq\sqrt{ab}$ for positive $a,b$ , the Brunn-Minkowski inequality transforms into $V(m(S,T))\geq\sqrt{V(S)V(T)}.$ In other words, the volume of the midpoints between $S$ and $T$ is at least the geometric average of $V(S)$ and $V(T)$ . This statement can be generalized to weighted geometric averages: if $0<\rho<1$ , define $m_{\rho}(S,T)=\rho S+(1-\rho)T$ , and we have that (see [10]) $V(m_{\rho}(S,T))\geq V(S)^{\rho}V(T)^{1-\rho}.$
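The passage from the Brunn-Minkowski inequality to the midpoint form uses only the two facts quoted above. Written out as a worked derivation (a sketch of the standard argument, with $d$ the dimension):

```latex
\begin{align*}
V\bigl(m(S,T)\bigr)^{1/d}
  &= V\Bigl(\tfrac{1}{2}(S+T)\Bigr)^{1/d}
   = \tfrac{1}{2}\,V(S+T)^{1/d}
   && \text{since } V(cS)=c^{d}V(S) \\
  &\geq \tfrac{1}{2}\Bigl(V(S)^{1/d}+V(T)^{1/d}\Bigr)
   && \text{by Brunn-Minkowski} \\
  &\geq \sqrt{V(S)^{1/d}\,V(T)^{1/d}}
   && \text{since } (a+b)/2\geq\sqrt{ab}.
\end{align*}
```

Raising both sides to the power $d$ gives $V(m(S,T))\geq\sqrt{V(S)V(T)}$.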

We define the distance between sets $S,T$ to be $d_{*}(S,T)=\min\{d(s,t):s\in S,t\in T\}$. For $\mathbb{R}^{n}$, the value $d_{*}(S,T)$ has no effect on midpoints, as $S+(\{u\}+T)=\{u\}+(S+T)$. However, for smooth complete manifolds with positive curvature $K$, the Brunn-Minkowski inequality strengthens (see [32]) exponentially as $d_{*}(S,T)$ grows: $V(m(S,T))\geq\sqrt{V(S)V(T)}e^{\frac{K}{8}d_{*}(S,T)^{2}}.$ Should a value $K$ hold for some space, we will call the supremum of such values the Brunn-Minkowski curvature.

There have been several attempts to generalize the Brunn-Minkowski inequality to discrete spaces such as $\mathbb{Z}^{d}$ . These efforts have met several obstacles. The most obvious obstacle is that the volume function naturally voids sets $S$ whose dimension is less than the overall space. The discrete analogue of volume is to count points, but this function does not naturally void sets with smaller dimensionality. As such, the strongest statement possible is that $|S+T|\geq|S|+|T|$ , and this is sharp for any dimension.

Some progress can be made when we force the sets $S$ and $T$ to live in higher dimensions. Ruzsa [34] proved that if $S,T\subset\mathbb{Z}^{d}$ with $\dim(S+T)=d$ and $|S|\geq|T|$, then $|S+T|\geq|S|+d|T|-{d+1\choose 2}$. Gardner and Gronchi [21] proved that if $S,T\subset\mathbb{Z}^{d}$ with $\dim(T)=d$ and $|S|\geq|T|$, then $|S+T|^{1/d}\geq|S|^{1/d}+\left(\frac{|T|-d}{d!}\right)^{1/d}$.

We consider the question posed verbally by Stroock and written down by Ollivier and Villani (see [32]), “what is the curvature of the discrete hypercube?” We consider the more general question of what graphs display the qualities of a curved space. We limit ourselves here to discrete analogues of positive curvature; see [38] for a discussion on discrete analogues of negative curvature. Proposed notions of positive curvature for graphs include coarse Ricci curvature [30, 31], Bakry-Emery version of Ricci curvature [5], dispersion of heat [20], displacement convexity with approximate midpoints [10, 9], displacement convexity with Gaussian midpoints [22], among others. (We briefly mention that “displacement convexity with approximate midpoints” is a misleading name, as it allows for midpoints of quasi-geodesics.) We will focus on Brunn-Minkowski curvature and displacement convexity as proposed in [32].

The symmetric discrete midpoints are

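$$\widehat{m}_{\rho}(a,b)=\{u:d(a,u)=\lfloor\rho\,d(a,b)\rfloor,\ d(a,u)+d(u,b)=d(a,b)\}\cup\{u:d(a,u)=\lceil(1-\rho)\,d(a,b)\rceil,\ d(a,u)+d(u,b)=d(a,b)\},$$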

where $\widehat{m}(a,b)=\widehat{m}_{1/2}(a,b)$ , and the discrete analogue of Brunn-Minkowski curvature $K$ is that

$$|\widehat{m}(S,T)|\geq\sqrt{|S||T|}\,e^{\frac{K}{8}d_{*}(S,T)^{2}}.$$

Roughly speaking, displacement convexity implies that the midpoints of an optimal transportation from $A$ to $B$ should satisfy a similar inequality. There are several definitions of displacement convexity, and we will wait until Section 3.3 to give a formal definition.

The coarse Ricci curvature of the discrete hypercube in $d$ dimensions is easy to calculate as $\frac{1}{2d}$ , and Ollivier and Villani [32] demonstrated that the $d$ -dimensional hypercube has Brunn-Minkowski curvature at least $\frac{1}{2d}$ . But it is a coincidence that these are the same number; in Proposition 3.19 we show that there exists a $\delta>0$ such that the $d$ -dimensional hypercube has Brunn-Minkowski curvature at least $\frac{1}{2d}(1+\delta)$ when $d$ is large enough.
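The midpoint sets and the curvature bound can be explored on small hypercubes. The sketch below treats the case $\rho=1/2$: a midpoint of $a$ and $b$ is a vertex $u$ on a shortest path from $a$ to $b$ with $d(a,u)$ equal to $\lfloor d(a,b)/2\rfloor$ or $\lceil d(a,b)/2\rceil$. It then tests an inequality of the same shape as the manifold version, $|\widehat{m}(S,T)|\geq\sqrt{|S||T|}\,e^{\frac{K}{8}d_{*}(S,T)^{2}}$ with $K=\frac{1}{2d}$. That exact discrete form and all function names are assumptions of this example.

```python
import math
from itertools import product


def hamming(a, b):
    """Graph distance in the hypercube: number of differing coordinates."""
    return sum(x != y for x, y in zip(a, b))


def midpoints(a, b):
    """Vertices on a shortest a-b path at distance floor or ceil of d(a,b)/2 from a."""
    dab = hamming(a, b)
    targets = {dab // 2, (dab + 1) // 2}
    return {
        u for u in product((0, 1), repeat=len(a))
        if hamming(a, u) in targets and hamming(a, u) + hamming(u, b) == dab
    }


def set_midpoints(S, T):
    """Union of midpoints(a, b) over all a in S and b in T."""
    out = set()
    for a in S:
        for b in T:
            out |= midpoints(a, b)
    return out


def curvature_bound_holds(S, T, dim):
    """Test |m(S,T)| >= sqrt(|S||T|) * exp(K/8 * d_*(S,T)^2) with K = 1/(2 dim)."""
    d_star = min(hamming(s, t) for s in S for t in T)
    K = 1.0 / (2 * dim)
    rhs = math.sqrt(len(S) * len(T)) * math.exp(K / 8.0 * d_star ** 2)
    return len(set_midpoints(S, T)) >= rhs


if __name__ == "__main__":
    print(len(midpoints((0, 0, 0), (1, 1, 1))))
    print(curvature_bound_holds({(0, 0, 0)}, {(1, 1, 1)}, 3))
```

Antipodal vertices of the 3-dimensional hypercube have six midpoints (the three vertices of weight one and the three of weight two), which comfortably exceeds the right-hand side in this small case.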

The discrete hypercube is unique in that it is a product space where the $\ell_{p}$ topologies are equivalent for all $p\in[0,\infty)$. With the following result, we show that the Brunn-Minkowski curvature found by Ollivier and Villani is due to the $\ell_{0}$ topology; the difficulties of establishing a Brunn-Minkowski inequality for the integer lattice give insight into why the other product measures are insufficient. We omit the proof of Theorem 1.7, as the modifications necessary to the proof of Ollivier and Villani’s result are contained in the proof of Theorem 3.5, which we will state soon.

Theorem 1.7.

Any finite product space with dimension $d$ equipped with the $\ell_{0}$ metric has Brunn-Minkowski curvature at least $\frac{1}{2d}$ .

Ollivier and Villani’s [32] work stated many open questions, the largest of which is to understand the curvature of the discrete hypercube with a discrete analogue of displacement convexity. They also desire a connection between the different discrete analogues of curvature, explicitly stating that “the relationship between coarse Ricci curvature and displacement convexity of entropy is unclear, and no implication has been proved or disproved in either direction as far as we know.”

The discrete hypercube has been singled out because authorities on the subject [24, 32] have labeled it as the best candidate for being the first proven example of a positively curved discrete space. The discrete hypercube was proven by Erbar and Maas [20] to have positive heat-dispersion-curvature. Our first result is a rejection of the connection between coarse Ricci curvature and displacement convexity, and we do so on the discrete hypercube, which additionally implies a disagreement between displacement convexity and heat dispersion.

Example 3.13.

There exist sets in the discrete hypercube such that the entropy of the midpoints is less than the entropy of the endpoints.

Example 3.13 is specifically a reference to standard displacement convexity, which we distinguish from weak displacement convexity. Weak displacement convexity seems quite general, and it seems like it should hold for most spaces that have Brunn-Minkowski curvature (we note that half of our lemmas towards Theorem 3.18 are not specific to the hypercube). Such a result would imply positive curvature for more than just the discrete hypercube, as Neeranartvong, Novak, and Sothanaphan [29] proved that Brunn-Minkowski curvature also exists in the symmetric group. While such a result eludes us, we are able to present the following progress towards proving weak displacement convexity of the hypercube.

Theorem 3.18.

For probability distributions $\mu_{A}$, $\mu_{B}$ over points in the $d$-dimensional discrete hypercube, there exists an optimal transportation $\tau$ such that the distribution $\mu_{c}$ over the midpoints $\widetilde{m}(\mu_{A},\mu_{B})$ satisfies

$$S(\mu_{c})\geq\frac{1}{3}S(\mu_{A})+\frac{1}{3}S(\mu_{B})+\frac{2}{5d^{3}}\left(W^{2}(\mu_{A},\mu_{B})\right)^{2}-\frac{2}{3}.$$

The other open questions from Ollivier and Villani involve bounding $\widehat{m}_{\rho}(A,B)$ for $\rho\neq 1/2$ . The first approach suggested is to bound $|\widehat{m}_{1/4}(A,B)|$ by bounding $|\widehat{m}(\widehat{m}(A,B),B)|$ . Such a relationship would only exist in Riemannian spaces for convex $A$ and $B$ , and so Ollivier and Villani suggest the condition $\widehat{m}(A,A)\subseteq A$ , $\widehat{m}(B,B)\subseteq B$ as the discrete analogue of convexity. This condition is not sufficient. To fully explore this line of thinking, we characterize the family of convex sets in the discrete hypercube.

Theorem 3.2.

If $S$ is a convex subset of $\mathbb{H}_{d}$, then $S$ is isomorphic to $\mathbb{H}_{d^{\prime}}$ for $d^{\prime}\leq d$.

Our characterization proves that Example 3.1, which has vertex sets $A,B$ such that $\widehat{m}(\widehat{m}(A,B),B)\supsetneq\widehat{m}_{1/4}(A,B)$ , is general. This characterization also leads to interesting results that highlight how non-intuitive discrete spaces are.

Corollary 3.3.

The convex closure of any non-trivial ball in a hypercube is the whole space.

Corollary 3.4.

If $A,B$ are nonempty sets of vertices in the hypercube and $C$ is the convex closure of $\widehat{m}(A,B)$, then $A\cup B\subseteq C$.
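These statements can be checked on small hypercubes by computing convex closures directly. The sketch below assumes that "convex" means geodesically convex, that is, a set containing every vertex on every shortest path between two of its members; the formal definition appears later in the paper, so this reading and the function names are assumptions of the example.

```python
from itertools import product


def hamming(a, b):
    """Graph distance in the hypercube: number of differing coordinates."""
    return sum(x != y for x, y in zip(a, b))


def interval(a, b):
    """All hypercube vertices lying on some shortest path from a to b."""
    dab = hamming(a, b)
    return {
        u for u in product((0, 1), repeat=len(a))
        if hamming(a, u) + hamming(u, b) == dab
    }


def convex_closure(S):
    """Smallest geodesically convex set containing S (iterate until stable)."""
    closed = set(S)
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                new = interval(a, b) - closed
                if new:
                    closed |= new
                    changed = True
    return closed


def ball(center, radius):
    """Closed ball of the given radius around `center` in the hypercube."""
    return {
        u for u in product((0, 1), repeat=len(center))
        if hamming(center, u) <= radius
    }


if __name__ == "__main__":
    print(len(convex_closure(ball((0, 0, 0), 1))))        # whole cube: 8
    print(sorted(convex_closure({(0, 0, 0), (1, 1, 0)})))  # a 2-dimensional face
```

In the 3-dimensional hypercube the closure of a radius-one ball is all eight vertices, and the closure of two vertices at distance two is a 2-dimensional face, in line with Theorem 3.2 and Corollary 3.3 under the stated reading of convexity.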

We are able to use our advances in concentration theory to bound $\widehat{m}_{\rho}(A,B)$ when $|1/2-\rho|$ is small. Our bound still uses $\sqrt{|S||T|}$ instead of $|S|^{\rho}|T|^{1-\rho}$ , which implies that our technique is not preferable.

Theorem 3.5.

Suppose $S,T\subseteq G=K_{r_{1}}\square K_{r_{2}}\square\cdots\square K_{r_{d}}$ such that $d_{*}(S,T)\geq\delta d$. Let $\rho$ be such that $|1/2-\rho|<\epsilon$ and $C=\delta^{2}-16\ln(2)\epsilon>0$. Under these conditions, for large $d$ we have that

$$|\widehat{m}_{\rho}(S,T)|>\sqrt{|S||T|}\,e^{Cd(1/2+o(1))}.$$

Bounding $|\widehat{m}(\widehat{m}(A,B),B)|$ is useful for a strong version of displacement convexity. Graphs that satisfy such a property also satisfy a condition that is similar to, but stronger than, being claw-free. Unfortunately, we characterize this family of graphs in Theorem 3.20, and it is a very small family of spaces.

1.3 Common theme

The obvious connection between the two halves of this manuscript is that the second half uses a technical result from the first half. However, there is a stronger intuitive link. While most of the space of the second half of the paper is spent discussing “what does it mean to be curved?” the intent is to discuss “who is curved?” From that perspective, we propose graphs with small variance—the graphs we study in the first half of the paper—as the model for a positively curved space.

Let us quickly survey the evidence for this. Lipschitz functions frequently appear as minor statements in manuscripts about curved spaces. For example, in Particular Case 5.4 of [37], Villani presents the dual Kantorovich problem (which we elaborate on in Remark 3.12) as maximizing a variance-like parameter over the set of Lipschitz functions; and Lipschitz functions show up elsewhere in the textbook, such as in Remark 6.5. The heat dissipation approach to curvature from Erbar and Maas [20] implies a bound on the subgaussian constant.

The spread can be defined as $c^{2}(G)=\frac{1}{|V(G)|}\max_{f}\sum_{u\in V(G)}f(u)^{2}$ subject to the constraints $0=\sum_{u\in V(G)}f(u)$ and for each $vw\in E(G)$ we have that $(f(v)-f(w))^{2}\leq 1$. The Fiedler vector $h$ of the Laplacian is known to satisfy $\frac{|E(G)|}{\lambda}=\max_{h}\sum_{u\in V(G)}h(u)^{2}$ subject to the constraints $0=\sum_{u\in V(G)}h(u)$ and $\sum_{vw\in E(G)}(h(v)-h(w))^{2}\leq|E(G)|$, where $\lambda$ is the associated eigenvalue. Alon and Milman [3] were the first to observe that concentrated graphs are a generalization of expanders. Bauer, Chung, Lin, and Liu [5] showed that each of coarse Ricci curvature and Bakry-Emery version of Ricci curvature implies a bounded value for $\lambda$.

Thus the literature is heavy with results that indicate discrete curvature is a tool that can be used to find concentrated discrete spaces. But we know far more concentrated discrete spaces than curved discrete spaces; so can we use techniques from expanders to study curvature? We explore the prospects of such research in Section 3.6. In continuous spaces, the statements also use curvature to imply concentration and not vice-versa (see chapter 22 of [37]), but the examples contained in this manuscript suggest that any applicable discrete curvature will necessarily be weaker than the continuous analogue.

2 Concentration Of Lipschitz Functions

2.1 Background

The following fills in the details behind the relevant portions of the first page and a half of [6]. We assume all variables are real.

Recall (1). Because $X$ is real, the exponential function is positive and monotone increasing, and therefore $e^{th}\mathbb{P}(X-\mathbb{E}X\geq h)\leq\mathbb{E}e^{t(X-\mathbb{E}X)}$ when $t$ is nonnegative. By setting $t=h\sigma^{-2}\geq 0$ we have that (1) implies

$$\mathbb{P}(X-\mathbb{E}X\geq h)\leq e^{-h^{2}/(2\sigma^{2})}.\tag{2}$$
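
As a sanity check of the Chernoff step, the following sketch evaluates both sides of (1) and of the resulting tail bound for one concrete variable. The choice of $X$ uniform on $\{0,1\}$ (a Lipschitz function on $K_{2}$, with $\sigma^{2}=1/4$), the grids, and the helper names `centered_mgf` and `upper_tail` are assumptions made for illustration only.

```python
import math

def centered_mgf(values, t):
    """E exp(t (X - E X)) for X uniform on the list `values`."""
    mean = sum(values) / len(values)
    return sum(math.exp(t * (x - mean)) for x in values) / len(values)

def upper_tail(values, h):
    """P(X - E X >= h) for X uniform on the list `values`."""
    mean = sum(values) / len(values)
    return sum(1 for x in values if x - mean >= h) / len(values)

values = [0, 1]   # illustrative choice: a Lipschitz function on K_2
sigma2 = 0.25     # subgaussian constant of K_2

# Inequality (1) on a grid of t, then the tail bound on a grid of h >= 0.
mgf_ok = all(
    centered_mgf(values, k / 10) <= math.exp(sigma2 * (k / 10) ** 2 / 2) + 1e-12
    for k in range(-50, 51)
)
tail_ok = all(
    upper_tail(values, k / 20) <= math.exp(-((k / 20) ** 2) / (2 * sigma2)) + 1e-12
    for k in range(0, 41)
)
print(mgf_ok, tail_ok)
```

A finite grid cannot prove the inequalities; it only illustrates how the choice $t=h\sigma^{-2}$ turns the moment bound into a Gaussian-type tail.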

Let $m(X)$ be the median value of $X$, so that $y=m(X)$ minimizes the function $\mathbb{E}|X-y|$. In particular, we have that $\mathbb{E}|X-m(X)|\leq\mathbb{E}|X-\mathbb{E}(X)|$. We apply the Cauchy-Schwarz inequality (let $Y_{e}$ be the possible values of the variable $|X-\mathbb{E}(X)|$, taken with probability $p_{e}$; set $a_{e}=Y_{e}\sqrt{p_{e}}$ and $b_{e}=\sqrt{p_{e}}$) to say that $\left(\mathbb{E}|X-\mathbb{E}(X)|\right)^{2}\leq\operatorname{Var}(X)$. We conclude that

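$$|\mathbb{E}(X)-m(X)|\leq\mathbb{E}|X-m(X)|\leq\mathbb{E}|X-\mathbb{E}(X)|\leq\sqrt{\operatorname{Var}(X)}.\tag{3}$$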

We may apply (2) with $h=h^{\prime}-\sqrt{\operatorname{Var}(X)}$ and use (3) to see that

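$$\mathbb{P}(X-m(X)\geq h^{\prime})\leq e^{-\left(h^{\prime}-\sqrt{\operatorname{Var}(X)}\right)^{2}/(2\sigma^{2})}\quad\text{for }h^{\prime}\geq\sqrt{\operatorname{Var}(X)}.\tag{4}$$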

Claim 2.1 will transform (4) into

$$\mathbb{P}(X-m(X)\geq h)\leq e^{-(h-\sigma)^{2}/(2\sigma^{2})}\quad\text{for }h\geq\sigma.\tag{5}$$

Claim 2.1.

$\operatorname{Var}(X)\leq\sigma^{2}$.

Proof.

Consider the expression $Y_{t}=\mathbb{E}\left(2\left(e^{t(X-\mathbb{E}X)}-1\right)t^{-2}\right)$ as $t\rightarrow 0$. Apply the Taylor series of $e^{t(X-\mathbb{E}X)}$ to see that

$$Y_{t}=\mathbb{E}\left(\frac{2(X-\mathbb{E}X)}{t}+(X-\mathbb{E}X)^{2}+O(t)\right).$$

Because expectation is linear and $\mathbb{E}\left(X-\mathbb{E}(X)\right)=0$, we have that $\operatorname{Var}(X)=\lim_{t\rightarrow 0}Y_{t}$. Now apply (1) to see that $Y_{t}\leq 2\left(e^{\sigma^{2}t^{2}/2}-1\right)t^{-2}$. By the Taylor series for $e^{\sigma^{2}t^{2}/2}$ we have that $\lim_{t\rightarrow 0}2\left(e^{\sigma^{2}t^{2}/2}-1\right)t^{-2}=\sigma^{2}$. ∎
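
The limit used in the proof of Claim 2.1 can be observed numerically. The sketch below evaluates $Y_{t}$ for a small uniform sample and compares it with the variance as $t$ shrinks; the sample values and the helper names are arbitrary illustrative choices.

```python
import math

def variance(values):
    """Var(X) for X uniform on the list `values`."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def Y(values, t):
    """2 (E exp(t (X - E X)) - 1) / t^2 for X uniform on `values`."""
    mean = sum(values) / len(values)
    # expm1 avoids cancellation when t is small
    total = sum(math.expm1(t * (x - mean)) for x in values)
    return 2 * total / len(values) / t ** 2

values = [0, 1, 1, 2, 5]  # arbitrary illustrative sample
gaps = [abs(Y(values, t) - variance(values)) for t in (1e-1, 1e-2, 1e-3)]
print(gaps)  # the gap to Var(X) shrinks roughly linearly in t
```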

Bobkov, Houdré, and Tetali [6] had a special interest in random variables $X_{a}=f(a)$ , where $f$ is Lipschitz over some space $S$ . In particular, they were interested in bounding the size of the space $A^{-h}=\{x:d(x,A)\geq h\}$ . They use variable $X_{a}=f(a)=d(a,A)$ for subspaces $A$ such that $\mathbb{P}(x\in A)\geq 1/2$ , which implies that $m(X)=0$ . With these choices, (5) implies that $\mathbb{P}(x\in A^{-h})\leq e^{-(h/\sigma-1)^{2}/2}$ when $h\geq\sigma$ . If we choose $f(x)=\min\{h,d(x,A)\}$ , then the same choices imply $\mathbb{E}(X)\leq h/2$ , and so (2) gives a better bound when $h<2\sigma$ : $\mathbb{P}(x\in A^{-h})\leq e^{-(h/\sigma)^{2}/8}$ .

The subgaussian constant for $X$ is the smallest value $\sigma_{X}^{2}$ such that (1) is true for all real $t$. For a probability space $S$, $\sigma_{S}^{2}$ is the supremum of $\sigma_{f}^{2}$, where $f$ ranges over all Lipschitz functions over $S$. When it is clear, we drop the subscript from $\sigma$. With all of these amazing inequalities, all we have left to do is find distributions $X$ and spaces $S$ with reasonable values for $\sigma$.

Suppose $C$ is any finite set, and our space is $C^{n}$ equipped with the Hamming distance ( $d_{H}(x,y)=|\{i:x_{i}\neq y_{i}\}|$ ). If $X_{a}=f(a)$ for Lipschitz function $f$ over $x\in C^{n}$ , then McDiarmid’s inequality implies $\sigma^{2}\leq n/4$ in (2). Bobkov, Houdré, and Tetali [6] showed that if $|C|$ is even (including the already-known case of $|C|=2$ ), then $\sigma^{2}=n/4$ . If $|C|$ is odd, then $n\sigma^{-2}=2|C|\log\left(\frac{|C|+1}{|C|-1}\right)$ . The symmetric group on $n$ elements is a subset of $C^{n}$ for $|C|=n$ . Bobkov, Houdré, and Tetali [6] showed that if our space is the symmetric group with a distance inherited from the Hamming distance on $C^{n}$ in this manner, then $\sigma^{2}\leq n-1$ .

We are also interested in probability spaces $X=f\circ\mu$ defined as uniform probabilities $\mu$ over the vertex set of a graph $G$ equipped with functions $f:V(G)\rightarrow\mathbb{R}$ that are Lipschitz with respect to the standard graph distance in $G$ . Alon, Boppana, and Spencer [2] demonstrated that the subgaussian constant tensors out with respect to the Cartesian product. As this result will be important to us (and because the proof is so short and elegant), we include it here for completeness.

First, let us establish notation that we will use again later. For a graph $G$ , let $\Omega_{G}$ be the set of Lipschitz functions over $G$ . For $f\in\Omega_{G}$ , the log-moment function of $f$ is $L_{f}(t)=\ln(\mathbb{E}(e^{t(f-\mathbb{E}(f))}))$ . The log-moment function of $G$ is $L_{G}(t)=\max_{f\in\Omega}L_{f}(t)$ . (Let us quickly explain why we use $\max$ instead of $\sup$ : the log-moment function is invariant under translation— $L_{f}(t)=L_{f+c}(t)$ —and therefore we may restrict ourselves to functions where $f(v)=0$ for arbitrary fixed vertex $v$ . This restricted subset of $\Omega$ is a closed bounded subset of $\mathbb{R}^{n}$ and therefore compact.) We can thus calculate the subgaussian constant as $\sigma_{G}^{2}=\sup_{t>0}2L_{G}(t)t^{-2}$ .
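
To make the notation concrete, here is a minimal sketch that evaluates the log-moment function of a single function and approximates $\sup_{t>0}2L(t)t^{-2}$ on a grid. It assumes $G=K_{2}$ with $f=(0,1)$, which is extremal up to translation and sign, and the grid of $t$ values is an arbitrary choice; `log_moment` is a hypothetical helper name.

```python
import math

def log_moment(f, t):
    """L_f(t) = ln E exp(t (f - E f)) under the uniform measure on the vertices."""
    mean = sum(f) / len(f)
    return math.log(sum(math.exp(t * (x - mean)) for x in f) / len(f))

f = [0, 1]  # an extremal Lipschitz function on K_2

# Translation invariance: L_f(t) = L_{f+c}(t).
shift_gap = abs(log_moment(f, 1.3) - log_moment([x + 7 for x in f], 1.3))

# sup_{t>0} 2 L(t) / t^2, approximated on a finite grid (an assumption of this sketch).
grid = [k / 100 for k in range(1, 1001)]
sigma2_estimate = max(2 * log_moment(f, t) / t ** 2 for t in grid)
print(sigma2_estimate)  # close to 1/4
```

For $K_{2}$ the supremum is approached as $t\rightarrow 0$, so the grid estimate sits just below $1/4$.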

Theorem 2.2 ([2]).

For graphs $G$ and $H$ , we have that $\sigma_{G\square H}^{2}=\sigma_{G}^{2}+\sigma_{H}^{2}$ .

Proof.

Fix a $t$ .

Let $f,f^{\prime}$ be Lipschitz functions over $G,H$ such that $L_{G}(t)=L_{f}(t),L_{H}(t)=L_{f^{\prime}}(t)$ respectively. Now define $\ell$ to be the function $\ell(u,v)=f(u)+f^{\prime}(v)$ over $G\square H$ , which is Lipschitz by choice of $f,f^{\prime}$ . By translation invariance, let us assume that $\mathbb{E}(f)=\mathbb{E}(f^{\prime})=0$ , which implies that $\mathbb{E}(\ell)=0$ as well. Notice that

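$$L_{G\square H}(t)\geq L_{\ell}(t)=\ln\left(\mathbb{E}_{u}e^{tf(u)}\,\mathbb{E}_{v}e^{tf^{\prime}(v)}\right)=L_{G}(t)+L_{H}(t).$$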

As this holds for each $t$ , we conclude that $\sigma_{G\square H}^{2}\geq\sigma_{G}^{2}+\sigma_{H}^{2}$ .

Now suppose that $\ell$ is a Lipschitz function over $G\square H$ such that $L_{G\square H}(t)=L_{\ell}(t)$. Let $f(u)=\mathbb{E}_{v}(\ell(u,v))$, and define a family of functions over $H$ as $f^{\prime}_{u}$ such that $\ell(u,v)=f(u)+f^{\prime}_{u}(v)$. Because $\ell$ is Lipschitz and expectation is linear, we have that $f$ is Lipschitz. For a fixed $u$, we have that $f(u)$ is constant, and therefore $f_{u}^{\prime}$ is also Lipschitz. The expectation of $f$ over $u$ is the expectation of $\ell$ over $(u,v)$. For simplicity, we will use translation invariance of the log-moment function to assume $\ell$ has expectation $0$, and therefore $f$ has expectation $0$ as well. By definition of $f$, it follows that $f^{\prime}_{u}$ has expectation $0$ for each $u\in V(G)$. The theorem then follows from the monotonicity of the exponential function and that for all $t$,

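$$\mathbb{E}_{u,v}e^{t\ell(u,v)}=\mathbb{E}_{u}\left(e^{tf(u)}\,\mathbb{E}_{v}e^{tf^{\prime}_{u}(v)}\right)\leq\mathbb{E}_{u}\left(e^{tf(u)}e^{L_{H}(t)}\right)\leq e^{L_{G}(t)+L_{H}(t)}.$$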

∎
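
The first half of the proof rests on the fact that the log-moment function is additive for functions of the form $\ell(u,v)=f(u)+f^{\prime}(v)$. A small numerical sketch of that identity follows; the graphs ($P_{3}$ and $K_{2}$) and the functions are illustrative assumptions, not taken from [2].

```python
import math
from itertools import product

def log_moment(f, t):
    """L_f(t) = ln E exp(t (f - E f)) under the uniform measure."""
    mean = sum(f) / len(f)
    return math.log(sum(math.exp(t * (x - mean)) for x in f) / len(f))

f = [0, 1, 2]      # Lipschitz on the path P_3 (illustrative)
f_prime = [0, 1]   # Lipschitz on K_2 (illustrative)

# ell(u, v) = f(u) + f'(v), listed over all vertices (u, v) of the product graph
ell = [a + b for a, b in product(f, f_prime)]

gaps = [
    abs(log_moment(ell, t) - log_moment(f, t) - log_moment(f_prime, t))
    for t in (0.5, 1.0, 2.0)
]
print(max(gaps))  # zero up to floating-point error
```

The identity holds because the uniform measure on the product is the product of the uniform measures, so the two coordinates are independent.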

2.2 Concentration of permutations and levels of the Boolean lattice

We are interested in Ollivier and Villani’s [32] use of concentration inequalities. They use a result by Bobkov, Houdré, and Tetali [6] on the concentration of $S_{n}$ as a lemma, and hence we wish to establish this result with great care. The proof of this statement is omitted from the published paper, but its proof is available in an extended manuscript available on a personal website. We include the proof here for self-containment and to establish the context necessary for Remark 2.3.

Theorem 1.6 ([6]).

Let $S_{n}$ be the symmetric group on $n$ elements equipped with the Hamming distance $d_{H}$. The subgaussian constant for this space satisfies $\sigma_{S_{n}}^{2}\leq n-1$.

Proof.

We prove this by induction on $n$ . The base case follows from $|S_{1}|=1$ . Also, $S_{2}\cong K_{2}$ except with distances doubled, and therefore $\sigma_{S_{2}}^{2}=4\sigma_{K_{2}}^{2}=1$ . Next, we will show that $\sigma^{2}_{S_{n}}\leq\sigma^{2}_{S_{n-1}}+1$ .

Let $T_{i}=\{\sigma\in S_{n}:\sigma(1)=i\}$ , so that $T_{1},\ldots,T_{n}$ partition $S_{n}$ and each $T_{i}$ is isomorphic to $S_{n-1}$ . Recall the log-moment function $L_{S_{n}}(t)$ and let $f$ be a function that is Lipschitz over $S_{n}$ and $L_{f}(t)=L_{S_{n}}(t)$ . Let $f_{i}$ denote $f$ restricted to the domain of $T_{i}$ . We introduce the translated log-moment function $L_{f}^{*}(t)=\ln(\mathbb{E}(e^{tf}))$ , so that $e^{L_{f}(t)}=e^{L_{f}^{*}(t)}e^{-t\mathbb{E}f}$ . It follows that

[TABLE]

where the second-to-last part follows by induction. We will show that $\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}e^{t(\mathbb{E}f_{i}-\mathbb{E}f)}\leq e^{t^{2}/2}$ , which will prove the theorem.

Let $g$ be a variable such that $g(i)=\mathbb{E}f_{i}$ . Note that $\mathbb{E}(g)=\mathbb{E}(f)$ , and therefore $e^{L_{g}(t)}=\frac{1}{n}\sum_{i=1}^{n}e^{t(\mathbb{E}f_{i}-\mathbb{E}f)}$ . So we are trying to prove that $\sigma_{g}^{2}\leq 1$ . We will show that for all $i,j$ we have that $|g(i)-g(j)|\leq 2$ , and therefore $\sigma_{g}^{2}\leq 4\sigma_{K_{n}}^{2}\leq 1$ , which will prove the theorem.

To show that $|\mathbb{E}_{x\in T_{i}}(f(x))-\mathbb{E}_{x\in T_{j}}(f(x))|\leq 2$ for $i\neq j$ we establish a bijection $R_{i,j}:T_{i}\rightarrow T_{j}$ such that $d_{H}(x,R_{i,j}(x))\leq 2$ for all $x\in T_{i}$ . For a fixed $x\in T_{i}$ , let $k=x^{-1}(j)$ . Let $y=R_{i,j}(x)$ , and define $y(1)=j$ , $y(k)=i$ , and $y(\ell)=x(\ell)$ otherwise. ∎
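
The bijection $R_{i,j}$ at the end of the proof is easy to test exhaustively for small $n$. The sketch below does so for $n=4$; permutations are stored as tuples with `x[p-1]` holding $x(p)$, which is a representation chosen here for illustration.

```python
from itertools import permutations

def hamming(x, y):
    """Number of positions where the two permutations differ."""
    return sum(a != b for a, b in zip(x, y))

def R(x, j):
    """Map x in T_i to y in T_j: y(1) = j, y(k) = i where k = x^{-1}(j)."""
    i = x[0]
    k = x.index(j)
    y = list(x)
    y[0], y[k] = j, i
    return tuple(y)

n = 4
perms = list(permutations(range(1, n + 1)))
all_ok = True
for i in range(1, n + 1):
    T_i = [x for x in perms if x[0] == i]
    for j in range(1, n + 1):
        if i == j:
            continue
        T_j = {x for x in perms if x[0] == j}
        image = {R(x, j) for x in T_i}
        # equal-sized sets with image == T_j means R is a bijection T_i -> T_j
        all_ok = all_ok and image == T_j
        all_ok = all_ok and all(hamming(x, R(x, j)) <= 2 for x in T_i)
print(all_ok)
```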

Remark 2.3.

Theorem 1.6 is not sharp because $4\sigma_{K_{n}}^{2}<1$ whenever $n$ is odd. Because the proof is inductive, the bound on $\sigma_{S_{n}}^{2}$ could be improved by the sum of $1-4\sigma_{K_{m}}^{2}$ over the odd $m\leq n$ . Unfortunately, $1-4\sigma_{K_{m}}^{2}\leq O(m^{-2})$ , and thus this improvement is finite.

Let $J_{n}=K_{n}\square K_{n-1}\square\cdots\square K_{2}$ . Recall that Bobkov, Houdré, and Tetali [6] gave the exact subgaussian constant for the complete graph, and thus Theorem 2.2 implies that

[TABLE]

The extended manuscript of Bobkov, Houdré, and Tetali [6] includes a proof from Schechtman that $\sigma_{S_{n}}^{2}\leq 9\sigma_{J_{n}}^{2}$ by constructing a $3$ -Lipschitz function $T:J_{n}\rightarrow S_{n}$ . It can be observed that $T^{-1}$ is $2$ -Lipschitz, and therefore $\sigma_{S_{n}}^{2}\geq\sigma_{J_{n}}^{2}/4$ .

We know the extremal function for the log-moment function of $J_{n}$ . By the proof of Theorem 2.2 the extremal function of $J_{n}$ is the extremal function of $K_{i}$ tensored out for $2\leq i\leq n$ , and Bobkov, Houdré, and Tetali [6] gave the extremal function for $K_{i}$ . Up to symmetry, the extremal function for $J_{n}$ is then $f(x_{1},\ldots,x_{n-1})=|\{i:2x_{i}>n+1-i\}|$ .

We can construct close lower bounds on $\sigma_{S_{n}}$ by using similar functions. That is, we consider functions $f$ that are a sum of indicator variables with balanced probabilities. Bobkov, Houdré, and Tetali [6] show that $\sigma_{S_{n}}^{2}>(n-1)/16$ using the function $f(\pi)=|\{i:i\leq\lfloor n/2\rfloor,\pi(i)>\lfloor n/2\rfloor\}|$, which can easily be expressed as the sum of $\lfloor n/2\rfloor$ indicator variables. Recall for $X=\sum_{i=1}^{n}X_{i}$ that $\operatorname{Var}(X)=\sum_{i=1}^{n}\operatorname{Var}(X_{i})+\sum_{i\neq j}\operatorname{Cov}(X_{i},X_{j})$. We improve the previous lower bound on $\sigma_{S_{n}}$ by finding indicator variables with positive covariance.

Theorem 2.4.

Let $S_{n}$ be the symmetric group on $n\geq 3$ elements equipped with the Hamming distance $d_{H}$ . The subgaussian constant for this space satisfies $\sigma_{S_{n}}^{2}>\frac{n}{4}>\sigma_{J_{n}}^{2}+\frac{1}{4}$ .

Proof.

For a permutation $\pi\in S_{n}$, let $X_{i}$ be the indicator variable that $\pi(i)>n/2$. Bobkov, Houdré, and Tetali’s function is $f(\pi)=\sum_{i=1}^{\lfloor n/2\rfloor}X_{i}(\pi)$. They demonstrated that $\operatorname{Var}(f)>(n-1)/16$, and so their result follows from Claim 2.1. Moreover, if $n$ is even, then $\operatorname{Var}(f)>n/16$.

Consider the function $X=\sum_{i=1}^{\lfloor n/2\rfloor}X_{i}+\sum_{i=\lfloor n/2\rfloor+1}^{n}(1-X_{i})$. If $n$ is even, then because $\pi$ is a permutation, $\sum_{i=1}^{n/2}X_{i}=\sum_{i=n/2+1}^{n}(1-X_{i})$. So when $n$ is even, $X(\pi)=2f(\pi)$, and so $\operatorname{Var}(X)=4\operatorname{Var}(f)>n/4$.

Now suppose $n=2k+1$ for integer $k\geq 1$ . A direct calculation gives us that $\mathbb{E}(X_{i})=\frac{k+1}{2k+1}$ and $\operatorname{Var}(X_{i})=\operatorname{Var}(1-X_{i})=\frac{k(k+1)}{(2k+1)^{2}}$ . Moreover, for $i\neq j$ , $\operatorname{Cov}(1-X_{i},1-X_{j})=\operatorname{Cov}(X_{i},X_{j})=-\frac{k+1}{2(2k+1)^{2}}$ and $\operatorname{Cov}(X_{i},1-X_{j})=-\operatorname{Cov}(X_{i},X_{j})$ . Therefore

[TABLE]

∎
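
For small $n$ the variance of the function $X$ from the proof can be computed exactly by enumerating $S_{n}$. The sketch below does this with rational arithmetic and compares the result with $n/4$; by Claim 2.1 the variance of a Lipschitz function is a lower bound on $\sigma_{S_{n}}^{2}$. The function name `variance_of_X` and the range of $n$ are choices made here for illustration.

```python
from fractions import Fraction
from itertools import permutations

def variance_of_X(n):
    """Exact Var(X) for X = sum_{i<=n//2} X_i + sum_{i>n//2} (1 - X_i),
    where X_i indicates pi(i) > n/2 and pi is uniform on S_n."""
    half = n // 2
    vals = []
    for pi in permutations(range(1, n + 1)):
        ind = [1 if 2 * p > n else 0 for p in pi]   # X_i = [pi(i) > n/2]
        vals.append(sum(ind[:half]) + sum(1 - b for b in ind[half:]))
    mean = Fraction(sum(vals), len(vals))
    return sum((v - mean) ** 2 for v in vals) / len(vals)

for n in range(3, 8):
    print(n, variance_of_X(n), variance_of_X(n) > Fraction(n, 4))
```

For odd $n=2k+1$, combining the variances and covariances stated in the proof gives $\operatorname{Var}(X)=2k(k+1)^{2}/(2k+1)^{2}$, which the enumeration reproduces for $n=3$ and $n=5$.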

Let $C_{r}=\{A\in\{0,1\}^{n}:|A|=r\}$ and $C_{r_{1},r_{2},\ldots,r_{k}}=\cup_{i}C_{r_{i}}$ . Ollivier and Villani [32] established that $\sigma_{C_{n/2}}^{2}\leq n-1$ and $\sigma_{C_{(n-1)/2,(n+1)/2}}^{2}\leq n$ , which we wish to generalize.

Theorem 2.5.

For $n\geq 3$, $\sigma_{C_{(n-r)/2,(n+r)/2}}^{2}<n-1+r^{2}/4$.

Proof.

Let $f$ be a Lipschitz function on $C_{(n-r)/2,(n+r)/2}$. Let $H=rK_{2}\square S_{n}$, where $rK_{2}$ is two points $\{0,r\}$ distance $r$ apart. We define a surjection $\psi:H\rightarrow C_{(n-r)/2,(n+r)/2}$ as follows: $\psi(0,\pi)=\pi(\{1,2,\ldots,(n-r)/2\})$ and $\psi(r,\pi)=\pi(\{1,2,\ldots,(n+r)/2\})$. Let $g:H\rightarrow\mathbb{R}$ be such that $g(h)=f(\psi(h))$. Because ${n\choose(n-r)/2}={n\choose(n+r)/2}$, the pre-image under $\psi$ of any point in $C_{(n-r)/2,(n+r)/2}$ is of fixed size. Therefore $\mathbb{E}(e^{t(g-\mathbb{E}g)})=\mathbb{E}(e^{t(f-\mathbb{E}f)})$.

We claim the following two facts: (A) $g$ is Lipschitz over $H$ and (B) $\sigma_{H}^{2}<n-1+r^{2}/4$ . The claim implies that $\mathbb{E}(e^{t(g-\mathbb{E}g)})<e^{t^{2}(n-1+r^{2}/4)/2}$ , which proves the theorem.

By the definition of Cartesian product, $g$ is Lipschitz if and only if it is Lipschitz in each coordinate. Because $f$ is Lipschitz and we apply the Hamming distance to $S_{n}$ , we have that $|g(x,\pi)-g(x,\pi^{\prime})|\leq d_{S_{n}}(\pi,\pi^{\prime})$ . Because $\psi(0,\pi)\subseteq\psi(r,\pi)$ , $|\psi(r,\pi)-\psi(0,\pi)|=r$ , and $f$ is Lipschitz, we have that $|g(x,\pi)-g(x^{\prime},\pi)|\leq r$ . Thus, $g$ is Lipschitz. This proves (A).

By Theorem 2.2, $\sigma_{H}^{2}=\sigma_{S_{n}}^{2}+\sigma_{rK_{2}}^{2}$ . By Remark 2.3, $\sigma_{S_{n}}^{2}<n-1$ . By the definition of subgaussian constant, $\sigma_{rK_{2}}^{2}=r^{2}\sigma_{K_{2}}^{2}$ . Recall that $\sigma_{K_{2}}^{2}=1/4$ . This proves (B). ∎
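
The key property of the map $\psi$ is that every set in $C_{(n-r)/2,(n+r)/2}$ has the same number of pre-images, so $g$ and $f$ have the same distribution. A brute-force sketch for $n=4$, $r=2$ follows, reading $\psi(0,\pi)$ and $\psi(r,\pi)$ as the images under $\pi$ of the first $(n-r)/2$ and $(n+r)/2$ positions; the parameters and tuple representation are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

def psi(level, pi, n, r):
    """psi(0, pi) = pi({1..(n-r)/2}) and psi(r, pi) = pi({1..(n+r)/2})."""
    size = (n - r) // 2 if level == 0 else (n + r) // 2
    return frozenset(pi[:size])   # pi[p-1] stores pi(p)

n, r = 4, 2
perms = list(permutations(range(1, n + 1)))

# number of pre-images of each set under psi
counts = Counter(psi(level, pi, n, r) for level in (0, r) for pi in perms)

# psi(0, pi) is contained in psi(r, pi), and the two sets differ in r elements
nested = all(
    psi(0, pi, n, r) <= psi(r, pi, n, r)
    and len(psi(r, pi, n, r) - psi(0, pi, n, r)) == r
    for pi in perms
)
print(len(counts), set(counts.values()), nested)
```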

Theorem 2.5 is strongest when $r\leq O(\sqrt{n})$ . We will need another result for when $r=\epsilon n$ for small $\epsilon>0$ . We restrict our attention to the midpoints of distant sets of vertices, and only establish an extension of (2).

In the following we will need to bound from below the value $|C_{R}|={n\choose R}$ for $R=n/2-r$ and $\sqrt{n}\ll r$ . The standard inequality to use is $(\frac{n}{R})^{R}\leq{n\choose R}$ , which gives $|C_{R}|\geq(2+O(\epsilon))^{n/2-r}$ when $r=\epsilon n$ . We need (and produce) the stronger inequality $|C_{R}|\geq 2^{n-O(r)}$ .

Theorem 2.6.

Suppose $c\geq 2$ , $n\geq 3$ , $\frac{c-1}{2(c+1)}n>R>\sqrt{n\ln(\frac{c}{c-1})}$ , and $|n/2-r_{i}|\leq R$ for all $i$ . Let $X_{*}$ be a Lipschitz function over $\{0,1\}^{n}$ , and let $X$ be $X_{*}$ induced on $C_{r_{1},\ldots,r_{k}}$ .

Then

$$\mathbb{P}(X-\mathbb{E}(X_{*})\geq h\geq 0)<e^{(R+3)\ln(c)-\ln(k)-\frac{h^{2}}{2(n-1)}}.$$

In particular, if $n/6>R>\sqrt{n\ln(2)}$ , then

$$\mathbb{P}(X-\mathbb{E}(X_{*})\geq h\geq 0)<e^{(R+3)\ln(2)-\ln(k)-\frac{h^{2}}{2(n-1)}}.$$

Proof.

Because $C_{r_{1},\ldots,r_{k}}\subset\{0,1\}^{n}$ , every event $X-\mathbb{E}X_{*}\geq h$ is also an event $X_{*}-\mathbb{E}X_{*}\geq h$ . If we wish to count such events, this inequality becomes

[TABLE]

By Remark 2.3, inequality (2) applied to $X_{*}$ is

[TABLE]

Let $C=C_{n/2-R,\ldots,n/2+R}$ . A standard inequality tells us that $|C|2^{-n}\geq 1-e^{-R^{2}/n}$ , and by the assumption $R>\sqrt{n\ln(\frac{c}{c-1})}$ we have that $|C|>2^{n}/c$ . We will show that $|C_{n/2-R}|\geq c^{-R-2}|C|$ , which will imply that $|C_{r_{1},\ldots,r_{k}}|\geq kc^{-R-2}|C|$ , which implies the theorem.

Because $\frac{c-1}{2(c+1)}n>R$ and $i\geq 0$ we have that $\frac{|C_{n/2-R+i+1}|}{|C_{n/2-R+i}|}=\frac{n/2+R-i}{n/2-R+i+1}<c$ . Therefore $|C_{n/2-R+i}|<c^{i}|C_{n/2-R}|$ , and because $c\geq 2$ this implies

[TABLE]

∎
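
The counting steps in the proof can be checked with exact integer arithmetic for one choice of parameters. The values $n=100$, $c=2$, $R=10$ below are illustrative assumptions that satisfy the hypotheses of Theorem 2.6; they are not taken from the paper.

```python
from math import comb, log, sqrt

n, c, R = 100, 2, 10           # illustrative parameters
low = n // 2 - R               # index of the smallest level n/2 - R

# Hypothesis (c-1)/(2(c+1)) n > R > sqrt(n ln(c/(c-1)))
hypotheses_ok = (c - 1) / (2 * (c + 1)) * n > R > sqrt(n * log(c / (c - 1)))

# |C| for C = C_{n/2-R, ..., n/2+R}
size_C = sum(comb(n, m) for m in range(low, n // 2 + R + 1))

# Consecutive levels grow by a factor below c (integer arithmetic, no rounding).
ratio_ok = all(comb(n, low + i + 1) < c * comb(n, low + i) for i in range(2 * R))

middle_is_large = c * size_C > 2 ** n                      # |C| > 2^n / c
lowest_level_ok = comb(n, low) * c ** (R + 2) >= size_C    # |C_{n/2-R}| >= c^{-R-2} |C|

print(hypotheses_ok, ratio_ok, middle_is_large, lowest_level_ok)
```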

In the end, our concentration of measure will be used in the following manner.

Theorem 2.7.

Let $A,B\subseteq C_{(k-r)/2,(k+r)/2}\subset\{0,1\}^{k}$ such that $|A|\leq|B|$ and $\min_{a\in A,b\in B}d(a,b)\geq t$. Under these circumstances,

[TABLE]

If $r=\epsilon k$ and $t=\delta k$ with fixed coefficients that satisfy $C=\delta^{2}-8\ln(2)\epsilon>0$ , then

[TABLE]

Proof.

Let $X_{*}$ be a variable over $\{0,1\}^{k}$ such that $X_{*}(u)=d(u,A)$ . By this construction, $X_{*}$ is Lipschitz, $A=\{u:X_{*}(u)\leq 0\}$ , and $B\subseteq\{u:X_{*}(u)\geq t\}$ . Let $X$ be $X_{*}$ induced on $C_{(k-r)/2,(k+r)/2}$ . If $\mathbb{E}(X)\leq t/2$ , then apply Theorem 2.5 to inequality (2) on variable $X$ to see that $|B|\leq|C_{(k-r)/2,(k+r)/2}|e^{-t^{2}/(8(k-1+r^{2}/4))}$ . If $\mathbb{E}(X)\geq t/2$ , then apply Theorem 2.5 to inequality (2) on variable $-X$ to see that $|A|\leq|C_{(k-r)/2,(k+r)/2}|e^{-t^{2}/(8(k-1+r^{2}/4))}$ . Because $|B|\geq|A|$ by assumption, the first part of the theorem follows.

The diameter of $\{0,1\}^{k}$ is $k$ , and so we may assume that $\delta\leq 1$ . If $\delta^{2}-8\ln(2)\epsilon>0$ , then $\epsilon<1/6$ . The second part of the theorem follows similarly, with Theorem 2.6 (specifically, the second inequality with $c=k=2$ ) replacing Theorem 2.5 and inequality (2).

∎

2.3 Concentration of cycles

A variable $X$ is called optimal if $L_{G}(t)=L_{X}(t)$. Let $\Omega^{*}$ be the variables in $\Omega$ that are not a convex combination of other variables in $\Omega$. The log-moment function is a summation of functions that are strictly convex in each variable composed with a monotone function, and therefore the optimal functions are contained in $\Omega^{*}$. Bobkov, Houdré, and Tetali [6] established the following combinatorial fact. If $E_{X}=\{uv:uv\in E(G),|X(u)-X(v)|=1\}$, then

[TABLE]

The log-moment function is also symmetric, and Bobkov, Houdré, and Tetali used this to show a second combinatorial fact. If $\pi$ is a permutation of $V(G)$ and $X\circ\pi\in\Omega$ , then

[TABLE]

Bobkov, Houdré, and Tetali [6] used (6) and (7) to quickly calculate the subgaussian constant of $C_{n}$ when $n$ is even. However, the proof is just short of calculating $\sigma_{G}$ for $G=C_{2k+1}$. Up to symmetry, the cycle $C_{n}$ contains a unique spanning tree, which contains all but one edge (by symmetry, call this edge $uv$). Let $X\in\Omega^{*}$, and observe that $|X(u)-X(v)|\equiv n+1\pmod{2}$. Because $X$ is Lipschitz and $uv\in E(C_{n})$, we also know that $|X(u)-X(v)|\leq 1$. If $n=2k$, then $uv\in E_{X}$, and therefore $E_{X}=E(C_{2k})$. Moreover, if there exist three distinct vertices $x,y,z$ such that $X(x)=X(y)=X(z)$, then there exists a permutation $\pi$ such that $X\circ\pi\in\Omega$ and $E_{X\circ\pi}$ does not contain a spanning tree (thus violating (7)). Up to symmetry, there exists a unique integer-valued Lipschitz function on $C_{2k}$ such that no value appears in the image more than twice and the endpoints of each edge differ by exactly $1$; therefore $X$ is known, and so $\sigma_{C_{2k}}$ is known.

Conjecture 1.5, made by Bobkov, Houdré, and Tetali [6] and repeated by Sammer and Tetali [35], is that a similar construct is optimal for odd cycles. In order to enhance this argument for odd cycles, we will argue a stronger statement than (6) and (7). We will also use this stronger statement again at a later point in the paper. Let $S_{V(G)}$ denote the set of permutations on $V(G)$ and

[TABLE]

To explain $\Omega^{\circ}$ in simpler terms, we have that $\Omega^{*}$ is $\Omega$ minus functions that are convex combinations of other Lipschitz functions, and $\Omega^{\circ}$ is $\Omega$ minus functions that are convex combinations of permutations of other Lipschitz functions (which might not be Lipschitz after the permutation is applied). We do not know a priori that $\Omega^{\circ}$ is non-empty, but this follows from the following lemma.

Lemma 2.8.

If $X$ is optimal, then $X\in\Omega^{\circ}$ .

Proof.

The log-moment function is well defined for all functions $X:V(G)\rightarrow\mathbb{R}$ , not just those inside $\Omega$ . Moreover, the log-moment function is strictly convex over this extended domain. So if $X$ is a convex combination of $Z\circ\pi_{Z}$ and $Y\circ\pi_{Y}$ , then $L_{X}(t)<\max\{L_{Z\circ\pi_{Z}}(t),L_{Y\circ\pi_{Y}}(t)\}$ . And by symmetry, $L_{Z\circ\pi_{Z}}(t)=L_{Z}(t)$ and $L_{Y\circ\pi_{Y}}(t)=L_{Y}(t)$ . ∎

Theorem 2.9.

Conjecture 1.5 is true.

Proof.

Let $V(C_{n})=\{u_{1},\ldots,u_{n}\}$ , and let the indices be taken modulo $n$ . Fix some $i$ , and let $X_{i}(w)=d(u_{i},w)$ . We will show that $\Omega^{\circ}$ is the set of translations and reflections of $X_{i}$ for $1\leq i\leq n$ . Suppose $X\in\Omega^{\circ}$ .

One method to characterize the translations and reflections of $X_{i}$ is as the family of Lipschitz functions $\{Y_{k}\}$ that satisfy the condition that for any $\ell$ ,

[TABLE]

By translation invariance, let us assume that $X$ is integer valued. We will show that $X$ satisfies (8).

Let $m=\min_{i}X(u_{i})$ and $M=\max_{i}X(u_{i})$. There exists an $X^{\prime}=X\circ\pi$ for a permutation $\pi$ of $V(C_{n})$ such that

(1) for $1\leq i\leq 1+M-m$ we have that $X^{\prime}(u_{i})=m+i-1$ , and

(2) for $1+M-m\leq j\leq n$ we have that $X^{\prime}(u_{j})\geq X^{\prime}(u_{j+1})$ .

By construction, $X^{\prime}\in\Omega$ . By (7), $X^{\prime}$ is optimal, and so by (6), $X^{\prime}(u_{i})=X^{\prime}(u_{i+1})+1$ for $i\in[1+M-m,n]-\{\ell^{\prime}\}$ for some value $\ell^{\prime}$ .

If $X^{\prime}(u_{\ell^{\prime}})=X^{\prime}(u_{\ell^{\prime}+1})+1$ or $\ell^{\prime}=1+M-m$ or $\ell^{\prime}=n$ , then we are done. So assume $X^{\prime}(u_{\ell^{\prime}})=X^{\prime}(u_{\ell^{\prime}+1})$ and $2+M-m\leq\ell^{\prime}\leq n-1$ . Consider the functions $X_{*}$ and $X_{**}$ where

(A) $X_{*}(u_{i})=X^{\prime}(u_{i})$ when $i\neq\ell^{\prime}$ and $X_{*}(u_{\ell^{\prime}})=X^{\prime}(u_{\ell^{\prime}})+1$ , and

(B) $X_{**}(u_{i})=X^{\prime}(u_{i})$ when $i\neq\ell^{\prime}+1$ and $X_{**}(u_{\ell^{\prime}+1})=X^{\prime}(u_{\ell^{\prime}+1})-1$ .

By construction, $X_{*},X_{**}\in\Omega$. Moreover, if $\pi$ is the permutation on $V(C_{n})$ that transposes $u_{\ell^{\prime}}$ with $u_{\ell^{\prime}+1}$, then $X^{\prime}=\frac{1}{2}(X_{*}\circ\pi+X_{**})$. By Lemma 2.8, this implies that $X^{\prime}$ is not optimal. ∎
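
For a small odd cycle the statement can be explored by brute force. The sketch below enumerates the integer-valued Lipschitz functions on $C_{5}$ normalised by $f(u_{0})=0$, maximises the log-moment function at $t=1$, and compares the maximiser with translated distance functions $\pm d(u_{i},\cdot)$. Restricting to integer-valued functions, the choice $t=1$, and the zero-based vertex labels are assumptions of this sketch.

```python
import math
from itertools import product

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]

def log_moment(f, t):
    """L_f(t) = ln E exp(t (f - E f)) under the uniform measure."""
    mean = sum(f) / len(f)
    return math.log(sum(math.exp(t * (x - mean)) for x in f) / len(f))

def is_lipschitz(f):
    return all(abs(f[u] - f[v]) <= 1 for u, v in edges)

def cycle_dist(i, w):
    d = abs(i - w)
    return min(d, n - d)

def normalise(f):
    """Translate so that the minimum value is 0."""
    m = min(f)
    return tuple(x - m for x in f)

# integer-valued Lipschitz functions with f(u_0) = 0 (the diameter of C_5 is 2)
candidates = [
    (0,) + rest
    for rest in product(range(-2, 3), repeat=n - 1)
    if is_lipschitz((0,) + rest)
]

t = 1.0
best = max(candidates, key=lambda f: log_moment(f, t))

# translations of +d(u_i, .) and -d(u_i, .)
distance_like = {
    normalise(tuple(s * cycle_dist(i, w) for w in range(n)))
    for i in range(n)
    for s in (1, -1)
}
print(normalise(best), normalise(best) in distance_like)
```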

2.4 The structure of the subgaussian constant and spread

Let us use Lemma 2.8 to prove a statement that will be used later. A hair of $G$ is a sequence of vertices $w_{0},w_{1},\ldots,w_{k}$ such that $N(w_{i})=\{w_{i-1},w_{i+1}\}$ for $1\leq i\leq k-1$ and $N(w_{k})=\{w_{k-1}\}$. We may use (6) and (7) to state that

[TABLE]

That is, for a hair $w_{0},\ldots,w_{k}$ and an optimal function $X$, there exists an $\ell$ with $0\leq\ell\leq k$ such that

(A) $X(w_{i})-X(w_{i-1})=X(w_{i^{\prime}})-X(w_{i^{\prime}-1})$ for all $1\leq i,i^{\prime}\leq\ell$ , and

(B) $X(w_{j})-X(w_{j-1})=X(w_{j^{\prime}})-X(w_{j^{\prime}-1})$ for all $\ell<j,j^{\prime}\leq k$ .

We will need to find the set of optimum functions of a particular family of trees in a later section; for this purpose we will need to know that “small” hairs are monotone.

Lemma 2.10.

Let $X$ be an optimal Lipschitz variable on $G$, and let $w_{0},w_{1},\ldots,w_{k}$ be a hair of $G$. Let $m=\min_{i}X(w_{i})$ and $M=\max_{i}X(w_{i})$. If $G$ has vertices $u_{1},\ldots,u_{M-m-1}$ such that $X(u_{i})=m+i$ and $\{u_{1},\ldots,u_{M-m-1}\}\cap\{w_{1},\ldots,w_{k}\}=\emptyset$, then $X(w_{i})-X(w_{i-1})=X(w_{i^{\prime}})-X(w_{i^{\prime}-1})$ for all $1\leq i,i^{\prime}\leq k$.

Proof.

By translation invariance, let us assume that $X$ takes integral values. By the Lipschitz condition, any path $v_{1}v_{2}\ldots v_{k}$ satisfies $\mathbb{Z}\cap[X(v_{1}),X(v_{k})]\subseteq\{X(v_{1}),X(v_{2}),\ldots,X(v_{k})\}$. We directly conclude that the sequence $X(w_{0}),X(w_{1}),\ldots,X(w_{k})$ must contain each integer in the range $[m,M]$. But by partitioning the hair into multiple paths, we also see that each integer in $(m,X(w_{0})]$ or in $[X(w_{0}),M)$ appears at least twice.

By symmetry, let us assume that each integer in $(m,X(w_{0})]$ appears at least twice. There exists a permutation $\pi$ such that

(A) $X(w_{\pi(i)})=X(w_{0})-i$ for $0\leq i\leq X(w_{0})-m$ , and

(B) $X(w_{\pi(i)})\leq X(w_{\pi(j)})$ when $X(w_{0})-m\leq i\leq j$ .

Let $X^{\prime}=X\circ\pi^{\prime}$ , where $\pi^{\prime}$ is a permutation on $V(G)$ that fixes vertices outside of the hair and permutes vertices inside the hair according to $\pi^{-1}$ . By construction, $X^{\prime}$ is Lipschitz, and so (7) implies that $X^{\prime}$ is optimal. Any spanning tree of $G$ must contain each edge between consecutive vertices of a hair, and (6) implies that $X(w_{\pi(X(w_{0})-m+j)})=m+j$ .

So each value in the range $(m,X(w_{0})]$ appears exactly twice, each value in the range $(X(w_{0}),M]$ appears exactly once, and the value $m$ appears exactly once.

Now let us assume that $X$ is unimodal but not monotone. By considering a sub-hair and applying symmetry, let us assume that for $i\geq 1$ we have that $X(w_{i})=X(w_{0})+i-2=m+i-1$. Let $\ell=M-m+1$ so that the hair is vertices $w_{0},w_{1},\ldots,w_{\ell}$. We will prove that if there exist vertices $u_{1},\ldots,u_{\ell-2}$ such that $X(u_{i})=m+i$, then there exist Lipschitz functions $X_{+}$ and $X_{-}$ and permutations $\pi_{+}$ and $\pi_{-}$ such that $X=\frac{1}{2}(X_{+}\circ\pi_{+}+X_{-}\circ\pi_{-})$. By Lemma 2.8, this will prove the second part of the lemma.

Let $X_{+}(v)=X(v)$ for vertices $v$ not in the hair, $X_{+}(w_{0})=X(w_{0})$, $X_{+}(w_{\ell})=X(w_{\ell})=M$, and $X_{+}(w_{i})=X(w_{i})+2$ for $1\leq i<\ell$. Let $X_{-}(v)=X(v)$ for vertices $v$ not in the hair, $X_{-}(w_{0})=X(w_{0})$, $X_{-}(w_{1})=X(w_{1})=m$, and $X_{-}(w_{i})=X(w_{i})-2$ for $2\leq i\leq\ell$. Now we will define bijections $\pi_{-}^{-1}$ and $\pi_{+}^{-1}$ on $\{w_{1},\ldots,w_{\ell},u_{1},\ldots,u_{\ell-2}\}$ such that $X(u)-1=X_{-}(\pi_{-}^{-1}(u))$ and $X(u)+1=X_{+}(\pi_{+}^{-1}(u))$. This will suffice, as $X,X_{+},X_{-}$ are equal on all other vertices. We define

• for $1\leq i\leq\ell-2$, set $\pi_{-}^{-1}(w_{i})=w_{i+1}$ and $\pi_{+}^{-1}(w_{i})=u_{i}$,

• set $\pi_{-}^{-1}(u_{1})=w_{1}$ and $\pi_{+}^{-1}(u_{1})=w_{1}$,

• for $2\leq i\leq\ell-2$, set $\pi_{-}^{-1}(u_{i})=u_{i-1}$ and $\pi_{+}^{-1}(u_{i})=w_{i}$,

• set $\pi_{-}^{-1}(w_{\ell-1})=w_{\ell}$ and $\pi_{+}^{-1}(w_{\ell-1})=w_{\ell}$, and

• set $\pi_{-}^{-1}(w_{\ell})=u_{\ell-2}$ and $\pi_{+}^{-1}(w_{\ell})=w_{\ell-1}$.

∎

Remark 2.11.

The assumption in the second part of Lemma 2.10 can be relaxed to only assuming that $u_{1}$ and $u_{\ell-2}$ exist and satisfy $X(u_{1})\leq m+1$ and $X(u_{\ell-2})\geq M-1$ . This is because the rest of the vertices can be found on a shortest path from $u_{1}$ to $u_{\ell-2}$ .

We will call a function $X$ variance-optimal if $\operatorname{Var}(X)=c^{2}(G)$ .

Remark 2.12.

Let us first note that the variance function is strictly convex and symmetric. Therefore (6), (7), Lemma 2.8, (9), and Lemma 2.10 all hold when optimal is replaced with variance-optimal. The analogue of Theorem 2.2 also follows from minor modifications to the proof.

Variance also satisfies a handful of other properties. For example, variance is even, so $X$ is variance-optimal if and only if $-X$ is variance-optimal. Next, we present a slightly stronger version of Theorem 1.3, where the improvement will be crucial to establishing Theorem 2.19.

Theorem 2.13.

If $\operatorname{Var}(X)=c^{2}(G)$ and $X(u)\geq\mathbb{E}(X)$ , then there exists $v\in N(u)$ such that $X(u)=X(v)+1$ . If $X(u)\geq\mathbb{E}(X)-0.5$ and $|V(G)|\geq 10$ , then there exists $v\in N(u)$ such that $X(u)\geq X(v)$ .

Proof.

For $0<\epsilon$ , define $X_{\epsilon}$ to be the variable $X_{\epsilon}(w)=X(w)$ when $w\neq u$ and $X_{\epsilon}(u)=X(u)+\epsilon$ . To prove the theorem, we will show that when the assumptions are violated there exists a value of $\epsilon$ such that $\operatorname{Var}(X_{\epsilon})>\operatorname{Var}(X)$ and that $X_{\epsilon}$ is Lipschitz. Recall from Theorem 1.3 that $X(v)-X(u)$ is an integer. If $\epsilon<1$ and for all $v\in N(u)$ we have that $X(v)\geq X(u)$ , then $X_{\epsilon}$ is Lipschitz. If $\epsilon<2$ and for all $v\in N(u)$ we have that $X(v)\geq X(u)+1$ , then $X_{\epsilon}$ is Lipschitz.

A direct calculation gives us that $\frac{d}{d\epsilon}\operatorname{Var}(X_{\epsilon})=2\epsilon p(1-p)+2p(X(u)-\mathbb{E}(X))$, where $p=1/|V(G)|$ is the probability of $u$. Thus $\frac{d}{d\epsilon}\operatorname{Var}(X_{\epsilon})>0$ for $\epsilon>0$ when $X(u)\geq\mathbb{E}(X)$, and the first part of the theorem follows. If $X(u)\geq\mathbb{E}(X)-0.5$ and $|V(G)|\geq 10$, then $\frac{d}{d\epsilon}\operatorname{Var}(X_{\epsilon})\geq 2p(\epsilon(1-p)-0.5)$, and so integrating from $\epsilon=0$ to $\epsilon=2$ gives a total change of at least $2p(1-2p)>0$. Thus the second part of the theorem holds. ∎
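
The effect of raising a single value on the variance can be checked exactly. Expanding $\operatorname{Var}(X_{\epsilon})$ under the uniform measure gives the increment $2p\epsilon(X(u)-\mathbb{E}(X))+p(1-p)\epsilon^{2}$ with $p=1/|V(G)|$; the sketch below confirms this identity in rational arithmetic for an arbitrary example. The values of `X`, the index `u`, and the helper names are illustrative assumptions.

```python
from fractions import Fraction

def variance(values):
    """Exact Var(X) for X uniform on the list `values`."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def perturbed(values, u, eps):
    """Return X_eps: the same values with eps added at index u."""
    out = list(values)
    out[u] += eps
    return out

X = [Fraction(v) for v in (0, 1, 1, 2, 3)]   # arbitrary illustrative values
u = 3                                        # index of the perturbed vertex
p = Fraction(1, len(X))
mean = sum(X) / len(X)

identity_holds = all(
    variance(perturbed(X, u, eps)) - variance(X)
    == 2 * p * eps * (X[u] - mean) + p * (1 - p) * eps ** 2
    for eps in (Fraction(1, 4), Fraction(1), Fraction(2))
)
print(identity_holds)
```

Differentiating the increment in $\epsilon$ gives $2p(X(u)-\mathbb{E}(X))+2\epsilon p(1-p)$, which is positive for every $\epsilon>0$ once $X(u)\geq\mathbb{E}(X)$.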

One interpretation of Theorem 1.3 is that the intuition of Conjecture 1.5 is true for variance-optimal functions for all graphs. The intuition of extremal functions defined as the distance from some “origin” is half true for the log-moment function. That is, the analogue of Theorem 2.13 is true to one side of the origin, but because the log-moment function is not even we cannot apply symmetry to the other side.

Theorem 2.14.

Let $t>0$ and let $X$ be an optimal Lipschitz function for $G$. If $X(u)<\mathbb{E}(X)$, then there exists a $v\in N(u)$ such that $X(v)=X(u)+1$.

Proof.

We begin with a similar set-up as Theorem 2.13. Assume that for all $v\in N(u)$ we have that $X(v)\leq X(u)+1-\delta$ . For $0<-\epsilon<\delta$ , define $X_{\epsilon}$ to be the variable $X_{\epsilon}(w)=X(w)$ when $w\neq u$ and $X_{\epsilon}(u)=X(u)+\epsilon$ . By our assumption, $X_{\epsilon}$ is Lipschitz over $G$ . To prove the theorem, we will show that $e^{L_{X_{\epsilon}}(t)}>e^{L_{X}(t)}$ when $0<-\epsilon$ .

Let $p=1/n$ be the probability of $u$ , and so

[TABLE]

The direct calculation gives us

[TABLE]

Because $1+y\leq e^{y}$ for all $y\in\mathbb{R}$ , we have that

[TABLE]

Combine the previous two equations with our assumption $X(u)<\mathbb{E}(X)$ to see that

[TABLE]

On the domain $\epsilon<0$ we have that $\frac{d}{d\epsilon}e^{L_{X_{\epsilon}}(t)}<0$ , so the theorem follows. ∎

Following the same proof as Theorem 1.3 (but without the symmetry), we have half of the analogous result.

Corollary 2.15.

If $X$ is an optimal function and $X(u)<0$ , then $\nu_{X}-X(u)=d(u,\mathbb{O}(X))$ .

To see that “half” of the result is best possible, consider the following example. Let $G$ be the graph with vertices $\{v_{1},v_{2},v_{3},v_{4},w_{1},w_{2}\}$ and edges $\{v_{1}v_{2},v_{2}v_{3},v_{3}v_{4},w_{1}v_{2},w_{2}v_{3}\}$ . We will compare two extremal Lipschitz functions on $G$ . Let $X_{1}(v_{i})=X_{2}(v_{i})=i$ , $X_{1}(w_{1})=X_{2}(w_{1})=1$ , $X_{1}(w_{2})=4$ , and $X_{2}(w_{2})=2$ . We have that $X_{1}$ is variance-optimal, but Mathematica was able to show that $L_{X_{1}}(t)<L_{X_{2}}(t)$ when $t\geq 3$ .
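
The comparison between $X_{1}$ and $X_{2}$ can be reproduced without a computer algebra system. The sketch below encodes the six-vertex graph above, checks that both functions are Lipschitz, and evaluates their variances and log-moment functions at a few values of $t$; the sample points $t=0.5$, $3$, $5$ and the helper names are choices made here for illustration.

```python
import math

edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("w1", "v2"), ("w2", "v3")]
X1 = {"v1": 1, "v2": 2, "v3": 3, "v4": 4, "w1": 1, "w2": 4}
X2 = {"v1": 1, "v2": 2, "v3": 3, "v4": 4, "w1": 1, "w2": 2}

def is_lipschitz(X):
    return all(abs(X[a] - X[b]) <= 1 for a, b in edges)

def variance(X):
    vals = list(X.values())
    mean = sum(vals) / len(vals)
    return sum((x - mean) ** 2 for x in vals) / len(vals)

def log_moment(X, t):
    """L_X(t) = ln E exp(t (X - E X)) under the uniform measure."""
    vals = list(X.values())
    mean = sum(vals) / len(vals)
    return math.log(sum(math.exp(t * (x - mean)) for x in vals) / len(vals))

print(variance(X1), variance(X2))
for t in (0.5, 3.0, 5.0):
    print(t, log_moment(X1, t), log_moment(X2, t))
```

For small $t$ the log-moment function behaves like $\operatorname{Var}(X)t^{2}/2$, so $X_{1}$ is ahead there; for larger $t$ the single large value $X_{2}(v_{4})-\mathbb{E}(X_{2})$ dominates.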

We produce one more result, which will be used later. We omit the proof, as it is obvious.

Claim 2.16.

If $G\subseteq H$ , then $c(G)\geq c(H)$ and $\sigma_{G}\geq\sigma_{H}$ .

2.5 Tightness for isoperimetric inequalities

Recall that $S_{r,X}$ is the subset of vertices in $G^{n}$ defined as $\{(a_{1},a_{2},\ldots,a_{n}):\sum_{i=1}^{n}X(a_{i})\leq r\}$ for function $X:V(G)\rightarrow\mathbb{R}$ . When the function is implied, we drop the $X$ and set $S_{r}=S_{r,X}$ .

The isoperimetric problem for product graphs $G^{n}$ and a number $d$ is to identify a set $S$ of at least half of the vertices of $G^{n}$ such that $i_{G,d,n}=|\{u:d(u,S)>d\}|$ is minimized. Alon, Boppana, and Spencer [2] proved that $i_{G,d,n}$ decays exponentially as $d$ grows when $\sqrt{n}\ll d\ll n$ with a rate that relies on $c^{2}(G)$. Let us now explore Conjecture 1.2, which is that there exists a stronger relationship between $i_{G,d,n}$ and $c^{2}(G)$, namely that the extremal set $S$ can be determined by $X$ when $X$ is variance-optimal.

We will show that the conjecture is not true. The issue is that any variable $X$ is forced to project a (potentially complex) graph onto a one-dimensional space (see Theorem 2.13 and surrounding discussion). The conjecture has been seen to hold when the underlying graph has a distinctly one-dimensional topology. It even holds for graphs with multi-dimensional topology; for example, the Euclidean grid is the standard two-dimensional graph, and since the Euclidean grid is $P_{k}\square P_{k}$ the conjecture holds because it holds for $P_{k}$ . But the conjecture begins to fail when the graph has some central point while the rest of the graph cannot be neatly labeled as “up” or “down” from that central point.

To build up an understanding that counters our original intuition that the conjecture is true, let us explore the assumption $m(X)\leq 0=\mathbb{E}(X)$ that is in Theorem 1.1 but missing from Conjecture 1.2. In these early examples we only explore how $|B_{k}(S_{m(X)})|$ grows with $k$ for $n=1$ . We will do this with two examples: in the first having $m(X)<\mathbb{E}(X)$ is the “correct thing to do,” while the opposite is true in the second example.

Our first example is the unbalanced tripod $T_{k,k,2k}$ , which is one vertex attached to $3$ hairs with $k+1,k+1,2k+1$ vertices respectively. Formally, we define

[TABLE]

and

[TABLE]

We have built up enough results to determine the variance-optimal functions of this tree.

Example 2.17.

For sufficiently large $k$ , an extremal function for $T_{k,k,2k}$ is $X$ , where $X(r)=0$ , $X(x_{i})=X(y_{i})=-i$ , and $X(z_{i})=i$ . In this case, $m(X)=0$ and $\mathbb{E}(X)>0$ . There exists a permutation $\psi$ over $V(T_{k,k,2k})$ such that $\psi(S_{0,X})=S_{0,-X}$ and $\psi(B_{d}(S_{0,X}))\subsetneq B_{d}(S_{0,-X})$ .

Proof.

Because it is significantly simpler, let us begin by proving the second part and later prove that $X$ is variance-optimal. Consider the permutation $\psi$ over $V(T_{k,k,2k})$ such that $\psi^{2}=1$ , $\psi(r)=r$ , $\psi(x_{i})=z_{2i-1}$ , and $\psi(y_{i})=z_{2i}$ . By construction, $\psi(S_{0,X})=S_{0,-X}$ , $|B_{d}(S_{0,X})|=2k+1+d$ , and $|B_{d}(S_{0,-X})|=2k+1+2d$ .

Consider a Lipschitz variable $Y$ over $V(T_{k,k,2k})$ such that $\operatorname{Var}(Y)\geq\operatorname{Var}(X)$ . Note that $\mathbb{E}(X)\approx k/4$ , and so

[TABLE]

By translation invariance, let us assume that $Y(r)=0$ . We will refer to the three hairs as the $x$ -hair, the $y$ -hair, and the $z$ -hair. By (9), each hair can be broken into one or two monotone sequences. The example can thus be analyzed by exhausting through a handful of cases. The analysis will be simpler by first showing that at most $2$ and at least $1$ hair is monotone.

By definition, if a $w$ -hair is monotone, then there exists a $\delta_{w}\in\{-1,1\}$ such that $Y(w_{i})=i\delta_{w}$ . We have that $Y\in\{X,-X\}$ when all three hairs are monotone and $\delta_{x}=\delta_{y}\neq\delta_{z}$ . A simple case analysis shows that these values for $\delta_{x},\delta_{y},\delta_{z}$ maximize the variance when all three hairs are monotonic. Some hair must have a minimal element of $Y$ , a second hair must have a maximal element of $Y$ , and the third hair satisfies the assumptions of Lemma 2.10. The conclusion of Lemma 2.10 is that the third hair is monotonic.

Case 1: the $z$ -hair is monotone. By symmetry, assume $\delta_{z}=1$ and that the minimum element of $Y$ over the $x$ -hair is at most the minimum element of $Y$ over the $y$ -hair. By Lemma 2.10, we have that the $y$ -hair is monotone. If $\delta_{y}=-1$ , then Lemma 2.10 applies to the $x$ -hair, and all $3$ hairs are monotonic. This is a contradiction, so $\delta_{y}=1$ and the $x$ -hair is not monotonic. By considering monotone sequences as the extreme values of $Y(x_{i})$ , we see that $k/2<\mathbb{E}(Y)<3k/4$ .

If the $x$ -hair is not monotone, then Lemma 2.10 does not apply and for some $i$ we have $Y(x_{i})<0$ . Also, by Theorem 2.13, there exists a $j$ with $Y(x_{j})\geq\mathbb{E}(Y)>k/2$ . For both of these facts to be true, it must be that there exists an $\ell$ such that $Y(x_{i})=-i$ for $1\leq i\leq\ell$ and $Y(x_{i})=-2\ell+i$ for $i\geq\ell$ . Moreover, $\ell<k/4$ . But then for all $w\in V(T_{k,k,2k})-\{z_{5k/4+1},\ldots,z_{2k}\}$ we have that $|Y(w)-\mathbb{E}(Y)|\leq 3k/4$ . So we see that

[TABLE]

Case 2: the $z$ -hair is not monotone. Lemma 2.10 does not apply to the $z$ -hair, so the $z$ -hair contains a maximum or minimum element of $Y$ . By symmetry, let us assume that $Y(z_{\ell})$ is the maximum value of $Y$ for some value of $\ell$ . Theorem 2.13 implies that there exists an $\ell^{\prime}>\ell$ such that $Y(z_{\ell^{\prime}})<\mathbb{E}(Y)$ . Note that $\mathbb{E}(Z)\leq 3k/4$ for any Lipschitz variable $Z$ such that $Z(r)=0$ , and therefore $\ell\leq 11k/8$ .

Some hair is monotone, so by symmetry the $x$ -hair is monotone.

Case 2.A: $\delta_{x}=-1$ . Lemma 2.10 implies that the $y$ -hair is monotone. If $\delta_{y}=1$ , then the constraint that $Y(z_{\ell^{\prime}})<\mathbb{E}(Y)$ implies that $\ell/k\leq 1+(\sqrt{10}-3)\leq 1.163$ and $\mathbb{E}(Y)\leq 2(\sqrt{10}-3)k\leq 0.325k$ . On the other hand, $\delta_{y}=1$ implies $\mathbb{E}(Y)\geq k/4$ . But then for all $w\in V(T_{k,k,2k})-\{x_{k/2+1},\ldots,x_{k}\}$ we have that $|Y(w)-\mathbb{E}(Y)|\leq 3k/4$ . Using a calculation similar to the end of case 1, we see that $\operatorname{Var}(Y)<\operatorname{Var}(X)$ .

So assume $\delta_{y}=\delta_{x}=-1$ . The constraint that $Y(z_{\ell^{\prime}})<\mathbb{E}(Y)$ implies that $\ell<k$ and $\mathbb{E}(Y)<0$ . But then

[TABLE]

Case 2.B: $\delta_{x}=1$ . It follows that $k/4\leq\mathbb{E}(Y)\leq 3k/4$ . So for all $i$ , we have that $|Y(x_{i})-\mathbb{E}(Y)|\leq 3k/4$ . If $\mathbb{E}(Y)\geq k/2$ , then for all $i$ we have that $|Y(z_{i})-\mathbb{E}(Y)|\leq 3k/4$ . If $\mathbb{E}(Y)\leq k/2$ , then for all $i$ we have that $|Y(y_{i})-\mathbb{E}(Y)|\leq 3k/4$ . So we arrive at a contradiction similar to the end of case 1. ∎
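The ball-growth part of Example 2.17 can be checked by brute force for a small $k$. The following is a minimal sketch, not part of the original argument: the helper names `tripod`, `ball`, `X`, and `psi` are hypothetical, the hairs are assumed to be $x_{1},\ldots,x_{k}$, $y_{1},\ldots,y_{k}$, $z_{1},\ldots,z_{2k}$ hanging from $r$, and only the set identities are verified (not variance-optimality).

```python
from collections import deque

def tripod(k):
    """Adjacency sets of T_{k,k,2k}: root "r" with hairs x (k), y (k), z (2k)."""
    adj = {"r": set()}
    for name, length in (("x", k), ("y", k), ("z", 2 * k)):
        prev = "r"
        for i in range(1, length + 1):
            v = (name, i)
            adj.setdefault(v, set())
            adj[prev].add(v)
            adj[v].add(prev)
            prev = v
    return adj

def ball(adj, S, d):
    """B_d(S): every vertex within graph distance d of the set S (multi-source BFS)."""
    dist = {v: 0 for v in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def X(v):
    """X(r) = 0, X(x_i) = X(y_i) = -i, X(z_i) = i."""
    if v == "r":
        return 0
    name, i = v
    return i if name == "z" else -i

def psi(v):
    """Involution with psi(r) = r, psi(x_i) = z_{2i-1}, psi(y_i) = z_{2i}."""
    if v == "r":
        return v
    name, i = v
    if name == "x":
        return ("z", 2 * i - 1)
    if name == "y":
        return ("z", 2 * i)
    return ("x", (i + 1) // 2) if i % 2 else ("y", i // 2)

k = 5
adj = tripod(k)
S_X = {v for v in adj if X(v) <= 0}      # S_{0,X}
S_negX = {v for v in adj if -X(v) <= 0}  # S_{0,-X}
assert {psi(v) for v in S_X} == S_negX
for d in range(1, k + 1):
    B_X, B_negX = ball(adj, S_X, d), ball(adj, S_negX, d)
    assert len(B_X) == 2 * k + 1 + d
    assert len(B_negX) == 2 * k + 1 + 2 * d
    assert {psi(v) for v in B_X} < B_negX  # strict containment
```

The range $1\leq d\leq k$ is used because the count $2k+1+2d$ stops growing once the two short hairs are exhausted.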

Our second example is a slight modification to the unbalanced tripod.

Example 2.18.

Fix a large odd $k$ . Let $G^{\prime}$ be a single vertex with $k+3$ hairs: three of them have $k$ vertices and $k$ of them have $1$ vertex. As before, let $r$ be the unique vertex with degree greater than $2$ and let the three hairs have vertices $x_{i},y_{i},z_{i}$ ; and now denote the vertices in the short hairs as $w_{1},\ldots,w_{k}$ . Let $G$ be $G^{\prime}$ plus edges $w_{1}z_{2},w_{1}w_{2},w_{1}w_{3}$ .

The variable $X$ defined by $X(r)=0$ , $X(x_{i})=X(y_{i})=i$ , $X(z_{i})=-i$ , and $X(w_{i})=-1$ satisfies $\operatorname{Var}(X)=c^{2}(G^{\prime})$ and is variance-optimal. As before, $m(X)=0$ and $\mathbb{E}(X)>0$ . We have that $|S_{-2,X}|=|S_{-(k+3)/2,-X}|$ . However, there exists a permutation $\psi$ of the vertex set such that $\psi(B_{d}(S_{-2,X}))\supsetneq B_{d}(S_{-(k+3)/2,-X})$ .

Proof.

First, let us prove that $X$ is variance optimal over $G^{\prime}$ . By translation-invariance, let us assume $X(r)=0$ . As in Example 2.17, one hair may have a maximal element of $X$ , a second hair may have a minimal element of $X$ , and Lemma 2.10 will apply to the third. Because the three long hairs have equal length, if they are monotone then they contain an extremal value. So Lemma 2.10 will apply to the $x$ , $y$ , and $z$ -hairs.

By symmetry, we can assume $\delta_{x}=\delta_{y}=1$ (by notation from Example 2.17), and so $\mathbb{E}(X)\geq k/8>1$ . Theorem 2.13 says that $X(w_{i})=-1$ for all $i$ . We then have only two cases to check: when $\delta_{z}=1$ and when $\delta_{z}=-1$ , and a direct calculation gives that $X$ is variance-optimal when $\delta_{z}=-1$ .

Now notice that $G^{\prime}\subseteq G$ , so by Claim 2.16 we have that $c(G^{\prime})\geq c(G)$ . But $X$ is Lipschitz over $G$ , so $X$ is variance optimal over $G$ .

Consider the permutation $\psi$ over $V(G)$ such that $\psi^{2}=1$ , $\psi(r)=r$ ,

•

$\psi(x_{k-i})=z_{k-2i}$ for $k-i\geq(k+1)/2$ (specifically note that $\psi(x_{(k+1)/2})=z_{1}$ ),

•

$\psi(y_{k-i})=z_{k-2i-1}$ for $k-i\geq(k+3)/2$ ,

•

$\psi(y_{(k+1)/2})=w_{1}$ ,

•

$\psi(x_{(k-1)/2})=w_{2}$ , $\psi(y_{(k-1)/2})=w_{3}$ ,

•

$\psi(x_{i})=w_{3+i}$ for $i\leq(k-3)/2$ , and

•

$\psi(y_{i})=w_{(k+3)/2+i}$ for $i\leq(k-3)/2$ .

By construction $S_{-2,X}=\{z_{2},z_{3},\ldots,z_{k}\}$ and $S_{-(k+3)/2,-X}=\{x_{(k+3)/2},\ldots,x_{k},y_{(k+3)/2},\ldots,y_{k}\}$ . Both sets have $k-1$ vertices. Moreover, we see that

•

$B_{1}(S_{-2,X})=S_{-2,X}\cup\{w_{1},z_{1}\}$ ,

•

$B_{2}(S_{-2,X})=B_{1}(S_{-2,X})\cup\{w_{2},r\}$ ,

•

for $d\geq 3$ , we have $V(G)-B_{d}(S_{-2,X})=\{x_{d-2},\ldots,x_{k},y_{d-2},\ldots,y_{k}\}$ .

•

for $d<(k+3)/2$ we have $B_{d}(S_{-(k+3)/2,-X})=S_{-(k+3)/2,-X}\cup\{x_{(k+3-2d)/2},\ldots,x_{(k+1)/2},y_{(k+3-2d)/2},\ldots,y_{(k+1)/2}\}$ ,

•

for $d=(k+3)/2$ we have $B_{d}(S_{-(k+3)/2,-X})=B_{d-1}(S_{-(k+3)/2,-X})\cup\{r\}$ , and

•

for $d>(k+3)/2$ we have $V(G)-B_{d}(S_{-(k+3)/2,-X})=\{z_{d-(k+1)/2},\ldots,z_{k}\}$ .

So we have that $\psi(B_{d}(S_{-2,X}))\supseteq B_{d}(S_{-(k+3)/2,-X})$ for all $d$ , and strict containment for $d>2$ . ∎

Our third example is similar to Examples 2.17 and 2.18; it is a tree whose main topology is two long paths whose endpoints are attached to a central vertex. Similar to before, we will establish a permutation $\psi$ on our counterexample graph $G$ with variance-optimal function $X$ such that $\psi(B_{d}(S_{r,X}))\subsetneq B_{d}(S_{r,X})$ . The improvement of this example over Examples 2.17 and 2.18 is that this relation will hold for all $r$ and $d$ , which implies that this relation holds when $\psi$ is tensored into a higher dimension as a permutation $\psi^{n}$ over $G^{n}$ .

One distinction between $G$ and Examples 2.17 and 2.18 is that the two long paths are not hairs, but instead have many hairs of length $h$ attached to them. In Theorem 2.19 we only present $h=1$ , but the result holds with greater discrepancy between $V(G^{n})-B_{d}(S_{r,X})$ and $i_{G,d,n}$ for $h\leq O(1)\ll k$ , where $k$ is the length of the paths.

Theorem 2.19.

Conjecture 1.2 is not true.

Proof.

We consider the even-length caterpillar with one leg per body segment. Formally, this graph is a path $u_{1},u_{2},\ldots,u_{2k}$ and a set of leaves $w_{1},\ldots,w_{2k}$ such that $N(w_{i})=\{u_{i}\}$ . When drawn, this graph resembles a hair comb. Let $G$ denote this graph.

Suppose we have some Lipschitz variable $X$ on $G$ . Let $\pi$ be a permutation on $\{1,\ldots,2k\}$ such that $X(u_{\pi(i)})\leq X(u_{\pi(i+1)})$ . Let $X_{\pi}$ be such that $X_{\pi}(u_{i})=X(u_{\pi(i)})$ and $X_{\pi}(w_{i})=X(w_{\pi(i)})$ . Note that $X_{\pi}$ is also Lipschitz. Moreover, if $X$ is variance-optimal, then for every $i$ we have that $|X(u_{i})-X(u_{i+1})|=|X_{\pi}(u_{i})-X_{\pi}(u_{i+1})|=1$ . Because $X$ is variance-optimal and translation invariant, we may then assume that $X(u_{i})=i$ . Theorem 2.13 implies that $X(w_{i})=i-1$ if $i\leq k$ and $X(w_{i})=i+1$ otherwise.

Let us write out in full detail the ordering on $V(G)$ imposed by $X$ . We do this by levels, where level $i$ of function $Z:V(G)\rightarrow\mathbb{R}$ is the vertex set $Z^{-1}(i)$ . The levels of $X$ run from $0$ to $2k+1$ and they consist of

•

level $0$ is $\{w_{1}\}$ ,

•

level $i$ for $1\leq i\leq k-1$ is $\{w_{i+1},u_{i}\}$ ,

•

level $k$ is $\{u_{k}\}$ ,

•

level $k+1$ is $\{u_{k+1}\}$ ,

•

level $j$ for $k+2\leq j\leq 2k$ is $\{w_{j-1},u_{j}\}$ , and

•

level $2k+1$ is $\{w_{2k}\}$ .

Let us propose a different ordering of $V(G)$ . Let us call this ordering $Y:V(G)\rightarrow\mathbb{R}$ and the levels of $Y$ are composed of

•

levels $0$ to $k$ are the same as for $X$ ,

•

level $k+1$ is $\{w_{k+1}\}$ ,

•

level $j$ for $k+2\leq j\leq 2k$ is vertices $\{w_{j},u_{j-1}\}$ , and

•

level $2k+1$ is $\{u_{2k}\}$ .

The levels have the same number of vertices for $X$ and for $Y$ .

We claim that $Y$ is a better ordering than $X$ . Unfortunately $Y$ is not a $1$ -Lipschitz function, as $Y(u_{k})=k$ and $Y(u_{k+1})=k+2$ . However, the set $S_{r,Y}$ is still a well-defined object. We also have that $\mathbb{E}(X)=\mathbb{E}(Y)=m(X)=m(Y)=(2k+1)/2$ .

If $r\leq(2k+1)/2$ , then $S_{r,Y}=S_{r,X}$ as the levels from $0$ to $k$ are defined to be the same. If $r\geq k+1$ , then $|B_{d}(S_{r,X})|=|S_{r,X}|+2d$ . If $r\geq k+1$ and $d\in\{1,2\}$ , then $|B_{d}(S_{r,Y})|=|S_{r,Y}|+d$ . If $r\geq k+1$ and $d\geq 3$ , then $|B_{d}(S_{r,Y})|=|S_{r,Y}|+2d-2$ .

There exists a permutation $\psi$ over $V(G)$ such that $X(u)=Y(\psi(u))$ and $\psi(B_{d}(S_{r,X}))\supseteq B_{d}(S_{r,Y})$ with strict containment when $d>0$ and $r\geq k+2$ . So if $\psi^{n}$ is a permutation over $V(G^{n})$ such that $\psi$ is applied to each coordinate, it follows that $\psi^{n}(B_{d}(S_{r,X}))\supseteq B_{d}(S_{r,Y})$ , also with strict containment when $d>0$ and $r\geq k+2$ (recall that $m(X^{n})=n(2k+1)/2$ ). ∎
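The ball counts used in this proof for $n=1$ can be confirmed numerically on a small caterpillar. The sketch below is illustrative only: the names `comb`, `ball`, `X_level`, and `Y_level` are hypothetical, levels are read off the two lists above, and the check is restricted to $r+d\leq 2k$ (an assumption made here so that the balls do not run off the end of the path).

```python
from collections import deque

def comb(k):
    """Caterpillar: path u_1..u_{2k} with one leaf w_i attached to each u_i."""
    adj = {}
    for i in range(1, 2 * k + 1):
        adj[("u", i)] = {("w", i)}
        adj[("w", i)] = {("u", i)}
    for i in range(1, 2 * k):
        adj[("u", i)].add(("u", i + 1))
        adj[("u", i + 1)].add(("u", i))
    return adj

def ball(adj, S, d):
    """B_d(S): every vertex within graph distance d of the set S."""
    dist = {v: 0 for v in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def X_level(v, k):
    """X(u_i) = i, X(w_i) = i - 1 for i <= k and i + 1 otherwise."""
    name, i = v
    if name == "u":
        return i
    return i - 1 if i <= k else i + 1

def Y_level(v, k):
    """Levels 0..k agree with X; above that, w_j and u_{j-1} share level j."""
    name, i = v
    if i <= k:
        return X_level(v, k)
    return i + 1 if name == "u" else i

k = 4
adj = comb(k)
for r in range(0, k + 1):
    assert ({v for v in adj if X_level(v, k) <= r}
            == {v for v in adj if Y_level(v, k) <= r})
for r in range(k + 1, 2 * k):
    S_X = {v for v in adj if X_level(v, k) <= r}
    S_Y = {v for v in adj if Y_level(v, k) <= r}
    assert len(S_X) == len(S_Y)
    for d in range(1, 2 * k - r + 1):
        grow_X = len(ball(adj, S_X, d)) - len(S_X)
        grow_Y = len(ball(adj, S_Y, d)) - len(S_Y)
        assert grow_X == 2 * d
        assert grow_Y == (d if d <= 2 else 2 * d - 2)
```

Only cardinalities are compared here; the permutation $\psi$ itself is not constructed.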

3 Discrete Positive Curvature

3.1 Convex sets and iterated midpoints

The notion of “convex” is undefined for discrete spaces, but Ollivier and Villani define it for this context to be the property that $\widehat{m}(S,S)\subseteq S$ . We will first give an example of convex sets $A,B$ in the hypercube $\mathbb{H}_{12}$ where $m_{1/2}(A,m_{1/2}(A,B))$ is much larger than $m_{1/4}(A,B)$ . It will be clear how this example generalizes to larger dimensions. Then we will prove that our examples of $A,B$ are typical of all convex sets in the hypercube.

This section will be easier if we use the notation of the Boolean lattice, which is equivalent to the hypercube. That is, the points of $\mathbb{H}_{d}$ are represented as the subsets of $\{1,2,\ldots,d\}$ and the distance between two points is the order of their symmetric difference.

Example 3.1.

Let $A,B\subset\mathbb{H}_{12}$ , where $A=\{\alpha:\alpha\subseteq\{1,2,3,4\}\}$ and $B=\{\beta:\beta\supseteq\{1,2,\ldots,8\}\}$ . Each set $A$ and $B$ is a subspace of $\mathbb{H}_{12}$ that is isometric to $\mathbb{H}_{4}$ and therefore is convex. We can directly calculate that if $\gamma\in m_{1/2}(A,B)$ , then $4\leq|\gamma|\leq 8$ . Also, if $\gamma\in m_{1/4}(A,B)$ , then $2\leq|\gamma|\leq 6$ . Now let us consider $m_{1/2}(A,m_{1/2}(A,B))$ . First off, the set $\phi=\{7,8,\ldots,12\}$ is a midpoint of $\emptyset\in A$ and $\{1,2,\ldots,12\}\in B$ and thus is in $m_{1/2}(A,B)$ . Now notice that $\zeta=\{1,2,3,4,8,9,10,11,12\}$ is a midpoint of $\{1,2,3,4\}\in A$ and $\phi$ and thus is in $m_{1/2}(A,m_{1/2}(A,B))$ . But $|\zeta|=9$ , which is too large to be in $m_{1/2}(A,B)$ , much less $m_{1/4}(A,B)$ !
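The arithmetic in Example 3.1 is easy to replay with sets. In the sketch below, which is only an illustration, a $t$-midpoint of $\alpha,\beta$ is assumed to be a point on an $(\alpha,\beta)$-geodesic at distance $\lfloor t\,d(\alpha,\beta)\rfloor$ or $\lceil t\,d(\alpha,\beta)\rceil$ from $\alpha$ (the rounding convention is an assumption, and the helper names are hypothetical). Since every $\alpha\in A$ is contained in every $\beta\in B$, such a point has size $|\alpha|+j$, where $j$ is its distance from $\alpha$.

```python
import math
from itertools import combinations

def subsets(ground):
    """All subsets of the given ground set, as frozensets."""
    ground = sorted(ground)
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def dist(a, b):
    """Hypercube distance: size of the symmetric difference."""
    return len(a ^ b)

A = subsets(range(1, 5))                        # subsets of {1,2,3,4}
core = frozenset(range(1, 9))
B = [core | s for s in subsets(range(9, 13))]   # supersets of {1,...,8}

def midpoint_sizes(t):
    """Possible sizes of t-midpoints between A and B (assumed floor/ceil rounding)."""
    sizes = set()
    for a in A:
        for b in B:
            d = dist(a, b)
            for j in {math.floor(t * d), math.ceil(t * d)}:
                sizes.add(len(a) + j)
    return sizes

half, quarter = midpoint_sizes(0.5), midpoint_sizes(0.25)
assert (min(half), max(half)) == (4, 8)
assert (min(quarter), max(quarter)) == (2, 6)

empty, full = frozenset(), frozenset(range(1, 13))
top_A = frozenset({1, 2, 3, 4})
phi = frozenset(range(7, 13))
zeta = frozenset({1, 2, 3, 4, 8, 9, 10, 11, 12})
# phi is a midpoint of the empty set and {1,...,12}
assert dist(empty, phi) == dist(phi, full) == 6 == dist(empty, full) // 2
# zeta is a midpoint of {1,2,3,4} and phi
assert dist(top_A, zeta) == dist(zeta, phi) == 5 == dist(top_A, phi) // 2
assert len(zeta) == 9 > max(half)
```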

To fully refute Ollivier and Villani’s suggestion for approximating $m_{1/4}$ , we need to show that this example is typical, as they only suggest that this method will “probably” work. For this purpose, we show that every convex closure of a set of points in the hypercube is the embedding of a lower-dimensional hypercube.

Theorem 3.2.

If $S$ is a subset of the Boolean lattice such that $\widehat{m}(S,S)\subseteq S$ , then there exist sets $\alpha,\beta$ such that

$$S=\{\gamma:\alpha\subseteq\gamma\subseteq\beta\}.$$

Proof.

First, it should be clear that $S$ has a unique maximal element, as otherwise a midpoint between the sets will have more elements. A symmetric argument gives a unique minimal element. Iterated applications of the midpoint argument give every set in between the maximal and minimal element. ∎

We found the following consequences of this result to be interesting, as they illustrate how unusual the behavior can be for discrete spaces that appear nice and simple.

Corollary 3.3.

The convex closure of any non-trivial ball in a hypercube is the whole space.

Proof.

By symmetry, let us assume that the center of the ball is $\emptyset$ . Because the ball is non-trivial (in other words, the radius is positive), the ball contains the element $\{i\}$ for all $i\in\{1,\ldots,n\}$ as it is the minimum distance from $\emptyset$ . By Theorem 3.2, the convex closure of the ball contains $\emptyset$ and $\cup_{i}\{i\}$ and everything in-between. ∎
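Corollary 3.3 can be observed directly in a small cube. The following sketch assumes that $\widehat{m}(a,b)$ consists of the points on an $(a,b)$-geodesic at distance $\lfloor d(a,b)/2\rfloor$ or $\lceil d(a,b)/2\rceil$ from $a$ (this rounding convention is an assumption), and the names `midpoints` and `convex_closure` are hypothetical. It closes a set under midpoints until nothing new appears.

```python
from itertools import combinations

def midpoints(a, b):
    """Points on an (a,b)-geodesic at distance floor(d/2) or ceil(d/2) from a."""
    diff = sorted(a ^ b)
    d = len(diff)
    out = set()
    for j in {d // 2, (d + 1) // 2}:
        for flip in combinations(diff, j):
            out.add(a ^ frozenset(flip))  # flip j of the differing coordinates
    return out

def convex_closure(points):
    """Smallest superset that is closed under taking midpoints of pairs."""
    closed = set(points)
    while True:
        new = set()
        for a in closed:
            for b in closed:
                new |= midpoints(a, b) - closed
        if not new:
            return closed
        closed |= new

n = 4
unit_ball = {frozenset()} | {frozenset({i}) for i in range(1, n + 1)}
assert len(convex_closure(unit_ball)) == 2 ** n  # the whole cube

# In line with Theorem 3.2: the closure of {1} and {2,3} is every subset of {1,2,3}.
small = convex_closure({frozenset({1}), frozenset({2, 3})})
assert small == {frozenset(c) for r in range(4) for c in combinations((1, 2, 3), r)}
```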

Corollary 3.4.

If $A,B$ are nonempty sets of vertices in the hypercube and $C$ is the convex closure of $\widehat{m}(A,B)$ , then $A\cup B\subseteq C$ .

Proof.

For each $i\in\{1,\ldots,d\}$ , there exists an automorphism $\phi_{i}$ of $\mathbb{H}_{d}$ such that $\phi_{i}(\alpha)=\alpha\cup\{i\}$ when $i\notin\alpha$ and $\phi_{i}(\alpha)=\alpha\setminus\{i\}$ otherwise. Let $\alpha$ be an arbitrary fixed element of $A$ , and pick some $\beta\in B$ . Let $\phi=\prod_{i\notin\alpha}\phi_{i}$ , and let us consider the space $\mathbb{H}_{d}$ after $\phi$ has been applied. By construction, $\phi(\alpha)=\{1,\ldots,d\}$ . For each $j$ , there exists a $\gamma\in\widehat{m}(\alpha,\beta)$ such that $j\in\gamma$ . So by Theorem 3.2, the convex closure of $\widehat{m}(\alpha,\beta)$ contains $\alpha$ . The corollary follows from symmetry. ∎

3.2 The $\ell_{0}$ , $\ell_{1}$ , and $\ell_{\infty}$ metric

Let us note that there are natural constructions of the $\ell_{0}$ , $\ell_{1}$ , and $\ell_{\infty}$ metrics in graph theory. Suppose $G$ is the product of spaces $H_{1},H_{2},\ldots,H_{d}$ . The $\ell_{1}$ metric is given by the Cartesian product $H_{1}\square H_{2}\square\cdots\square H_{d}$ , as mentioned above. The $\ell_{0}$ metric is isometric to $K_{|H_{1}|}\square K_{|H_{2}|}\square\cdots\square K_{|H_{d}|}$ . Finally, the $\ell_{\infty}$ metric is given by the strong product, which is $H_{1}\boxtimes H_{2}\boxtimes\cdots\boxtimes H_{d}$ .
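These three correspondences can be sanity-checked on small factors. The sketch below is a hedged illustration with hypothetical helper names: it builds the Cartesian product ($\square$) and the strong product ($\boxtimes$) of two paths, and the Cartesian product of two complete graphs, and compares breadth-first-search distances with the $\ell_{1}$, $\ell_{\infty}$, and $\ell_{0}$ (Hamming) formulas.

```python
from collections import deque
from itertools import product

def path(n):
    """Path on vertices 0..n-1."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def complete(n):
    """Complete graph on vertices 0..n-1."""
    return {i: set(range(n)) - {i} for i in range(n)}

def cartesian(G, H):
    """G box H: move in exactly one coordinate."""
    return {(g, h): {(g2, h) for g2 in G[g]} | {(g, h2) for h2 in H[h]}
            for g, h in product(G, H)}

def strong(G, H):
    """G boxtimes H: move in one or both coordinates."""
    return {(g, h): {(g2, h2) for g2 in G[g] | {g} for h2 in H[h] | {h}} - {(g, h)}
            for g, h in product(G, H)}

def distances(adj, source):
    """Graph distances from source by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def matches(adj, formula):
    """True if the graph distance equals formula(s, t) for all vertex pairs."""
    return all(d[t] == formula(s, t)
               for s in adj for d in [distances(adj, s)] for t in adj)

P3, P4 = path(3), path(4)
l1 = lambda s, t: abs(s[0] - t[0]) + abs(s[1] - t[1])
linf = lambda s, t: max(abs(s[0] - t[0]), abs(s[1] - t[1]))
l0 = lambda s, t: (s[0] != t[0]) + (s[1] != t[1])

assert matches(cartesian(P3, P4), l1)
assert matches(strong(P3, P4), linf)
assert matches(cartesian(complete(3), complete(4)), l0)
```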

Theorem 3.5.

Suppose $S,T\subseteq G=K_{r_{1}}\square K_{r_{2}}\square\cdots\square K_{r_{d}}$ such that $d_{*}(S,T)\geq\delta d$ . Let $\rho$ be such that $|1/2-\rho|<\epsilon$ and $C=\delta^{2}-16\ln(2)\epsilon>0$ . Under these conditions, for large $d$ we have that

[TABLE]

Proof.

We use shorthand $(A,B)_{r}=\{(a,b):a\in A,b\in B,d(a,b)=r\}$ . We claim that for each $d\geq r\geq\delta d$ we have

[TABLE]

The claim will prove the theorem, as

[TABLE]

for $0<r\leq d$ by the definition of $C$ .

We define $C_{\ell_{1},\ldots,\ell_{k}}^{(r)}=\{A\subseteq\{0,1\}^{r}:|A|=\ell_{i},1\leq i\leq k\}$ as before. We will define a map

[TABLE]

Moreover, we will demonstrate that this map satisfies for any $(m^{(1)},m^{(2)})\in(\widehat{m}_{\rho}(S,T),\widehat{m}_{\rho}(S,T))_{r}$ with $d\geq r\geq\delta d$ , the property

[TABLE]

which proves the claim.

For $a\in G=K_{r_{1}}\square K_{r_{2}}\square\cdots\square K_{r_{d}}$ , we will use notation $a=(a_{1},a_{2},\ldots,a_{j},\ldots,a_{d})$ to refer to individual coordinates. For a pair $(s,t)\in(S,T)_{r}$ , let $i_{1}<\cdots<i_{r}$ be the coordinates where $s_{i}\neq t_{i}$ . Let $\pi\in C^{(r)}_{\lfloor\rho r\rfloor,\lceil(1-\rho)r\rceil}$ and $\phi((s,t),\pi)=(m^{(1)},m^{(2)})$ . We define

[TABLE]

It is clear that

•

$m^{(1)}$ and $m^{(2)}$ are weighted midpoints between $s$ and $t$ ,

•

$d(s,m^{(2)})=|\pi|\in\{\lfloor\rho r\rfloor,\lceil(1-\rho)r\rceil\}$ , and

•

$d(s,m^{(1)})=r-|\pi|\in\{\lfloor\rho r\rfloor,\lceil(1-\rho)r\rceil\}$ .

Thus, $m^{(1)},m^{(2)}\in\widehat{m}_{\rho}(S,T)$ . Moreover, $d(m^{(1)},m^{(2)})=r$ , and so $(m^{(1)},m^{(2)})\in(\widehat{m}_{\rho}(S,T),\widehat{m}_{\rho}(S,T))_{r}$ .

All that remains is to prove that for any fixed $(m^{(1)},m^{(2)})\in(\widehat{m}_{\rho}(S,T),\widehat{m}_{\rho}(S,T))_{r}$ , we have that $\phi^{-1}(m^{(1)},m^{(2)})$ is not too large. Let $\phi^{\prime}$ be defined in the same way as $\phi$ , but on the extended domain $(G,G)_{r}\times C^{(r)}_{\lfloor\rho r\rfloor,\lceil(1-\rho)r\rceil}$ . By the definition of $\phi$ , it is a simple calculation to see that if $\phi((s,t),\pi)=(m^{(1)},m^{(2)})$ , then $\phi^{\prime}((m^{(1)},m^{(2)}),\pi)=(s,t)$ . So for any fixed $\pi$ , we have that $|\{(s,t):\phi((s,t),\pi)=(m^{(1)},m^{(2)})\}|\leq 1$ , and it equals $1$ if and only if $\phi^{\prime}((m^{(1)},m^{(2)}),\pi)\in S\times T$ . The claim will then follow when we demonstrate next that for

[TABLE]

values of $\pi$ , we have that $\phi^{\prime}((m^{(1)},m^{(2)}),\pi)\notin S\times T$ .

Recall that we are assuming $(m^{(1)},m^{(2)})$ is fixed. We define sets

[TABLE]

and

[TABLE]

Suppose $\phi^{\prime}((m^{(1)},m^{(2)}),\pi)=(a,b)$ and $\phi^{\prime}((m^{(1)},m^{(2)}),\pi_{*})=(a_{*},b_{*})$ . By definition of $\phi^{\prime}$ , it should be clear that $d(\pi,\pi_{*})=d(a,a_{*})=d(b,b_{*})$ . Therefore $\min_{\pi\in S^{\prime},\pi_{*}\in T^{\prime}}d(\pi,\pi_{*})\geq d_{*}(S,T)=\delta d\geq\delta r$ . The proof then follows from applying the second part of Theorem 2.7. ∎

The $\ell_{0}$ , $\ell_{1}$ , and $\ell_{\infty}$ metrics are special in that $m(\{a\},\{b\})$ is not unique in standard space. The positive curvature in $\ell_{0}$ metrics can entirely be attributed to the fact that $m(\{a\},\{b\})$ grows exponentially with $d(a,b)$ . But what about $\ell_{1}$ and $\ell_{\infty}$ ? The problem with these metrics is that $m(\{a\},\{b\})$ might be unique. Suppose our space $G$ is the $d$ -dimensional product of subspace $H$ , and let points in $G$ be denoted by tuples $x=(x_{1},\ldots,x_{d})$ , $y=(y_{1},\ldots,y_{d})$ . If $G$ is equipped with $\ell_{1}$ metric, then $m(\{x\},\{y\})$ may be unique (depending on $H$ ) if $x_{i}=y_{i}$ for all $i\geq 2$ . If $G$ is equipped with $\ell_{\infty}$ metric, then $m(\{x\},\{y\})$ may be unique (depending on $H$ ) if $x_{i}=x_{j}$ and $y_{i}=y_{j}$ for all $i,j$ .

Essentially the issue with the $\ell_{1}$ and $\ell_{\infty}$ metrics is that there exists a convex embedding of $H$ in $G$ . This is similar to Ruzsa [34] and Gardner and Gronchi’s [21] problems for establishing a discrete analogue of the Brunn-Minkowski inequality: there is a degenerate case where the sets lie in a smaller dimension. This leads to several natural open questions.

Open Question 3.6.

Suppose $G$ is the $d$ -dimensional product of space $H$ equipped with the $\ell_{1}$ metric.

1. If we force the sets $A$ and $B$ to have dimension $d$ in a manner similar to Gardner and Gronchi’s work, how do the sets of midpoints grow with the distance between $A$ and $B$ ? (The same question can be asked in the $\ell_{\infty}$ metric.)

2. If the distance between $A$ and $B$ is asymptotically bigger than the diameter of $H$ , do we see an exponential growth in the number of midpoints between sets similar to the growth when equipped with $\ell_{0}$ ?

There are several initial statements that can be made about the questions in 3.6. First, the $d$ -dimensional hypercube embeds into $G$ for any $H$ when equipped with the $\ell_{1}$ metric, and this establishes a best-case-possible because Ollivier and Villani’s result is known to be asymptotically tight. Second, if we are equipped with the $\ell_{1}$ metric, then a proof similar to that of Theorem 3.5 will show that the set of points within distance $d_{H}$ of the midpoints of $A$ and $B$ has size at least $\sqrt{|A||B|}e^{\frac{1}{16d}\left(\frac{d(A,B)}{d_{H}}\right)^{2}}$ , where $d_{H}$ is the diameter of $H$ . This is because $G$ has, with error term $d_{H}$ , a “rough geometry” equivalent to that of metric $\ell_{0}$ . However, if $H$ has strong negative curvature, then it is not clear that the set of midpoints itself will be large. Finally, note that the second question is similar to the first, with the change that we are requiring the set $A-B$ to have large dimension instead of $A$ and $B$ themselves, which may be sufficient due to the non-uniqueness of midpoints. A similar question involving the $\ell_{\infty}$ metric will require the opposite assumption: that $A,B$ coexist in a small dimensional subspace.

3.3 Catalog of displacement convexity definitions

There are several different versions of displacement convexity. The general intuition is that when supply $M_{A}:V(G)\rightarrow\mathbb{R}_{\geq 0}$ is optimally routed to demand $M_{B}:V(G)\rightarrow\mathbb{R}_{\geq 0}$ across a time span $t\in[a,b]$ , then the mid-transit goods in a positively curved space at time $t$ should be more “spread out” than the convex combination of $\frac{t-a}{b-a}$ of the spread of $M_{A}$ and $\frac{b-t}{b-a}$ of the spread of $M_{B}$ .

Formally, the functions $M_{A},M_{B}$ are normalized to probability spaces $\mu_{A},\mu_{B}$ . A function $\tau:V(G)\times V(G)\rightarrow\mathbb{R}_{\geq 0}$ is a transportation of the goods from the supply to the demand if $\mu_{A}(\alpha)=\sum_{\beta}\tau(\alpha,\beta)$ for all $\alpha$ and $\mu_{B}(\beta)=\sum_{\alpha}\tau(\alpha,\beta)$ for all $\beta$ . The Wasserstein cost of a transportation $\tau$ is $W^{1}(\tau)=\sum_{\alpha,\beta}\tau(\alpha,\beta)d(\alpha,\beta)$ , and the Wasserstein cost of order two is $W^{2}(\tau)=\sum_{\alpha,\beta}\tau(\alpha,\beta)d(\alpha,\beta)^{2}$ . The Wasserstein distance between $\mu_{A}$ and $\mu_{B}$ is $W(\mu_{A},\mu_{B})=\inf_{\tau}W^{1}(\tau)$ . The generalization where $M_{A},M_{B}$ are not normalized is also known as the Earth Mover’s Distance. An optimal transport $\tau$ minimizes $W^{2}$ . For finite spaces such a minimum clearly exists.
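A toy instance shows why it matters that optimality is defined through $W^{2}$ rather than $W^{1}$. The sketch below is an illustration with hypothetical names (`is_transportation`, `cost`, and the two plans are not from the source): on the path $0$-$1$-$2$-$3$ with supply split evenly between $0,1$ and demand split evenly between $2,3$, every transportation has the same $W^{1}$ cost, but only the non-crossing one minimizes $W^{2}$.

```python
def is_transportation(tau, mu_A, mu_B, tol=1e-12):
    """Check that the row sums of tau equal mu_A and the column sums equal mu_B."""
    rows, cols = {}, {}
    for (a, b), mass in tau.items():
        rows[a] = rows.get(a, 0.0) + mass
        cols[b] = cols.get(b, 0.0) + mass
    return (set(rows) <= set(mu_A) and set(cols) <= set(mu_B)
            and all(abs(rows.get(a, 0.0) - p) <= tol for a, p in mu_A.items())
            and all(abs(cols.get(b, 0.0) - q) <= tol for b, q in mu_B.items()))

def cost(tau, dist, order):
    """W^order(tau) = sum over pairs of tau(a,b) * d(a,b)^order."""
    return sum(mass * dist(a, b) ** order for (a, b), mass in tau.items())

dist = lambda a, b: abs(a - b)   # graph distance on the path 0-1-2-3
mu_A = {0: 0.5, 1: 0.5}
mu_B = {2: 0.5, 3: 0.5}
tau_parallel = {(0, 2): 0.5, (1, 3): 0.5}
tau_crossed = {(0, 3): 0.5, (1, 2): 0.5}

assert is_transportation(tau_parallel, mu_A, mu_B)
assert is_transportation(tau_crossed, mu_A, mu_B)
assert cost(tau_parallel, dist, 1) == cost(tau_crossed, dist, 1) == 2.0
assert cost(tau_parallel, dist, 2) == 4.0
assert cost(tau_crossed, dist, 2) == 5.0
```

Any transportation here is a mixture $p\,\tau_{\text{parallel}}+(1-p)\,\tau_{\text{crossed}}$, so its $W^{2}$ cost is $5-p$, which is smallest at $p=1$.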

For each pair $(\alpha,\beta)\in A\times B$ , we place a probability distribution $Q_{\alpha,\beta}$ on the geodesics from $\alpha$ to $\beta$ . For a geodesic $P$ that starts at $\alpha$ and ends at $\beta$ , let $P(t)$ for $0\leq t\leq 1$ be the point on $P$ whose distance from $\alpha$ is $td(\alpha,\beta)$ . For a fixed optimal transport $\tau$ and probability distributions $Q_{\alpha,\beta}$ , we define probability distribution $\mu_{t}$ for $0\leq t\leq 1$ to be

[TABLE]

Distance interpolation is an optimal transportation $\tau$ combined with a uniform distribution applied to each $Q_{\alpha,\beta}$ . For the discrete analogue, a point $P(t)$ would be $P(\frac{\lfloor td(\alpha,\beta)\rfloor}{d(\alpha,\beta)})$ and $P(\frac{\lceil td(\alpha,\beta)\rceil}{d(\alpha,\beta)})$ .

For probability space $\mu$ , let $H(\mu)$ denote its entropy. The formal notion of mid-transit goods being more spread out than the supply and demand is that

[TABLE]

where $K$ is a nonnegative number that represents the strength of the curvature of the space $G$ . The distinction between the different versions of displacement convexity comes from the distinction between the words “any” and “every” when it comes to multiple optimal transportations or multiple geodesics between points (events that occur rarely in geometric geodesic spaces, but which played a large role in Section 3.2). We outline several definitions below.

•

(Strong displacement convexity) For any optimal transportation $\tau$ and geodesic choices $Q_{\alpha,\beta}$ , (10) holds.

•

(Sort-of-strong displacement convexity) (10) holds for distance interpolation on any optimal transportation.

•

(Sort-of-weak displacement convexity) (10) holds for distance interpolation on some optimal transportation.

•

(Weak displacement convexity) There exists optimal transportation $\tau$ and geodesic choices $Q_{\alpha,\beta}$ , possibly dependent on $\mu_{A},\mu_{B}$ , such that (10) holds.

Strong displacement convexity is presented as the “normal” version in [32]. That the inequality should hold for any transportation or geodesic is also the condition in [20], although they are not working with displacement convexity. The conditions of sort-of-strong curvature are presented as the definition of distance interpolation in Theorem 7.21 of [37]; conditions that imply uniqueness of geodesics and optimal transportations are used later (such as Definition 16.5, which is set up by chapters 9 and 10). Gozlan, Roberto, Samson, and Tetali [22] work with sort-of-weak displacement convexity, but they use transportations that minimize $W^{1}$ instead of $W^{2}$ and “midpoints” take mass Gaussian distributed across a geodesic rather than being a point mass. Weak displacement convexity is presented in chapter 29 of [37], and in [28], although it is just called “displacement convexity” except in the bottom of page 906. Weak displacement convexity is also used by Bonciocat and Sturm (see equation (2.1) of [9]) in their work on approximate midpoints.

If we do not specify a type of displacement convexity, then the statement holds for all four forms of displacement convexity. In the following sections we will prove statements about specific types of displacement convexity, but first we will prove several statements that hold for general displacement convexity. To do so, we modify our definition of midpoints. For vertices $a,b$ , let the midpoints between them $\widetilde{m}(a,b)$ be as follows:

(A) if $d(a,b)$ is even, then $\widetilde{m}(a,b)=\{u\in V(G):2d(u,a)=2d(u,b)=d(a,b)\}$ , and

(B) if $d(a,b)$ is odd, then $\widetilde{m}(a,b)=\{(u,v)\in E(G):2d(u,a)+1=2d(v,b)+1=d(a,b)\}$ .
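Definitions (A) and (B) translate directly into a breadth-first-search computation. The sketch below is illustrative: the function names are hypothetical, the graph is assumed connected and given as adjacency sets, and edge midpoints are returned as ordered pairs $(u,v)$ with $u$ on the side of $a$.

```python
from collections import deque

def distances(adj, source):
    """Graph distances from source by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def midpoints(adj, a, b):
    """Vertex midpoints if d(a,b) is even, edge midpoints (u, v) if it is odd."""
    da, db = distances(adj, a), distances(adj, b)
    D = da[b]
    if D % 2 == 0:
        return {u for u in adj if 2 * da[u] == D and 2 * db[u] == D}
    return {(u, v) for u in adj for v in adj[u]
            if 2 * da[u] + 1 == D and 2 * db[v] + 1 == D}

cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

assert midpoints(cycle4, 0, 2) == {1, 3}   # even distance: two vertex midpoints
assert midpoints(path4, 0, 3) == {(1, 2)}  # odd distance: the middle edge
```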

We use several ideas from continuous transportation theory in our work. Some of those ideas translate into similar techniques in the discrete setting, while other ideas translate into performing the opposite technique. Specifically, we will use the relationship between cyclical monotonicity (Definition 5.1 of [37]) and optimal transportations as before. On the other hand, we reverse the Monge-Mather shortening principle (chapter 8 of [37]), because after Lemma 3.10 we will only be interested in transportations whose entire support is one class in the transitive closure of the relation $(a,b)\sim(a^{\prime},b^{\prime})$ when some $(a,b)$ -geodesic intersects some $(a^{\prime},b^{\prime})$ -geodesic.

A transportation $\tau$ is cyclically monotone if for every sequence of pairs $(a_{1},b_{1}),\ldots,(a_{k},b_{k})$ we have that $\sum_{i=1}^{k}d(a_{i},b_{i})^{2}\leq d(a_{1},b_{k})^{2}+\sum_{i=1}^{k-1}d(a_{i+1},b_{i})^{2}$ . It is easy to confirm that a transportation is optimal if and only if it is cyclically monotone, as this is the same technique as finding alternating cycles when looking for maximum weight matchings in a bipartite graph.
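For a transportation with a small support, the inequality above can be tested exhaustively. The sketch below makes two assumptions that are not stated in the text: the sequences range over pairs $(a_{i},b_{i})$ with $\tau(a_{i},b_{i})>0$, and only sequences of distinct pairs are tried (a simplification for illustration). The function name is hypothetical.

```python
from itertools import permutations

def is_cyclically_monotone(support, dist):
    """Brute-force check of the squared-distance inequality over all
    sequences of distinct pairs drawn from the support."""
    pairs = list(support)
    for k in range(2, len(pairs) + 1):
        for seq in permutations(pairs, k):
            direct = sum(dist(a, b) ** 2 for a, b in seq)
            shifted = dist(seq[0][0], seq[-1][1]) ** 2 + sum(
                dist(seq[i + 1][0], seq[i][1]) ** 2 for i in range(k - 1))
            if direct > shifted:
                return False
    return True

dist = lambda a, b: abs(a - b)        # graph distance on the path 0-1-2-3
parallel = {(0, 2), (1, 3)}           # moves 0 -> 2 and 1 -> 3
crossed = {(0, 3), (1, 2)}            # moves 0 -> 3 and 1 -> 2

assert is_cyclically_monotone(parallel, dist)     # 4 + 4 <= 9 + 1
assert not is_cyclically_monotone(crossed, dist)  # 9 + 1 >  4 + 4
```

The running time is factorial in the size of the support, so this is only meant for hand-sized examples.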

Recall that in the proof of Theorem 3.5 we split $A\times B$ into classes $(A\times B)_{r}=\{(a,b):d(a,b)=r\}$ . We will do this again using the following lemmas.

Lemma 3.7.

If $\tau$ is an optimal transportation for probability functions $\mu_{A},\mu_{B}$ over graph $G$ , $\tau(a_{1},b_{1})\tau(a_{2},b_{2})>0$ , and $\widetilde{m}(a_{1},b_{1})\cap\widetilde{m}(a_{2},b_{2})\neq\emptyset$ , then $d(a_{1},b_{1})=d(a_{2},b_{2})$ .

Proof.

By contradiction, assume that $d(a_{1},b_{1})<d(a_{2},b_{2})$ . The proof of the lemma is similar when $d(a_{1},b_{1})$ is odd or even, so we will assume $d(a_{1},b_{1})$ is odd and allow the reader to handle the even case. Let $d(a_{1},b_{1})=2k+1$ and $d(a_{2},b_{2})=2\ell+1\geq 2k+3$ . Let $(u,v)\in\widetilde{m}(a_{1},b_{1})\cap\widetilde{m}(a_{2},b_{2})$ , and let $d(a_{1},u)=k$ , $d(b_{1},v)=k$ . Repeated use of the triangle inequality implies $d(a_{1},b_{2}),d(a_{2},b_{1})\leq k+\ell+1$ . We can directly calculate that $d(a_{1},b_{1})^{2}+d(a_{2},b_{2})^{2}-d(a_{1},b_{2})^{2}-d(a_{2},b_{1})^{2}\geq 2(k-\ell)^{2}>0$ , which contradicts that $\tau$ is cyclically monotone. ∎

We now give a formal definition for a partition of a transportation. Lemma 3.7 implies that each distinct set of distances involved in a transportation can be used to construct a partition.

Definition 3.8.

Let $\tau$ be a transportation from $\mu_{A}$ to $\mu_{B}$ . Let $E_{1},E_{2}$ be nonempty sets such that $E_{1}\cup E_{2}=\{(a,b):\tau(a,b)>0\}$ and if $(a_{1},b_{1})\in E_{1}$ , $(a_{2},b_{2})\in E_{2}$ , then $\widetilde{m}(a_{1},b_{1})\cap\widetilde{m}(a_{2},b_{2})=\emptyset$ . Then $E_{1},E_{2}$ form a partition of $\tau$ . Let $\eta_{1}=\sum_{(a,b)\in E_{1}}\tau(a,b)$ , define probability functions $\mu_{A,1}(x)=\eta_{1}^{-1}\sum_{(x,b)\in E_{1}}\tau(x,b)$ and $\mu_{B,1}(y)=\eta_{1}^{-1}\sum_{(a,y)\in E_{1}}\tau(a,y)$ , and let us define transportation $\tau_{1}:\mu_{A,1}\rightarrow\mu_{B,1}$ as $\tau_{1}(a,b)=\eta_{1}^{-1}\tau(a,b)$ if $(a,b)\in E_{1}$ and $\tau_{1}(a,b)=0$ otherwise. Define $\eta_{2},\mu_{A,2},\mu_{B,2},\tau_{2}$ similarly.

We remark that $\tau=\eta_{1}\tau_{1}+\eta_{2}\tau_{2}$ , and so for all $k$ we have that $W^{k}(\tau)=\eta_{1}W^{k}(\tau_{1})+\eta_{2}W^{k}(\tau_{2})$ . The following claim is then obvious, and we omit the proof.

Claim 3.9.

We use the notation of Definition 3.8. If $\tau$ is an optimal transportation, then so are $\tau_{1}$ and $\tau_{2}$ . If $\tau$ is an optimal transportation and $\tau_{1}^{\prime}:\mu_{A,1}\rightarrow\mu_{B,1}$ , $\tau_{2}^{\prime}:\mu_{A,2}\rightarrow\mu_{B,2}$ are optimal transportations, then $\tau^{\prime}=\eta_{1}\tau_{1}^{\prime}+\eta_{2}\tau_{2}^{\prime}$ is also an optimal transportation from $\mu_{A}$ to $\mu_{B}$ .
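The decomposition behind Claim 3.9 can be checked on a toy instance. The following is a minimal sketch, not part of the original argument: the vertices are integers on a long path with $d(x,y)=|x-y|$ , and the coupling `tau`, the sets `E1`, `E2`, and the helper names `restrict` and `cost` are all hypothetical choices made for illustration. The cost is taken to be $W^{k}(\tau)=\sum\tau(a,b)d(a,b)^{k}$ without a $k$ -th root, which is the normalization suggested by Lemma 3.11(3).

```python
from fractions import Fraction as F

def restrict(tau, pairs):
    """Return (eta, tau_i): the total mass on `pairs` and the renormalised restriction."""
    eta = sum(tau[p] for p in pairs)
    return eta, {p: tau[p] / eta for p in pairs}

def cost(tau, dist, k):
    """W^k(tau) = sum of tau(a, b) * d(a, b)^k (assumed normalisation, no k-th root)."""
    return sum(mass * dist(a, b) ** k for (a, b), mass in tau.items())

# Hypothetical data: integers on a path, d(x, y) = |x - y|.
dist = lambda x, y: abs(x - y)
tau = {(0, 2): F(1, 4), (1, 3): F(1, 4), (10, 11): F(1, 2)}
E1, E2 = [(0, 2), (1, 3)], [(10, 11)]  # midpoints of E1 and E2 are disjoint

eta1, tau1 = restrict(tau, E1)
eta2, tau2 = restrict(tau, E2)

# tau = eta1 * tau1 + eta2 * tau2, rebuilt entry by entry.
recombined = {**{p: eta1 * m for p, m in tau1.items()},
              **{p: eta2 * m for p, m in tau2.items()}}
```

Because exact fractions are used, the identity $W^{k}(\tau)=\eta_{1}W^{k}(\tau_{1})+\eta_{2}W^{k}(\tau_{2})$ can be compared with `==` for any $k$ .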

In the next lemma we show that it suffices to prove curvature exists among transportations without partitions in order to prove that curvature exists in general. We follow this with another lemma, which describes transportations without partitions.

Lemma 3.10.

Let $\tau$ be an optimal transportation from $\mu_{A}$ to $\mu_{B}$ with a partition as defined in Definition 3.8. Let $\mu_{C},\mu_{C,1},\mu_{C,2}$ be the probability measures for the midpoints using transportations $\tau,\tau_{1},\tau_{2}$ , respectively. If for $i\in\{1,2\}$ there exist fixed values $k,C$ and a convex function $f$ such that

[TABLE]

then

[TABLE]

Proof.

By the definition of a partition, the support of $\mu_{C,1}$ and the support of $\mu_{C,2}$ are disjoint, and therefore $\mu_{C}=\eta_{1}\mu_{C,1}+\eta_{2}\mu_{C,2}$ . By the formula for entropy, this implies that $S(\mu_{C})=\eta_{1}(S(\mu_{C,1})-\ln(\eta_{1}))+\eta_{2}(S(\mu_{C,2})-\ln(\eta_{2}))$ . By the convexity of entropy, the equality for $\mu_{C}$ is an upper bound: $S(\mu_{A})\leq\eta_{1}(S(\mu_{A,1})-\ln(\eta_{1}))+\eta_{2}(S(\mu_{A,2})-\ln(\eta_{2}))$ and $S(\mu_{B})\leq\eta_{1}(S(\mu_{B,1})-\ln(\eta_{1}))+\eta_{2}(S(\mu_{B,2})-\ln(\eta_{2}))$ . So we are left with

[TABLE]

and thus the lemma follows from the convexity of $f$ and Claim 3.9. ∎

Lemma 3.11.

*Let $\tau$ be an optimal transportation from $\mu_{A}$ to $\mu_{B}$ that does not have a partition. Let $A$ be the support of $\mu_{A}$ and $B$ be the support of $\mu_{B}$ .

(1) There exists a constant $D$ such that if $\tau(a,b)>0$ then $d(a,b)=D$ .

(2) For the $D$ in part (1) and any $a\in A$ and $b\in B$ , we have $d(a,b)\geq D$ .

(3) For the $D$ in part (1) and for all $k$ , $W^{k}(\mu_{A},\mu_{B})=D^{k}$ .*

Proof.

(1) This is a restatement of Lemma 3.7.

(2) For a fixed transportation $\tau$ , we define a graph $J_{\tau}^{\prime}$ , with vertex set $S_{\tau}=\{(a,b):\tau(a,b)>0\}$ and edge set $E(J_{\tau}^{\prime})=\{(a_{1},b_{1})(a_{2},b_{2}):\widetilde{m}(a_{1},b_{1})\cap\widetilde{m}(a_{2},b_{2})\neq\emptyset\}$ . Because $\tau$ has no partition, $J_{\tau}^{\prime}$ is a connected graph.

By way of contradiction, assume that $d(x,y)<D$ for some fixed pair $x\in A,y\in B$ . Let $(a_{1},b_{1})$ be such that $\tau(a_{1},b_{1})>0$ and $x=a_{1}$ and let $(a_{*},b_{*})$ be such that $\tau(a_{*},b_{*})>0$ and $y=b_{*}$ . Because $J_{\tau}^{\prime}$ is connected, there exists a path $(a_{1},b_{1}),(a_{2},b_{2}),\ldots,(a_{\ell},b_{\ell})=(a_{*},b_{*})$ . By (1), we have that $d(a_{i},b_{i})=D$ for all $i$ . By the triangle inequality and the assumption $\widetilde{m}(a_{i},b_{i})\cap\widetilde{m}(a_{i+1},b_{i+1})\neq\emptyset$ , we have that $d(b_{i},a_{i+1})\leq D$ for $1\leq i<\ell$ . Together with $d(b_{\ell},a_{1})=d(y,x)<D$ , this shows that $\tau$ is not cyclically monotone, which contradicts the optimality of $\tau$ .

(3) By (2), we see that $W^{k}(\mu_{A},\mu_{B})\geq D^{k}$ . By (1), we see that $W^{k}(\mu_{A},\mu_{B})\leq W^{k}(\tau)=D^{k}$ . ∎

Let us close this subsection by remarking that finding an optimal transportation for a given $\mu_{A},\mu_{B}$ with a fixed metric $d$ is a problem in linear programming, where each element of the support of $\mu_{A}$ and each element of the support of $\mu_{B}$ appears as its own constraint (elements in the intersection of the supports of $\mu_{A}$ and $\mu_{B}$ thus show up as two constraints). The solution to the dual problem is $(\phi,\psi)$ , where $\phi$ is a function over the support of $\mu_{A}$ and $\psi$ is a function over the support of $\mu_{B}$ . Villani calls the dual problem the “dual Kantorovich problem” in chapter 5 of [37].
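For very small instances the linear program can be sidestepped entirely. The sketch below is not the linear programming formulation itself but a swapped-in brute force: it assumes that $\mu_{A}$ and $\mu_{B}$ are uniform on supports of equal size, in which case an optimal transportation can be chosen to be a bijection (the extreme points of the feasible region are scaled permutation matrices), so it suffices to search over permutations. The function name `best_permutation_coupling` and the path metric used in the example are hypothetical.

```python
from itertools import permutations

def best_permutation_coupling(A, B, dist, k=2):
    """Minimise (1/|A|) * sum_i d(A[i], B[sigma(i)])^k over bijections sigma.

    Assumes uniform measures on A and B with |A| == |B|; returns the
    minimum cost and one pairing that attains it.
    """
    n = len(A)
    best_cost, best_pairs = None, None
    for perm in permutations(range(n)):
        cost = sum(dist(A[i], B[perm[i]]) ** k for i in range(n)) / n
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_pairs = [(A[i], B[perm[i]]) for i in range(n)]
    return best_cost, best_pairs

# Hypothetical example: vertices 0..3 of a path, d(x, y) = |x - y|.
path_dist = lambda x, y: abs(x - y)
cost, pairs = best_permutation_coupling([0, 1], [2, 3], path_dist)
```

In this example the pairing $0\mapsto 2$ , $1\mapsto 3$ has cost $(4+4)/2$ , while the crossing pairing has cost $(9+1)/2$ , so only the first is cyclically monotone for $W^{2}$ .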

Remark 3.12.

A consequence of Lemma 3.11 applied to Remark 5.13 of [37] is that the optimal transportations between $\mu_{A}$ and $\mu_{B}$ will have no partitions if and only if the solutions to the dual problem only have constant valued functions $\phi,\psi$ .

3.4 Displacement convexity and the hypercube

Our first result is that the hypercube does not have positive or nonnegative sort-of-strong displacement convexity.

Example 3.13.

The $d$ -dimensional hypercube for $d\geq 10$ contains vertex sets $A$ and $B$ such that when $\mu_{A}$ and $\mu_{B}$ are uniform probability measures with support on $A$ and $B$ there exists an optimal transport $\tau:A\times B\rightarrow\mathbb{R}$ such that the entropy of the probability space

[TABLE]

is less than the average of the entropies of $\mu_{A}$ and $\mu_{B}$ .

Proof.

We return to the notation of the Boolean lattice introduced in Section 3.1. Let $A=\{\{1\},\{2\},\ldots,\{d^{\prime}\}\}$ and $B=\{\{d^{\prime}+1\},\{d^{\prime}+2\},\ldots,\{2d^{\prime}\}\}$ for $2d^{\prime}\leq d$ . Every vertex in $A$ is at distance $2$ from every vertex in $B$ , so any transportation function $\tau$ is optimal. We choose the transportation function such that $\tau(\{i\},\{d^{\prime}+i\})=\mu_{A}(\{i\})=\mu_{B}(\{d^{\prime}+i\})=\frac{1}{d^{\prime}}$ for each $1\leq i\leq d^{\prime}$ . So $\mu_{C}(\{i,d^{\prime}+i\})=\frac{1}{2d^{\prime}}$ , $\mu_{C}(\emptyset)=1/2$ , and $\mu_{C}$ is $0$ otherwise. The entropies of $\mu_{A}$ and $\mu_{B}$ are each $\ln(d^{\prime})$ , and the entropy of $\mu_{C}$ is $\ln(d^{\prime})/2+\ln(2)$ . The example thus holds when $d^{\prime}>4$ . ∎
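The entropies in Example 3.13 are easy to confirm numerically. The following sketch builds $\mu_{A}$ and $\mu_{C}$ exactly as described in the proof (mass $\frac{1}{2d^{\prime}}$ on each $\{i,d^{\prime}+i\}$ and mass $1/2$ on $\emptyset$ ) and compares their entropies; the function names are ours.

```python
import math

def entropy(masses):
    """S(mu) = sum of mu(y) * ln(1 / mu(y)) over points of positive mass."""
    return sum(p * math.log(1 / p) for p in masses if p > 0)

def example_entropies(d_prime):
    """Return (S(mu_A), S(mu_C)) for the construction in Example 3.13."""
    mu_A = [1 / d_prime] * d_prime                 # uniform on the singletons {i}
    mu_C = [1 / (2 * d_prime)] * d_prime + [0.5]   # the sets {i, d'+i}, then the empty set
    return entropy(mu_A), entropy(mu_C)

s_A, s_C = example_entropies(5)  # smallest d' for which S(mu_C) < S(mu_A)
```

At $d^{\prime}=4$ the two entropies coincide in exact arithmetic, so a floating point comparison there should use a tolerance.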

Now we will present a result that is progress towards showing that the hypercube does have positive sort-of-weak displacement convexity. Let us assume for the rest of the section that all transportations do not have a partition as in Definition 3.8. Let $A$ be the support of $\mu_{A}$ and $B$ be the support of $\mu_{B}$ ; the lack of a partition implies that $A\cap B=\emptyset$ .

Recall the map $\phi:(S,T)_{r}\times C^{(r)}_{\lfloor r/2\rfloor,\lceil r/2\rceil}\rightarrow(\widehat{m}(S,T),\widehat{m}(S,T))_{r}$ defined in the proof to Theorem 3.5. We generalize it as follows: let $C_{R}=\widetilde{m}(\{\emptyset\},\{\{1,2,\ldots,R\}\})$ using the notation of the Boolean lattice, and let $\widetilde{\phi}:A\times B\times C_{R}\rightarrow\widetilde{m}(A,B)\times\widehat{m}(A,B)$ . If $R$ is even, then $C_{R}=C^{(R)}_{R/2}$ and $\phi=\widetilde{\phi}$ . If $R$ is odd, then elements of $C_{R}$ are edges $xy$ , where $|x|=(R-1)/2,|y|=(R+1)/2$ . We then define

[TABLE]

We have one advantage on the hypercube when working with an optimal transportation rather than the transportation $\tau(a,b)=\mu_{A}(a)\mu_{B}(b)$ , which is used for Brunn-Minkowski curvature: in the following Lemma we show that $\widetilde{\phi}$ is injective.

Lemma 3.14.

If $(\alpha_{1},\beta_{1}),(\alpha_{2},\beta_{2})\in A\times B$ and $\tau(\alpha_{1},\beta_{1})\tau(\alpha_{2},\beta_{2})>0$ for optimal transport $\tau$ , then $\phi(\alpha_{1},\beta_{1},\pi_{1})\neq\phi(\alpha_{2},\beta_{2},\pi_{2})$ for any $\pi_{1},\pi_{2}\in C_{R}$ .

Proof.

First, let us assume that $R$ is even, and so $C_{R}$ is a collection of vertices. Suppose, by way of contradiction, that $\phi(\alpha_{1},\beta_{1},\pi_{1})=\phi(\alpha_{2},\beta_{2},\pi_{2})$ for distinct pairs $(\alpha_{1},\beta_{1})\neq(\alpha_{2},\beta_{2})$ . Recall that then $d(\pi_{1},\pi_{2})=d(\alpha_{1},\alpha_{2})=d(\beta_{1},\beta_{2})$ . Recall also that if $\pi^{\prime}$ is fixed, then $\phi_{\pi^{\prime}}(s,t)=\phi(s,t,\pi^{\prime})$ is injective. Therefore $d(\pi_{1},\pi_{2})>0$ . Notice that $d(\alpha_{1},\beta_{2})=d(\alpha_{2},\beta_{1})=R-d(\pi_{1},\pi_{2})<R$ . This contradicts Lemma 3.11.

Now suppose that $R$ is odd, so that $\pi_{1}=(x_{1},y_{1})$ and $\pi_{2}=(x_{2},y_{2})$ . We have that $d(\alpha_{1},\alpha_{2})=d(\beta_{1},\beta_{2})=d(x_{1},y_{2})=d(y_{1},x_{2})$ , and the same argument follows. ∎

As we saw in Example 3.13, the midpoints will not spread out for every transportation between sets. Each transportation function is a probability measure over the space $V(G)\times V(G)$ , and we will work with the probability measure that maximizes $S(\tau)$ (while still satisfying the conditions of being an optimal transportation).

Lemma 3.15.

Fix probability distributions $\mu_{A},\mu_{B}$ , and let $\tau$ maximize the entropy among optimal transportations from $\mu_{A}$ to $\mu_{B}$ . Assume that $\tau$ has no partition. For a fixed element $X\in V(G)\cup E(G)$ , let $\{(a_{1},b_{1}),\ldots,(a_{k},b_{k})\}$ be the pairs of points such that $\tau(a_{i},b_{i})>0$ and $X\in\widetilde{m}(a_{i},b_{i})$ . Possibly the list $a_{1},\ldots,a_{k}$ has repeats, and so may $b_{1},\ldots,b_{k}$ . Then there exists a $t$ such that $d(a_{i},b_{j})=t$ for all $i,j$ , and $X\in\widetilde{m}(a_{i},b_{j})$ for all $i,j$ . Moreover, there exists a function $\tau_{X}:G\rightarrow\mathbb{R}$ such that $\tau(a_{i},b_{j})=\tau_{X}(a_{i})\tau_{X}(b_{j})$ for all $i,j$ .

Proof.

That there exists a $t$ such that $d(a_{i},b_{i})=t$ for all $i$ is a restatement of Lemma 3.7. By the triangle inequality, we have that $d(a_{i},b_{j})\leq t$ for all $i,j$ . By Lemma 3.11 $d(a_{i},b_{j})\geq t$ . Moreover, the sharpness of the triangle inequality implies that $X\in\widetilde{m}(a_{i},b_{j})$ for all $i,j$ .

Let $\tau_{X}^{\circ}(a_{i})=\sum_{j}\tau(a_{i},b_{j})$ , $\tau_{X}^{\circ}(b_{j})=\sum_{i}\tau(a_{i},b_{j})$ , and $T_{X}=\sum_{i}\tau_{X}^{\circ}(a_{i})=\sum_{j}\tau_{X}^{\circ}(b_{j})$ . Let $\tau^{*}(a_{i},b_{j})=\tau_{X}^{\circ}(a_{i})\tau_{X}^{\circ}(b_{j})/T_{X}$ and $\tau^{*}(u,v)=\tau(u,v)$ otherwise. By construction, $\tau^{*}$ is a transportation from $\mu_{A}$ to $\mu_{B}$ , and $W^{2}(\tau^{*})=W^{2}(\tau)$ , so $\tau^{*}$ is also optimal. We see that $\tau^{*}$ can be thought of as the product of distributions $\tau_{X}^{\circ}(\{a_{1},\ldots,a_{k}\})\times\tau_{X}^{\circ}(\{b_{1},\ldots,b_{k}\})$ , which is known to maximize entropy over product spaces by the independence inequality. Therefore $S(\tau^{*})\geq S(\tau)$ , with equality only when $\tau^{*}=\tau$ . By construction, $\tau^{*}$ satisfies the conclusion of the lemma ( $\tau_{X}=\tau_{X}^{\circ}/\sqrt{T_{X}}$ ), and so the proof concludes by the condition that $\tau$ had maximum entropy among optimal transportations. ∎
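The replacement of $\tau$ by $\tau^{*}$ on the block $\{a_{i}\}\times\{b_{j}\}$ can be illustrated numerically. The sketch below takes a hypothetical $2\times 2$ block of values $\tau(a_{i},b_{j})$ (the numbers are made up), forms the product of its row and column sums divided by the block total $T_{X}$ , and checks that the marginals are unchanged while the entropy contribution does not decrease. Since all pairs in the block are at the same distance $t$ , the cost is unchanged as well.

```python
import math

def block_entropy(block):
    """Entropy contribution sum of p * ln(1/p) over the positive entries of a block."""
    return sum(p * math.log(1 / p) for row in block for p in row if p > 0)

def product_block(block):
    """Replace tau(a_i, b_j) by (row sum i) * (column sum j) / (block total)."""
    rows = [sum(r) for r in block]
    cols = [sum(c) for c in zip(*block)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]

# Hypothetical values of tau(a_i, b_j) for pairs sharing the midpoint X.
block = [[0.3, 0.0],
         [0.1, 0.2]]
star = product_block(block)
```

Here `star` plays the role of $\tau^{*}$ restricted to the block; the rest of the coupling is left untouched, as in the proof.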

Before we prove our theorem about the entropy of midpoints, let us recall a few facts about entropy. Let $\mu$ be a probability measure over the product space $X=X_{1}\times X_{2}\times\cdots\times X_{k}$ . The entropy of $\mu$ is defined as $S(\mu)=\sum_{y}\mu(y)\ln\left(\frac{1}{\mu(y)}\right)$ . We are interested in how the entropy behaves when we restrict some coordinates of $X$ . For $T\subseteq\{1,2,\ldots,k\}$ , let $X_{T}=\prod_{j\in T}X_{j}$ . For $T^{\prime}\subseteq T$ , $z^{\prime}\in X_{T^{\prime}}$ , and $z\in X_{T}$ , we use the notation $z\in z^{\prime}$ to denote the situation where $z^{\prime}_{j}=z_{j}$ for all $j\in T^{\prime}$ . Let $\mu_{T}$ be the projection of $\mu$ onto $X_{T}$ , which is equivalently the probability distribution over $X_{T}$ such that $\mu_{T}(z)=\sum_{y\in z}\mu(y)$ (because $\mu=\mu_{\{1,2,\ldots,k\}}$ ). For $T^{\prime}\subseteq T$ , the conditional entropy is

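$$S(\mu_{T}|\mu_{T^{\prime}})=\sum_{z^{\prime}\in X_{T^{\prime}}}\ \sum_{z\in X_{T},\,z\in z^{\prime}}\mu_{T}(z)\ln\left(\frac{\mu_{T^{\prime}}(z^{\prime})}{\mu_{T}(z)}\right).$$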

From the definitions, when $z\in z^{\prime}$ we have $\mu_{T}(z)\leq\mu_{T^{\prime}}(z^{\prime})$ , and so the conditional entropy is always nonnegative. The name comes from the fact that if $z^{\prime}\in X_{T^{\prime}}$ is fixed and $y_{z^{\prime}}\in X_{T}$ is a random variable conditioned on $y_{z^{\prime}}\in z^{\prime}$ , then $S(\mu_{T}|\mu_{T^{\prime}})=\mathbb{E}_{z^{\prime}}S(y_{z^{\prime}})$ . From the formula, we see that if $T^{\prime\prime}\subseteq T^{\prime}\subseteq T$ , then $S(\mu_{T^{\prime}}|\mu_{T^{\prime\prime}})=S(\mu_{T}|\mu_{T^{\prime\prime}})-S(\mu_{T}|\mu_{T^{\prime}})$ .
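These facts can be checked on a small example. The sketch below assumes the standard definition $S(\mu_{T}|\mu_{T^{\prime}})=\sum_{z}\mu_{T}(z)\ln(\mu_{T^{\prime}}(z^{\prime})/\mu_{T}(z))$ , where $z^{\prime}$ is the restriction of $z$ to $T^{\prime}$ ; this is our reading of the definition and is consistent with the nonnegativity and subtraction properties stated above. Coordinates are indexed from $0$ , and the measure `mu` is a made-up example on $\{0,1\}^{3}$ .

```python
import math
from collections import defaultdict

def project(mu, T):
    """Projection mu_T: sum the mass of mu over the coordinates not in T."""
    out = defaultdict(float)
    for y, mass in mu.items():
        out[tuple(y[j] for j in T)] += mass
    return dict(out)

def entropy(mu):
    return sum(p * math.log(1 / p) for p in mu.values() if p > 0)

def conditional_entropy(mu, T, T_sub):
    """S(mu_T | mu_T_sub) for T_sub a subset of T (assumed standard definition)."""
    mu_T, mu_sub = project(mu, T), project(mu, T_sub)
    pos = [T.index(j) for j in T_sub]  # where the coordinates of T_sub sit inside T
    return sum(p * math.log(mu_sub[tuple(z[i] for i in pos)] / p)
               for z, p in mu_T.items() if p > 0)

# Hypothetical probability measure on {0,1}^3.
mu = {(0, 0, 0): 0.10, (0, 0, 1): 0.20, (0, 1, 0): 0.15,
      (1, 0, 1): 0.25, (1, 1, 1): 0.30}
T, T1, T2 = (0, 1, 2), (0, 1), (0,)
```

With this definition the conditional entropy equals $S(\mu_{T})-S(\mu_{T^{\prime}})$ , from which the subtraction identity for $T^{\prime\prime}\subseteq T^{\prime}\subseteq T$ follows.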

We define a probability distribution $\zeta$ over $A\times B\times C_{R}\times M\times M$ , where $\zeta(a,b,c,m,m^{\prime})=\tau(a,b)/|C_{R}|$ if $\widetilde{\phi}(a,b,c)=(m,m^{\prime})$ , and $\zeta(a,b,c,m,m^{\prime})=0$ otherwise. We will study how the entropy of $\zeta$ changes as we project onto specific coordinates. This will be clearer if we use the following abuse of notation: let $A$ represent the first coordinate, $B$ represent the second coordinate, $C_{R}$ represent the third coordinate, $M$ represent the fourth coordinate, and $M^{\prime}$ represent the fifth coordinate. So $\zeta_{\{A,C_{R},M\}}$ is $\zeta$ projected onto the first, third, and fourth coordinates. By construction, $\zeta_{\{A\}}=\mu_{A}$ , $\zeta_{\{B\}}=\mu_{B}$ , $\zeta_{\{M\}}=\zeta_{\{M^{\prime}\}}=\mu_{C}$ , $\tau=\zeta_{\{A,B\}}$ , and $\zeta_{\{C_{R}\}}$ is a uniform distribution. By symmetry, for any $S\subseteq\{A,B,C_{R}\}$ we have that $\zeta_{S\cup\{M\}}$ is isomorphic to $\zeta_{S\cup\{M^{\prime}\}}$ . Because $\widetilde{\phi}$ is injective, each of $\zeta_{\{M,M^{\prime}\}},\zeta_{\{A,B,C_{R}\}},\zeta_{\{A,B,M\}}$ is isomorphic to $\zeta$ .

We will use the following technical statement.

Claim 3.16.

Let $\zeta$ be defined as above for optimal transportation $\tau$ with maximum entropy that has no partition. We have

[TABLE]

Proof.

By the injectivity of $\widetilde{\phi}$ , the pairs $(m,m^{\prime})$ for a fixed $m$ are in bijection with the pairs $(a,b)$ such that $m\in\widetilde{m}(a,b)$ and $\tau(a,b)>0$ . By Lemma 3.15, we have $\zeta_{\{A,M\}}(a,m)=\sum_{i}\frac{\delta_{m}(a)\delta_{m}(b_{i})}{|C_{R}|}$ and

[TABLE]

By the definition of conditional entropy and Lemma 3.15, we have that

[TABLE]

and

[TABLE]

So

[TABLE]

The second equality follows from a symmetric argument. The third equality follows a similar argument, with

[TABLE]

and

[TABLE]

For the final inequality, we first remark that $\mu_{A}(a)\geq\delta_{m^{*}}(a)\sum_{b}\delta_{m^{*}}(b)$ for any fixed $m^{*}$ . So

[TABLE]

The result for $S(\zeta_{\{B,M\}}|\zeta_{\{B\}})$ follows symmetrically. ∎

Now we are prepared to prove the theorem.

Theorem 3.17.

Let $\mu_{A},\mu_{B}$ be probability distributions over the discrete hypercube. Let $\tau$ be an optimal transportation from $\mu_{A}$ to $\mu_{B}$ that maximizes entropy, and let $\mu_{C}$ be the probability distribution over $\widetilde{m}(\mu_{A},\mu_{B})$ as calculated by $\tau$ with distance interpolation. If $d(a,b)=R$ for some constant $R$ whenever $\tau(a,b)>0$ , then

[TABLE]

Proof.

Summing the first two equalities from Claim 3.16 and simplifying by the definition of conditional entropy, we have that

[TABLE]

The last inequality of Claim 3.16 implies that $S(\zeta_{\{A,M\}})\geq S(\mu_{A})+\ln(|C_{R}|)$ , so

[TABLE]

Now $S(\zeta)=S(\zeta_{\{M,M^{\prime}\}})\leq S(\zeta_{\{M\}})+S(\zeta_{\{M^{\prime}\}}),$ so

[TABLE]

When we combine (11) and (12), we see that

[TABLE]

The right hand side is minimized when $S(\zeta)=\frac{2}{3}(S(\mu_{A})+S(\mu_{B})+2\ln(|C_{R}|))$ , and the resulting value is the statement of the theorem. ∎

Theorem 3.18.

Let $\mu_{A},\mu_{B}$ be probability distributions over the $d$ -dimensional discrete hypercube. There exists a transportation $\tau$ that when combined with distance interpolation produces a probability distribution $\mu_{C}$ over the midpoints such that

[TABLE]

Proof.

It is easy to calculate that $|C_{2i}|={2i\choose i}\approx 2^{2i}i^{-0.5}$ and that $|C_{2i+1}|={2i+1\choose i}(i+1)\approx 2^{2i+1}i^{0.5}$ . We used computer software to calculate $\ln(2)\geq 0.69$ and to confirm that $\ln(|C_{R}|)\geq 0.6R-1$ . So Lemma 3.10 applied to Theorem 3.17 provides that

[TABLE]

For any integers $\ell\leq\ell^{\prime}$ , $k\leq k^{\prime}$ , space $\Omega$ and probability measures $\mu_{A},\mu_{B}$ we have that $W^{\ell}(\mu_{A},\mu_{B})\geq W^{\ell^{\prime}}(\mu_{A},\mu_{B})\operatorname{diam}(\Omega)^{\ell-\ell^{\prime}}$ and $(W^{\ell}(\mu_{A},\mu_{B}))^{k}\geq(W^{\ell}(\mu_{A},\mu_{B}))^{k^{\prime}}\operatorname{diam}(\Omega)^{\ell(k-k^{\prime})}$ . ∎
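The numerical bound $\ln(|C_{R}|)\geq 0.6R-1$ used in the proof can be reproduced with a few lines of code. This is only a finite check under our reading of the counts $|C_{2i}|={2i\choose i}$ and $|C_{2i+1}|={2i+1\choose i}(i+1)$ ; it is not the software used by the author, and the range of $R$ tested is an arbitrary choice. For large $R$ the bound follows from the asymptotics, since $\ln(2)>0.6$ .

```python
import math

def size_C(R):
    """|C_R|: middle-layer vertices if R is even, middle edges if R is odd."""
    i = R // 2
    if R % 2 == 0:
        return math.comb(R, i)
    return math.comb(R, i) * (i + 1)

def bound_holds(R):
    """Check ln|C_R| >= 0.6 * R - 1 for a single value of R."""
    return math.log(size_C(R)) >= 0.6 * R - 1

# Smallest slack ln|C_R| - (0.6 R - 1) over the tested range.
min_slack = min(math.log(size_C(R)) - (0.6 * R - 1) for R in range(1, 501))
```

The case $R=1$ gives $|C_{1}|=1$ and $\ln(|C_{1}|)=0$ , which is the source of the additive constant discussed in Section 3.6.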

Let us finish this section by returning to the Brunn-Minkowski curvature of the hypercube.

Proposition 3.19.

There exists $\delta>0$ and $D$ such that the $d$ -dimensional hypercube for $d>D$ has Brunn-Minkowski curvature at least $\frac{1}{2d}(1+\delta)$ .

Proof.

First, let us consider the situation where $d(S,T)\geq(1-\epsilon_{1})d$ for some $\epsilon_{1}>0$ . Note that for any $(a,b)\in(S,T)_{r}$ , we have that

[TABLE]

On the other hand, because $1-\epsilon_{1}>1/2$ , we have that $|S|,|T|\leq 2^{d}e^{-d(1/2-\epsilon_{1})^{2}/2}$ . So if $\epsilon_{1}=0.01$ , then $r\geq 99d/100$ , and for sufficiently large $d$ and sufficiently small $\delta$ we get that

[TABLE]

So now assume that $d(S,T)<(1-\epsilon_{1})d$ . The version of the claim in the proof to Theorem 3.5 that appears in [32] is that $|(\widehat{m}(S,T),\widehat{m}(S,T))_{r}|>|(S,T)_{r}|e^{d(S,T)^{2}/(8r)}$ . This inequality is weakest when $r$ is largest, so an improvement on this inequality for $r\geq(1-\epsilon_{2})d$ for some fixed $\epsilon_{2}>0$ is sufficient for an improvement on the final result. Fix a $c\in C^{(r)}_{\lfloor r/2\rfloor,\lceil r/2\rceil}$ , and define a function $\psi:A\times B\rightarrow\widehat{m}(S,T)\times C^{(d)}_{r}$ such that the first coordinate of $\psi(a,b)$ is $\phi(a,b,c)$ and the second coordinate is a vector over $\{0,1\}^{d}$ such that the set of coordinates with nonzero entries is the symmetric difference between $a,b$ as elements of the Boolean lattice. By the binary nature of the discrete hypercube, $\psi$ is invertible. But when $r\geq(1-\epsilon_{2})d$ we have that $|C^{(d)}_{r}|\leq 2^{d}e^{-(1/2-\epsilon_{2})^{2}d/2}$ and for any $(a,b)\in(S,T)_{r}$ we have $|\widehat{m}(S,T)|\geq|\widehat{m}(a,b)|\geq{r\choose r/2}\approx 2^{r}/\sqrt{r}$ . So the stronger bound comes from

[TABLE]

By setting $\delta,\epsilon_{2}$ as a function of $\epsilon_{1}$ , we have that $2^{-\epsilon_{2}d}e^{d(1-2\epsilon_{2})^{2}/8}>e^{(1+\delta)d(1-\epsilon_{1})^{2}/(8(1-\epsilon_{2}))}$ . ∎

3.5 Strong displacement convexity

Theorem 3.20.

If $G$ is a graph with nonnegative strong displacement convexity, then $G$ is a path, a cycle, a complete graph, or a complete graph minus an edge, and $G$ has strong displacement convexity $0$ .

First we will discuss the curvature of paths and cycles, and then we will show that no other graph can have non-negative curvature.

Lemma 3.21.

The path and the cycle have strong displacement convexity $0$ .

Proof.

We assume that our optimal transportation $\tau$ has no partition, and then use Lemma 3.10 to handle the other case. On the path or cycle, however, Lemma 3.11 can only be satisfied when $\mu_{A},\mu_{B}$ are point masses. In that case $S(\mu_{C})\geq S(\tau)\geq\max\{S(\mu_{A}),S(\mu_{B})\}$ . ∎

Lemma 3.22.

If $G$ is a graph with nonnegative strong displacement convexity, then for any $v\in V(G)$ and $u_{1},u_{2},w_{1},w_{2}\in N(v)$ such that $(u_{1},u_{2})\neq(w_{1},w_{2})$ we have that $u_{1}u_{2}\in E(G)$ or $w_{1}w_{2}\in E(G)$ .

Proof.

By way of contradiction, suppose that $u_{1},u_{2},w_{1},w_{2}\in N(v)$ , $u_{1}\neq w_{1}$ , and $u_{1}u_{2},w_{1}w_{2}\notin E(G)$ . Then let $\mu_{A}$ be the uniform distribution over $\{u_{1},w_{1}\}$ , and let $\mu_{B}$ be the uniform distribution over $\{u_{2},w_{2}\}$ (which may be one or two points). There exists an optimal transportation from $\mu_{A}$ to $\mu_{B}$ such that the only midpoint is $v$ . Then $S(\mu_{C})=0$ , while $S(\mu_{A})=\ln 2>0$ , so the entropy of the midpoints is less than the average of the entropies of $\mu_{A}$ and $\mu_{B}$ , a contradiction. ∎

Proof of Theorem 3.20. If $G$ is connected and every vertex has degree $1$ or $2$ , then $G$ is a path or a cycle. So suppose $v$ is incident with at least $3$ edges. If $V(G)\subseteq N(v)\cup\{v\}$ , then by Lemma 3.22, $G$ is either the complete graph or the complete graph minus an edge. So assume there exists a $w$ such that $d(v,w)=2$ ; equivalently $vw\notin E(G)$ and $N(v)\cap N(w)\neq\emptyset$ .

If $N(v)\setminus N(w)\neq\emptyset$ , then because $d(v)\geq 3$ and by Lemma 3.22 there exists an $xy\in E(G)$ such that $x\in N(v)\setminus N(w)$ and $y\in N(v)\cap N(w)$ . But this contradicts Lemma 3.22, because $N(y)$ is missing edges $vw$ and $xw$ . So assume that $N(v)\subseteq N(w)$ , and by symmetry this implies $N(v)=N(w)$ .

Now we claim that $\{v,w\}\cup N(v)=V(G)$ . If this is not true, then there exists an $x\in N(v)$ and a $y\notin\{v,w\}\cup N(v)$ such that $xy\in E(G)$ . But this contradicts Lemma 3.22, because $N(x)$ is missing edges $yv$ and $yw$ .

So $N(v)=N(w)=V(G)-\{v,w\}$ and $vw\notin E(G)$ . If $G$ is not the complete graph minus an edge, then there exists a pair $xy\notin E(G)$ other than $vw$ . By Lemma 3.22, $E(G)$ is all pairs of points except $vw$ and $xy$ . Because $d(v)\geq 3$ , there exists a $z\in N(v)\setminus\{x,y\}$ . But this contradicts Lemma 3.22, because $N(z)$ is missing edges $xy$ and $vw$ . $\square$
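The combinatorial part of this proof can be tested exhaustively on small graphs. The sketch below rests on an assumption about how Lemma 3.22 is used: we read it as saying that each neighborhood $N(v)$ contains at most one pair of distinct non-adjacent vertices, which is how the proof above applies it (for example, " $N(z)$ is missing edges $xy$ and $vw$ "). Under that reading, the code enumerates all connected labelled graphs on at most `max_n` vertices and lists those that satisfy the condition but are not a path, a cycle, a complete graph, or a complete graph minus an edge. All function names are ours.

```python
from itertools import combinations

def adjacency(n, edges):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def is_connected(adj):
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def neighborhoods_ok(adj):
    """Assumed reading of Lemma 3.22: every N(v) has at most one non-adjacent pair."""
    for nbrs in adj.values():
        missing = sum(1 for a, b in combinations(sorted(nbrs), 2) if b not in adj[a])
        if missing > 1:
            return False
    return True

def in_family(adj):
    """For connected graphs: path or cycle (max degree <= 2), K_n, or K_n minus an edge."""
    n = len(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    max_deg = max(len(nbrs) for nbrs in adj.values())
    return max_deg <= 2 or num_edges >= n * (n - 1) // 2 - 1

def exceptions(max_n):
    """Connected graphs that pass the neighborhood test but lie outside the family."""
    bad = []
    for n in range(1, max_n + 1):
        pairs = list(combinations(range(n), 2))
        for mask in range(1 << len(pairs)):
            edges = [p for i, p in enumerate(pairs) if mask >> i & 1]
            adj = adjacency(n, edges)
            if is_connected(adj) and neighborhoods_ok(adj) and not in_family(adj):
                bad.append(edges)
    return bad
```

Calling `exceptions(5)` examines every labelled graph on up to five vertices; an empty list is what the theorem predicts under the stated reading, though such a finite check is of course no substitute for the proof.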

3.6 Other graphs

A typical graph will not have curvature for the same reasons that the hypercube does not: by examining the neighborhood of a single vertex. The $-2/3$ in Theorem 3.18 is necessary for transportations $\tau$ with $W^{2}(\tau)=1$ due to the fact that $\ln(|C_{1}|)=0$ . However, we may be able to prove a “rough geometry” version of curvature. That is, the curvature equation may exist with a small error term that accounts for transportations with small Wasserstein distances. We establish a first result towards this goal by showing that expander graphs have some flavor of positive curvature.

Theorem 3.23.

Let $G$ be a graph such that every vertex has degree $d$ and $\lambda\ll 1$ is the second largest eigenvalue of the normalized adjacency matrix. Let $S,T$ be vertex sets such that $|S|=|T|$ . If $|S||T|e^{d(S,T)\ln(1+\lambda)}>\lambda n^{2}/4(1+O(1))$ , then $|m(S,T)|\geq\theta(\lambda n)$ .

Proof.

We define $S^{i}=\{v:d(v,S)\leq i\}$ . For disjoint vertex sets $X,Y$ , let $e(X,Y)$ denote the number of edges with an endpoint in each of $X$ and $Y$ . We will make use of two theorems from spectral graph theory. The first is Corollary 5.5 of [16], which states that $e(S^{i},S^{i+1}-S^{i})\geq d|S^{i}|(1-\lambda)\frac{n-|S^{i}|}{n}$ . The second is the Expander Mixing Lemma (see Theorem 5.1 of [16]), which states that

[TABLE]

Each vertex is incident to $d$ edges, and therefore $d|S^{i+1}-S^{i}|\geq e(S^{i},S^{i+1}-S^{i})$ . So if $2|S^{i}|\leq n$ , then $|S^{i+1}|\geq|S^{i}|(1+(1-\lambda)/2)$ (note that $\lambda<1$ ). Let $S^{\prime}=S^{d(S,T)/2}$ and $T^{\prime}=T^{d(S,T)/2}$ , and thus

[TABLE]

Next, each vertex in $S^{\prime}$ with a neighbor in $T^{\prime}$ is a midpoint in $m(S,T)$ . Each vertex is incident to $d$ edges, and so $|m(S,T)|\geq|E(S^{\prime},T^{\prime})|/d$ . Therefore, since $\lambda\leq\sqrt{\lambda}$ for $0\leq\lambda\leq 1$ , we have $|m(S,T)|\geq|S^{\prime}||T^{\prime}|/n-\lambda\sqrt{|S^{\prime}||T^{\prime}|}\geq\frac{1}{n}\left(\sqrt{|S^{\prime}||T^{\prime}|}-\sqrt{\lambda}n/2\right)^{2}-\lambda n/4$ . The theorem then follows from the bound on $|S^{\prime}||T^{\prime}|$ . ∎

We are interested in determining whether the assumption on a fixed degree is necessary. Using the normalized Laplacian, there exist generalizations of expander graphs for arbitrary degree distributions [17]. However, it may be that even rough positive curvature will only exist when the degree distribution falls into a tight range. A power-law graph is a graph whose degree distribution approximately follows the distribution of the inverse of a polynomial (and hence is widely skewed). Such graphs have become popular recently for their ability to model social, technical, and biological networks. Models for such graphs include Kronecker graphs (used for Graph500).

Conjecture 3.24.

It is impossible for a power-law graph to have “big picture” curvature.

The conjecture would imply that positively curved networks will not be useful for social networks, as negative curvature has been. On the other hand, it may still be useful to study engineered networks (for example, supercomputing clusters frequently use product topologies, such as the discrete torus). Chung, Lu, and Vu [18] studied the spectral properties of a random power-law graph. We provide some evidence for our conjecture by studying random walks on arbitrary power-law graphs.

Theorem 3.25.

Suppose we weight a path $P_{xy}=x,u_{1},u_{2},\ldots,u_{k},y$ as $w(P_{xy})=\prod_{i=1}^{k}\frac{1}{d(u_{i})}$ . If we pick a random shortest path $P_{xy}$ , where $P_{xy}$ is picked proportional to $w(P_{xy})$ across all pairs $x,y$ , then the probability that $z$ is the midpoint of $P_{xy}$ is proportional to $d(z)$ .

Proof.

We consider a random process that is a random walk with edge teleportation. Fix some arbitrary $0<c<1$ . We place some token on a vertex of the graph and will move that token according to a random walk with probability $c$ and according to edge teleportation with probability $1-c$ .

We start the process with a stationary distribution on the vertices for a random walk; let the probability that our token is on $v$ be proportional to $d(v)$ . Suppose the token is on vertex $z$ , and let us discuss how the token will move at the next step in the process. With probability $c$ we move the token to a random neighbor of $z$ . Note that this step preserves the probability distribution; after a random walk step the probability that the token is on $z$ is still proportional to $d(z)$ . With probability $1-c$ we choose an edge uniformly at random and teleport the token to one of its endpoints chosen uniformly at random. This too preserves the probability distribution; and so the probability that the token is on $z$ will be proportional to $d(z)$ as we iterate this process.
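The invariance claimed in this paragraph can be verified exactly on a small graph. The sketch below is an illustration under our own choices: the graph (a triangle with one pendant vertex), the value of $c$ , and the function names are all hypothetical. It uses exact rational arithmetic, so the distribution after one step of the process can be compared with the degree-proportional distribution using `==`.

```python
from fractions import Fraction

def degree_distribution(adj):
    """pi(v) = d(v) / (2|E|), the stationary distribution of the random walk."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), two_m) for v, nbrs in adj.items()}

def walk_step(adj, dist):
    """Distribution after moving the token to a uniformly random neighbor."""
    new = {v: Fraction(0) for v in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            new[v] += dist[u] / len(nbrs)
    return new

def teleport_step(adj):
    """Distribution after picking a uniform edge, then a uniform endpoint of it."""
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    new = {v: Fraction(0) for v in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit each edge once
                new[u] += Fraction(1, 2 * m)
                new[v] += Fraction(1, 2 * m)
    return new

def process_step(adj, dist, c):
    """One step: random walk with probability c, edge teleportation with probability 1 - c."""
    walked, teleported = walk_step(adj, dist), teleport_step(adj)
    return {v: c * walked[v] + (1 - c) * teleported[v] for v in adj}

# Hypothetical graph: triangle 0-1-2 with a pendant vertex 3 attached to 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
pi = degree_distribution(adj)
after = process_step(adj, pi, Fraction(1, 3))
```

Note that the teleportation step does not depend on the current position of the token, which is why it preserves the distribution for every value of $c$ .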

Let $z_{0},z_{1},\ldots$ be an infinite sequence of vertices such that the token is on vertex $z_{i}$ after $i$ iterations of our process. Let $i_{1},i_{2},\ldots$ be the iterations such that the transition $z_{i_{j}-1}\rightarrow z_{i_{j}}$ involves edge teleportation for each $j$ and $z_{k-1}\rightarrow z_{k}$ involves a step in a random walk when $k\notin\{i_{1},i_{2},\ldots\}$ . We choose a random geodesic as follows: repeatedly pick some $j\in\mathbb{Z}$ until the path $P_{j}=z_{i_{j}},z_{i_{j}+1},\ldots,z_{i_{j+1}-1}$ is a geodesic.

The theorem will follow when we establish two facts: (1) the probability that the midpoint of $P_{j}$ is $z$ is proportional to $d(z)$ and (2) the probability that a fixed geodesic $P^{\prime}$ is chosen is proportional to $w(P^{\prime})$ . Part (1) is easy, as the probability of any vertex $z_{i}$ being $z$ is proportional to $d(z)$ . All that remains is (2).

Let $P_{xy}=x,u_{1},u_{2},\ldots,u_{k},y$ be some fixed geodesic, and let us calculate the probability $q(P_{xy})$ that $P_{xy}=P_{j}$ for the first value of $j$ (in other words, without accounting for the fact that we throw out non-geodesic paths). The probability that $z_{i_{j}}=x$ is $\frac{d(x)}{2|E(G)|}$ . The probability that $i_{j+1}=i_{j}+k+2$ is $c^{k+1}(1-c)$ . For shorthand, let us denote $u_{0}=x$ and $u_{k+1}=y$ . Given that $z_{i_{j}+\ell}=u_{\ell}$ and $i_{j+1}=i_{j}+k+2$ , the probability that $z_{i_{j}+\ell+1}=u_{\ell+1}$ is $\frac{1}{d(u_{\ell})}$ . Thus

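$$q(P_{xy})=\frac{d(x)}{2|E(G)|}\,c^{k+1}(1-c)\prod_{\ell=0}^{k}\frac{1}{d(u_{\ell})}=\frac{1-c}{2|E(G)|}\,c^{k+1}\prod_{i=1}^{k}\frac{1}{d(u_{i})}.$$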

Now let us adjust the calculation of the probability of $P_{xy}=P_{j}$ to account for the fact that we will throw out non-geodesic paths. The probability is clearly proportional to $q(P_{xy})$ , when taken in comparison to all other paths. When we consider the proportional values, the term $\frac{1-c}{2|E(G)|}$ cancels as it is a uniform constant. Therefore the probability that $P_{xy}=P_{j}$ for the final value of $j$ is proportional to $c^{k+1}\prod_{i=1}^{k}\frac{1}{d(u_{i})}=c^{k+1}w(P_{xy})$ . The theorem follows by considering the limit $c\rightarrow 1$ . ∎

Now suppose we perform the same proof, but this time the probability of an edge teleportation from $z_{i}$ to $z_{i+1}$ is $\frac{1}{d(z_{i})+1}$ rather than $1-c$ . The following theorem is the result of this modification, plus averaging the probability of path $P_{xy}$ with its reverse.

Theorem 3.26.

Suppose we weight a path $P_{xy}=x=u_{1},u_{2},\ldots,u_{k}=y$ as $w(P_{xy})=\left(d(x)+d(y)\right)\prod_{i=1}^{k}\frac{1}{d(u_{i})+1}$ . If we pick a random shortest path $P_{xy}$ , where $P_{xy}$ is picked proportional to $w(P_{xy})$ across all pairs $x,y$ , then the probability that $z$ is the midpoint of $P_{xy}$ is proportional to $d(z)$ .

Appendix A Statements that were not used

There does not seem to be a significant difference between weak displacement convexity and Brunn-Minkowski curvature. We make this statement based on the fact that Lemmas 3.7 and 3.15 hold for any graph. In our work towards Theorem 3.17, we proved several lemmas that were not part of the final version of the proof. However, the statements are interesting in their own right, and may be useful towards establishing a bound on curvature for other spaces.

The following claim is from when we were still working with $\widehat{m}$ instead of $\widetilde{m}$ .

Claim A.1.

For probability distributions $\mu_{A},\mu_{B}$ , let $\tau$ be an optimal transportation. Suppose $\tau(\alpha_{1},\beta_{1})\tau(\alpha_{2},\beta_{2})>0$ and $\widehat{m}(\alpha_{1},\beta_{1})\cap\widehat{m}(\alpha_{2},\beta_{2})\neq\emptyset$ . Under these circumstances, $|d(\alpha_{1},\beta_{1})-d(\alpha_{2},\beta_{2})|\leq 2$ .

Proof.

The proof by contradiction of Lemma 3.7 will hold when $\lceil d(\alpha_{1},\beta_{1})/2\rceil<\lfloor d(\alpha_{2},\beta_{2})/2\rfloor$ . ∎

The goal of the following lemma was to prove an analogue of Hall’s Marriage theorem from the set of $(a,b)$ such that $\tau(a,b)>0$ to the set of midpoints. The outline was to prove that any collection $\{(a_{1},b_{1}),\ldots,(a_{k},b_{k})\}$ had large sets $\{a_{1},\ldots,a_{k}\}$ and $\{b_{1},\ldots,b_{k}\}$ , and therefore many midpoints by the Brunn-Minkowski inequality.

Claim A.2.

For any graph $G$ and probability measures $\mu_{A},\mu_{B}$ , there exists an optimal transportation $\tau$ from $\mu_{A}$ to $\mu_{B}$ such that for all $A^{\prime},B^{\prime}$ , we have that

[TABLE]

Proof.

For a fixed transportation $\rho$ , we define a bipartite graph $J_{\rho}$ with disjoint vertex sets $S_{A}\cup S_{B}$ , where $S_{A}=\{v_{a}:\mu_{A}(a)>0\}$ and $S_{B}=\{u_{b}:\mu_{B}(b)>0\}$ . If $\mu_{A}(w)>0$ and $\mu_{B}(w)>0$ for some vertex $w$ , then $v_{w}$ and $u_{w}$ both exist and are different. We define the edges to be $E(J_{\rho})=\{v_{a}u_{b}:\rho(a,b)>0\}$ . Among all optimal transportations, let $\tau$ be the one that minimizes $|E(J_{\tau})|$ .

We claim that $J_{\tau}$ is a forest. By way of contradiction, suppose that a cycle $C=u_{1},v_{1},u_{2},v_{2},\ldots,u_{k},v_{k}$ is in $J_{\tau}$ . Let $\epsilon_{1}=\min_{i}\tau(u_{i},v_{i})$ and let $\epsilon_{2}=\min_{i}\tau(u_{i+1},v_{i})$ , where the indices are taken modulo $k$ . We consider two transportations $\tau^{\prime},\tau^{\prime\prime}$ that equal $\tau$ on all pairs of vertices except

• $\tau^{\prime}(u_{i},v_{i})=\tau(u_{i},v_{i})+\epsilon_{2}$ ,

• $\tau^{\prime}(u_{i+1},v_{i})=\tau(u_{i+1},v_{i})-\epsilon_{2}$ ,

• $\tau^{\prime\prime}(u_{i},v_{i})=\tau(u_{i},v_{i})-\epsilon_{1}$ , and

• $\tau^{\prime\prime}(u_{i+1},v_{i})=\tau(u_{i+1},v_{i})+\epsilon_{1}$ .

By cyclic monotonicity, we have that $\tau^{\prime},\tau^{\prime\prime}$ are each optimal. This is a contradiction, because by construction, $|E(J_{\tau^{\prime}})|,|E(J_{\tau^{\prime\prime}})|<|E(J_{\tau})|$ .

Now consider vertex sets $A^{\prime},B^{\prime}$ and edge set $E^{\prime}=\{(a,b):a\in A^{\prime},b\in B^{\prime},\tau(a,b)>0\}$ . Each vertex in $A^{\prime}\cup B^{\prime}$ induces at most two vertices in $J_{\tau}$ ; let us call this induced subgraph $J[A^{\prime}\cup B^{\prime}]$ . Each edge in $E^{\prime}$ is in $J[A^{\prime}\cup B^{\prime}]$ . Because $J_{\tau}$ is a forest, there are at most $|V(J[A^{\prime}\cup B^{\prime}])|-1$ such edges. ∎

The problem with Claim A.2 is that the Brunn-Minkowski inequality would collect many midpoints between $(a_{i},b_{j})$ where $\tau(a_{i},b_{j})=0$ . Moreover, this seems like a fundamental flaw, given Example 3.13. If anything, this is the complete opposite of the transportation that we did use. By a proof similar to Lemma 3.11(2), it can be shown that if $\tau$ is optimal, maximizes entropy, and has no partition, then $\tau(a,b)>0$ for all pairs with $d(a,b)=D$ , $\mu_{A}(a)>0$ , $\mu_{B}(b)>0$ . Regardless, this claim is an interesting statement in its own right.

The following statement turned out to be unnecessary for proving Theorem 3.20. But it is true for general finite discrete spaces, and so it may be applicable towards a more general statement.

Claim A.3.

If $G$ has strong displacement convexity for probability measures $\mu_{A},\mu_{B}$ that are uniform over vertex sets $A,B$ , respectively, then $G$ has strong displacement convexity.

Proof.

Let $\mu_{A},\mu_{B}$ be arbitrary probability distributions with optimal transport $\tau$ and geodesic choices $Q_{\alpha,\beta}$ , which together produce the midpoint distribution $\mu_{C}$ . Suppose there exists some vertex $m$ such that $m=P_{1}(1/2)=P_{2}(1/2)$ for $Q_{\alpha_{1},\beta_{1}}(P_{1})>0$ , $Q_{\alpha_{2},\beta_{2}}(P_{2})>0$ , and $\mu_{A}(\alpha_{1})>\mu_{A}(\alpha_{2})$ (we allow the possibility of $\beta_{1}=\beta_{2}$ ). Let $\epsilon=\min\{\mu_{A}(\alpha_{2}),(\mu_{A}(\alpha_{1})-\mu_{A}(\alpha_{2}))/2\}>0$ , and define $\mu_{A}^{\prime}(\alpha_{1})=\mu_{A}(\alpha_{1})-\epsilon$ , $\mu_{A}^{\prime}(\alpha_{2})=\mu_{A}(\alpha_{2})+\epsilon$ , and $\mu_{A}^{\prime}=\mu_{A}$ otherwise. By construction, there exists a transportation $\tau^{\prime}$ from $\mu_{A}^{\prime}$ to $\mu_{B}$ and geodesic choices $Q^{\prime}_{\alpha,\beta}$ such that the midpoints have distribution $\mu_{C}$ . We propose that this is an optimal transportation from $\mu_{A}^{\prime}$ to $\mu_{B}$ .

Before proving the proposition, let us show how the proposition implies the claim. By convexity we know that $S(\mu_{A}^{\prime})>S(\mu_{A})$ , so if $\mu_{A}^{\prime}$ to $\mu_{B}$ has curvature, then so does $\mu_{A}$ to $\mu_{B}$ . Repeat this process until $\mu_{A}(\alpha_{1})=\mu_{A}(\alpha_{2})$ whenever the geodesics of $\tau$ and $Q_{\alpha,\beta}$ involving $\alpha_{1}$ and $\alpha_{2}$ share a midpoint. We define the equivalence classes $A_{r}=\{\alpha:\mu_{A}(\alpha)=r\}$ for $r>0$ , and let $M_{A,r}=\sum_{\alpha\in A_{r}}\mu_{A}(\alpha)$ and let $\mu_{A,r}$ be the probability distribution $\mu_{A}$ projected onto the set $A_{r}$ and scaled by $M_{A,r}^{-1}$ . If each transportation $\mu_{A,r}\otimes\mu_{B}$ using $\tau_{A,r}$ and $Q_{\alpha,\beta}$ has curvature, then the whole $\mu_{A}\otimes\mu_{B}$ will have curvature, as the midpoints do not overlap. But $\mu_{A,r}$ is a uniform measure over $A_{r}$ , which is the case covered by the hypothesis of the claim. Symmetrically, we do this to $\mu_{B}$ as well, and so the claim holds.

So now we prove the proposition. By Lemma 3.7, $d(\alpha_{1},\beta_{1})=d(\alpha_{2},\beta_{2})$ , and so $W^{2}(\tau^{\prime})=W^{2}(\tau)=W^{2}(\mu_{A},\mu_{B})$ . Suppose there exists a transportation $\tau^{\prime\prime}$ from $\mu_{A}^{\prime}$ to $\mu_{B}$ such that $W^{2}(\tau^{\prime\prime})<W^{2}(\tau)$ . Among all such transportations, let $\tau^{\prime\prime}$ minimize $|\{(a,b):\tau^{\prime\prime}(a,b)\neq\tau^{\prime}(a,b)\}|$ . If this number is $0$ , then $\tau^{\prime\prime}=\tau^{\prime}$ and $W^{2}(\tau^{\prime\prime})=W^{2}(\tau^{\prime})$ , which is a contradiction.

We consider the bipartite graphs $J_{\tau^{\prime}}$ and $J_{\tau^{\prime\prime}}$ as constructed in Claim A.2. There must exist some $u_{1},v_{1}$ where $\tau^{\prime}(u_{1},v_{1})<\tau^{\prime\prime}(u_{1},v_{1})$ . But both $\tau^{\prime}$ and $\tau^{\prime\prime}$ move mass into the same distribution $\mu_{B}$ , so there must exist a $u_{2}$ such that $\tau^{\prime}(u_{2},v_{1})>\tau^{\prime\prime}(u_{2},v_{1})$ . We continue this process until we have found a cycle $u_{1},v_{1},\ldots,u_{k},v_{k}$ such that for each $i$ , $\tau^{\prime}(u_{i},v_{i})<\tau^{\prime\prime}(u_{i},v_{i})$ and $\tau^{\prime}(u_{i+1},v_{i})>\tau^{\prime\prime}(u_{i+1},v_{i})$ . We use this cycle to update $\tau^{\prime}$ to be $\tau^{\prime}_{*}$ or $\tau^{\prime\prime}$ to be $\tau^{\prime\prime}_{*}$ as in Claim A.2, and one of two things happens: (1) $|\{(a,b):\tau^{\prime\prime}_{*}(a,b)\neq\tau^{\prime}(a,b)\}|$ decreases and $W^{2}(\tau^{\prime\prime})=W^{2}(\tau^{\prime\prime}_{*})$ , which is a contradiction, or (2) $W^{2}(\tau^{\prime}_{*})<W^{2}(\tau^{\prime})$ . But notice that in case (2) this update can also be applied to $\tau$ , giving a transportation $\tau_{*}$ with $W^{2}(\tau_{*})<W^{2}(\tau)$ , which contradicts the minimality of $\tau$ . ∎
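The entropy step in the proof of Claim A.3, namely that moving $\epsilon$ of mass from $\alpha_{1}$ to $\alpha_{2}$ gives $S(\mu_{A}^{\prime})>S(\mu_{A})$ , can be checked numerically on a small example. The sketch below assumes that $S$ is the Shannon entropy $-\sum_{\alpha}\mu(\alpha)\ln\mu(\alpha)$ (this sign convention is an assumption here), and the names `shannon_entropy` and `equalizing_step` are hypothetical.

```python
import math
from fractions import Fraction


def shannon_entropy(mu):
    """Assumed convention: S(mu) = -sum of mu(a) * ln(mu(a)) over the support."""
    return -sum(float(p) * math.log(float(p)) for p in mu.values() if p > 0)


def equalizing_step(mu, a1, a2):
    """Move eps = min(mu[a2], (mu[a1] - mu[a2]) / 2) of mass from a1 to a2.

    Requires mu[a1] > mu[a2] > 0, so that eps > 0 as in the proof.
    """
    if not mu[a1] > mu[a2] > 0:
        raise ValueError("need mu[a1] > mu[a2] > 0")
    eps = min(mu[a2], (mu[a1] - mu[a2]) / 2)
    new = dict(mu)
    new[a1] -= eps
    new[a2] += eps
    return new, eps


# Toy measure on three points; exact fractions avoid rounding in the masses.
mu_A = {"alpha1": Fraction(3, 5), "alpha2": Fraction(1, 10), "alpha3": Fraction(3, 10)}
mu_A_prime, eps = equalizing_step(mu_A, "alpha1", "alpha2")

print(eps, mu_A_prime["alpha1"], mu_A_prime["alpha2"])
print(shannon_entropy(mu_A), shannon_entropy(mu_A_prime))
```

Because $\epsilon\leq(\mu_{A}(\alpha_{1})-\mu_{A}(\alpha_{2}))/2$ , the transfer never reverses the order of the two masses, and total mass is unchanged; only the two entries involved move closer together, which is why the entropy goes up.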

Bibliography

[1] R. Albert, B. Das Gupta, and N. Mobasheri, “Topological Implications of Negative Curvature for Biological and Social Networks.” Physical Review E 89 (2014) 032811.
[2] N. Alon, B. Boppana, and J. Spencer, “An Asymptotic Isoperimetric Inequality.” Geometric and Functional Analysis 8 (1996) 411–436.
[3] N. Alon and V. Milman, “$\lambda_{1}$ , Isoperimetric Inequalities for Graphs, and Superconcentrators.” JCTB 38 (1985) 73–88.
[4] J. Alonso, T. Brady, D. Cooper, V. Ferlini, M. Lustig, M. Mihalik, M. Shapiro, and H. Short, “Notes on Word Hyperbolic Groups.” Group theory from a geometrical viewpoint, Proceedings of the ICTP Trieste 1990, World Scientific (1991) 543–617.
[5] F. Bauer, F. Chung, Y. Lin, and Y. Liu, “Curvature aspects of graphs.” Proc. of the AMS 145 (5) (2017) 2033–2042.
[6] S. Bobkov, C. Houdré, and P. Tetali, “The subgaussian constant and concentration inequalities.” Israel Journal of Mathematics 156 (2006) 255–283.
[7] B. Bollobás and I. Leader, “Compressions and isoperimetric inequalities.” JCTA 56 (1991) 47–62.
[8] B. Bollobás and I. Leader, “An isoperimetric inequality on the discrete torus.” SIAM J. Discrete Math. 3 (1990) 32–37.
