On the optimal map in the 2-dimensional random matching problem

Luigi Ambrosio; Federico Glaudo; Dario Trevisan

arXiv:1903.12153·math.PR·March 29, 2019


TL;DR

This paper demonstrates that on a 2D compact manifold, the optimal transport map in a semi-discrete random matching problem can be effectively approximated by a simple analytical form involving the solution to a Poisson equation, confirming a scaling hypothesis.

Contribution

It establishes a rigorous approximation of the optimal transport map using the Poisson problem, extending previous hypotheses to include the map's behavior on manifolds.

Findings

01. Optimal map approximated by identity plus gradient of Poisson solution

02. Validation of Caracciolo et al.'s scaling hypothesis for the map

03. New stability result for optimal transport maps on manifolds

Abstract

We show that, on a $2$-dimensional compact manifold, the optimal transport map in the semi-discrete random matching problem is well-approximated in the $L^{2}$-norm by identity plus the gradient of the solution to the Poisson problem $-\Delta f^{n,t}=\mu^{n,t}-1$, where $\mu^{n,t}$ is an appropriate regularization of the empirical measure associated to the random points. This shows that the ansatz of Caracciolo et al. (Scaling hypothesis for the Euclidean bipartite matching problem) is strong enough to capture the behavior of the optimal map in addition to the value of the optimal matching cost. As part of our strategy, we prove a new stability result for the optimal transport map on a compact manifold.

Equations

$$W_{p}^{p}(\mu,\nu)\coloneqq\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{X\times X}d^{p}(x,y)\,\mathrm{d}\gamma(x,y),$$

$$W_{p}^{p}(\mu,\nu)=\int_{M}d^{p}(x,T(x))\,\mathrm{d}\mu(x).$$

$$\mathbb{E}\left[W_{p}^{p}(\mathrm{m},\mu^{n})\right]\approx\begin{cases}n^{-\frac{p}{2}}&\text{if }d=1,\\ \left(\frac{\log(n)}{n}\right)^{\frac{p}{2}}&\text{if }d=2,\\ n^{-\frac{p}{d}}&\text{if }d\geq 3.\end{cases}$$

$$\mathbb{E}\left[W_{2}^{2}(\mathrm{m},\mu^{n})\right]\sim\frac{\log(n)}{4\pi n}.$$

$$-\Delta\tilde{f}^{n}\approx\mu^{n}-1.$$

$$-\Delta f^{n,t}=\mu^{n,t}-1$$

$$\frac{\mathbb{E}\left[\int_{M}d^{2}\left(T^{n},\exp(\nabla f^{n,t})\right)\,\mathrm{d}\mathrm{m}\right]}{\frac{\log(n)}{n}}\ll\frac{\log(\log(n))}{\log(n)}.$$

$$\lim_{n\to\infty}\frac{\mathbb{E}\left[\int_{M}d^{2}\left(T^{n},\exp(\nabla f^{n,t})\right)\,\mathrm{d}\mathrm{m}\right]}{\mathbb{E}\left[\int_{M}d^{2}\left(T^{n},\mathds{1}\right)\,\mathrm{d}\mathrm{m}\right]}=0.$$

$$\lim_{n\to\infty}\frac{\mathbb{E}\left[\lVert\nabla f^{n}-\nabla f^{n,t}\rVert_{L^{2}(M)}^{2}\right]}{\mathbb{E}\left[\lVert\nabla f^{n}\rVert_{L^{2}(M)}^{2}\right]}=0.$$

$$Q_{t}f(y)=\min_{x\in X}\frac{1}{2t}d^{2}(x,y)+f(x)\quad(t>0),\qquad Q_{0}f=f.$$

$$\frac{\mathrm{d}}{\mathrm{d}t}Q_{t}f+\frac{1}{2}\lvert\nabla Q_{t}f\rvert^{2}=0$$

$$X_{t}\coloneqq\frac{\partial\varphi_{s}}{\partial s}\Big|_{s=t}$$

$$\lvert X_{t}(x)-X_{t}(y)\rvert\lesssim\lvert x-y\rvert,$$

$$\lvert X_{t}(\varphi_{t}(x))-X_{t}(\varphi_{t}(y))\rvert\lesssim\lvert\varphi_{t}(x)-\varphi_{t}(y)\rvert.$$

$$\lvert x-y\rvert\lesssim\lvert\varphi_{t}(x)-\varphi_{t}(y)\rvert\quad\text{and}\quad\lvert X(x)-X(y)\rvert\lesssim\lvert\varphi_{t}(x)-\varphi_{t}(y)\rvert.$$

$$X_{t}(\varphi_{t}(x))=\gamma_{x}^{\prime}(t)\quad\text{and}\quad X_{t}(\varphi_{t}(y))=\gamma_{y}^{\prime}(t).$$

$$\lvert\gamma_{x}^{\prime}(t)-\gamma_{y}^{\prime}(t)\rvert\lesssim\lvert\gamma_{x}(0)-\gamma_{y}(0)\rvert+\lvert\gamma_{x}^{\prime}(0)-\gamma_{y}^{\prime}(0)\rvert.$$

$$Q_{t}f(y)=\frac{1}{2t}d^{2}(\varphi_{t}^{-1}(y),y)+f(\varphi_{t}^{-1}(y)).$$

$$\frac{d^{2}(y,y^{\prime})}{t}\lesssim Q_{t}f(y)-Q_{t}f(y^{\prime})+\frac{1}{2t}\left[d^{2}(\varphi_{t}^{-1}(y),y^{\prime})-d^{2}(\varphi_{t}^{-1}(y),y)\right].$$

$$\frac{\mathrm{d}}{\mathrm{d}t}Q_{t}f+\frac{1}{2}\lvert\nabla Q_{t}f\rvert^{2}=0.$$

$$Q_{t}f(\gamma(t))=f(x)+\frac{t}{2}\lvert\nabla f\rvert^{2}(x)\quad\text{and}\quad\nabla Q_{t}f(\gamma(t))=\gamma^{\prime}(t).$$

$$\operatorname{Lip}(Q_{t}f-f)\leq t\,\lVert\nabla f\rVert_{\infty}\cdot\lVert\nabla^{2}f\rVert_{\infty}.$$

$$Q_{t}(\lambda f)(y)=\lambda Q_{\lambda t}f(y),$$

$$\nabla^{2}d^{2}(\cdot,p)(q)\geq\frac{1}{2}g.$$

$$Q_{t}f(y)=\inf_{x\in B(y,r)}\frac{1}{2t}d^{2}(x,y)+f(x).$$

$$\nabla\left(\frac{1}{2t}d^{2}(\cdot,y)+f(\cdot)\right)(\varphi_{t}^{-1}(y))=0.$$

$$\nabla\left(\frac{1}{2}d^{2}(\cdot,y)\right)(x)=-t\nabla f(x)$$

$$w_{t,y}(x)=\frac{1}{2t}d^{2}(x,y)+f(x).$$

$$\frac{1}{t}d^{2}(x,x^{\prime})\lesssim f(x)-f(x^{\prime})+\frac{1}{2t}\left(d^{2}(x,y^{\prime})-d^{2}(x^{\prime},y^{\prime})\right)$$

$$\frac{1}{t}d^{2}(x,x^{\prime})\lesssim w_{t,y^{\prime}}(x)-w_{t,y^{\prime}}(x^{\prime}).$$

Full text

On the optimal map in the $2$-dimensional random matching problem

L. Ambrosio, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy

F. Glaudo, ETH, Rämistrasse 101, 8092 Zürich, Switzerland

D. Trevisan, Dipartimento di matematica, Università di Pisa, Largo Bruno Pontecorvo 56127, Pisa

Abstract.

We show that, on a $2$-dimensional compact manifold, the optimal transport map in the semi-discrete random matching problem is well-approximated in the $L^{2}$-norm by identity plus the gradient of the solution to the Poisson problem $-\Delta f^{n,t}=\mu^{n,t}-1$, where $\mu^{n,t}$ is an appropriate regularization of the empirical measure associated to the random points. This shows that the ansatz of [Car+14] is strong enough to capture the behavior of the optimal map in addition to the value of the optimal matching cost.

As part of our strategy, we prove a new stability result for the optimal transport map on a compact manifold.

1. Introduction

Let $(X_{1},X_{2},\dots,X_{n})$ be $n$ independent random points uniformly distributed on the square $\left[0,\,1\right]^{2}$. The semi-discrete random matching problem concerns the study of the properties of the optimal coupling (with respect to a certain cost) of these $n$ points with the Lebesgue measure $\mathscr{L}^{2}|_{\left[0,\,1\right]^{2}}$.

More precisely, denoting $\mu^{n}\coloneqq\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$ the empirical measure and $\mathrm{m}\coloneqq\mathscr{L}^{2}|_{\left[0,\,1\right]^{2}}$, we want to investigate the optimal transport from $\mathrm{m}$ to $\mu^{n}$.

The ultimate goal is understanding both the distribution of the random variable associated to the optimal transport cost and the properties of the (random) optimal map. In the present paper we will show that the optimal transport map can be well-approximated by the identity plus the gradient of the solution of a Poisson problem. In the large literature devoted to the matching problem, we believe that (except for the 1-dimensional case) this is one of the few results describing the behavior of the optimal map, and not only of the transport cost, see also [GHO18] in connection with the behavior of the optimal transport map in the Lebesgue-to-Poisson problem on large scales.

Before going on, let us briefly recall the definitions of optimal transport and Wasserstein distance. We suggest the monographs [Vil08, San15] for an introduction to the topic.

Definition 1.1 (Wasserstein distance).

Let $(X,d)$ be a compact metric space and let $\mu,\nu\in\mathcal{P}(X)$ be probability measures. Given $p\in\left[1,\,\infty\right)$, we define the $p$-Wasserstein distance between $\mu$ and $\nu$ as

$$W_{p}^{p}(\mu,\nu)\coloneqq\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{X\times X}d^{p}(x,y)\,\mathrm{d}\gamma(x,y),$$

where $\Gamma(\mu,\nu)$ is the set of all $\gamma\in\mathcal{P}(X\times X)$ such that the projections $\pi_{i}$ , $i=1,2$ , on the two factors are $\mu$ and $\nu$ , that is $(\pi_{1})_{\#}\gamma=\mu$ and $(\pi_{2})_{\#}\gamma=\nu$ .
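As a concrete instance of the definition, consider two empirical measures on the real line with the same number of equally weighted atoms. In that special case (an assumption of this sketch, not the setting of the paper) it is a standard fact that, for $p\geq 1$, the coupling that pairs the sorted atoms is optimal, so $W_{p}^{p}$ reduces to an average over sorted pairs. The function name `wasserstein_pp_1d` is hypothetical.

```python
def wasserstein_pp_1d(xs, ys, p=2.0):
    """Return W_p^p between the uniform empirical measures on xs and ys (points on the real line)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    # In dimension one, pairing the sorted samples is an optimal coupling for p >= 1.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) ** p for x, y in pairs) / len(xs)
```

In higher dimension no such sorting shortcut exists, which is part of what makes the $2$-dimensional problem delicate.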

*Remark 1.2.*

The infimum in the previous definition is always attained ([San15, Theorem 1.4]).

Moreover, if $(X,d)$ is a Riemannian manifold and $\mu\ll\mathrm{m}$ , where $\mathrm{m}$ is the volume measure of the manifold, the Wasserstein distance is realized by a map ([McC01]). Namely, the infimum 1.1 is attained and the unique minimizer is induced by a Borel map $T:M\to M$ , so that $T_{\#}\mu=\nu$ and

$$W_{p}^{p}(\mu,\nu)=\int_{M}d^{p}(x,T(x))\,\mathrm{d}\mu(x).$$

Even though the square is a fundamental example, the random matching problem makes perfect sense even in more general spaces (changing the reference measure $\mathrm{m}$ accordingly). Historically, in the combinatorial literature¹, the most common ambient space was $\left[0,\,1\right]^{d}$ for some $d\geq 1$ and the aspect of the problem that attracted more attention was estimating the expected value of the $W_{1}$ cost. In the papers [AKT84, Tal92, DY95, Led17] (and possibly in other ones) the problem was solved in all dimensions and for all $1\leq p<\infty$, obtaining the growth estimates²

$$\mathbb{E}\left[W_{p}^{p}(\mathrm{m},\mu^{n})\right]\approx\begin{cases}n^{-\frac{p}{2}}&\text{if }d=1,\\ \left(\frac{\log(n)}{n}\right)^{\frac{p}{2}}&\text{if }d=2,\\ n^{-\frac{p}{d}}&\text{if }d\geq 3.\end{cases}$$

¹ In the combinatorial literature the problem considered was the bipartite matching problem, in which two independent random point clouds have to be matched. The semi-discrete matching and the bipartite matching are tightly linked and, given that we will consider only the former, we are going to talk about the combinatorial literature as if it were considering the semi-discrete matching.

² The notation $f(n)\approx g(n)$ means that there exists a positive constant $C>0$ such that $C^{-1}g(n)\leq f(n)\leq Cg(n)$ for every $n$.

As might be clear from the presence of a logarithm, the matching problem exhibits some unexpected behavior in dimension $2$ .

See the introductions of [AG18, Led17] or [Tal14, Chapter 4, 14, 15] for a more in-depth description of the history of the problem.

Nowadays the topic is active again ([HPZ18, Tal18, GHO18, Led17, AG18, Led18, Led19]), also as a consequence of [AST18], in which the authors, following an ansatz suggested in [Car+14], manage to obtain the leading term of the asymptotic expansion of the expected matching cost in dimension $2$ with respect to the quadratic distance³:

$$\mathbb{E}\left[W_{2}^{2}(\mathrm{m},\mu^{n})\right]\sim\frac{\log(n)}{4\pi n}.$$

³ The notation $f(n)\sim g(n)$ means that $\frac{f(n)}{g(n)}\to 1$ when $n\to\infty$.
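The growth estimates and the $2$-dimensional leading term can be tabulated with a few lines of Python. This is only a sketch: the function names `matching_cost_scale` and `leading_term_2d` are hypothetical, and the first function returns the order of magnitude only, since the estimates hold up to an unspecified constant.

```python
import math

def matching_cost_scale(n, p, d):
    """Order of magnitude of E[W_p^p(m, mu^n)] on [0,1]^d, up to a constant factor."""
    if d == 1:
        return n ** (-p / 2)
    if d == 2:
        return (math.log(n) / n) ** (p / 2)
    return n ** (-p / d)

def leading_term_2d(n):
    """Leading term log(n) / (4 pi n) of E[W_2^2(m, mu^n)] in dimension 2."""
    return math.log(n) / (4 * math.pi * n)
```

For $d=2$ and $p=2$ the two functions differ exactly by the factor $4\pi$, which is the constant identified in [AST18].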

The approach is far from being combinatorial; indeed, it relies on a first-order approximation of the Wasserstein distance with the $H^{-1}$ negative Sobolev norm. Their proof works on any closed compact $2$-dimensional manifold.

Given that we will build upon it, let us give a brief sketch of the approach. What we are going to describe is simpler than the original approach of [AST18] and can be found in full details in [AG18]. For simplicity we will assume to work on the square.

Let $T^{n}$ be the optimal map from $\mathrm{m}$ to $\mu^{n}$, whose existence is ensured by Brenier’s Theorem (see [Bre91]). Still by Brenier’s Theorem, we know that $T^{n}=\mathds{1}+\nabla\tilde{f}^{n}$, where $\mathds{1}$ is the identity map and $\tilde{f}^{n}:\left[0,\,1\right]^{2}\to\mathbb{R}$ is a convex function. With high probability $\mu^{n}$ is well-spread on the square, thus we expect $\nabla\tilde{f}^{n}$ to be very small. We know $(T^{n})_{\#}\mathrm{m}=\mu^{n}$ and we would like to apply the change of variable formula to deduce something on the Hessian of $\tilde{f}^{n}$. The issue is that the singularity of $\mu^{n}$ prevents a direct application of the change of variable formula. Anyhow, proceeding formally we obtain $\det(\mathds{1}+\nabla^{2}\tilde{f}^{n})^{-1}=\mu^{n}$. Going on with the formal computation, if we consider only the first-order term of the left-hand side, the previous identity simplifies to

$$-\Delta\tilde{f}^{n}\approx\mu^{n}-1.$$

Somewhat unexpectedly, this last equation makes perfect sense. Therefore we might claim that if we define $f^{n}:\left[0,\,1\right]^{2}\to\mathbb{R}$ as the solution of $-\Delta f^{n}=\mu^{n}-1$ (with null Neumann boundary condition), then $T^{n}$ is well-approximated by $\mathds{1}+\nabla f^{n}$ and furthermore the transport cost is well-approximated by $\int{\lvert\nabla f^{n}\rvert}^{2}\,\mathrm{d}\mathrm{m}$ .
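The formal first-order step rests on the expansion $\det(\mathds{1}+H)^{-1}=1-\operatorname{tr}(H)+O(\lvert H\rvert^{2})$ for a small symmetric matrix $H$, applied with $H=\nabla^{2}\tilde{f}^{n}$ so that $\operatorname{tr}(H)=\Delta\tilde{f}^{n}$. The following sketch checks the quadratic size of the remainder on a hypothetical $2\times 2$ matrix; all names and the sample matrix are illustrative choices, not taken from the source.

```python
def inverse_det_2x2(a, b, c):
    """Return 1 / det(I + H) for the symmetric matrix H = [[a, b], [b, c]]."""
    return 1.0 / ((1.0 + a) * (1.0 + c) - b * b)

def first_order_approximation(a, b, c):
    """Return 1 - trace(H), the linearization of 1 / det(I + H) around H = 0."""
    return 1.0 - (a + c)

def linearization_error(eps, hessian=(1.0, 2.0, -3.0)):
    """Gap between 1 / det(I + eps * H) and its linearization for a sample H."""
    a, b, c = (eps * entry for entry in hessian)
    return abs(inverse_det_2x2(a, b, c) - first_order_approximation(a, b, c))
```

Dividing `eps` by ten should divide the error by roughly one hundred, which is the sense in which only the first-order term is kept.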

This conjecture is appealing, but false, if taken literally. Indeed, it is very easy to check that the integral $\int{\lvert\nabla f^{n}\rvert}^{2}\,\mathrm{d}\mathrm{m}$ diverges.

The ingredient that fixes this issue is a regularization argument. More precisely, let $\mu^{n,t}\coloneqq P_{t}^{*}\mu^{n}$ be the evolution at a certain small time $t>0$ of the empirical measure through the heat semigroup (see [Cha84, Chapter 6]). If we repeat the ansatz with $\mu^{n}$ replaced by $\mu^{n,t}$ we obtain a function $f^{n,t}:\left[0,\,1\right]^{2}\to\mathbb{R}$ that solves

$$-\Delta f^{n,t}=\mu^{n,t}-1$$

with null Neumann boundary conditions. Let us remark that in fact $f^{n,t}=P_{t}f^{n}$ .
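To make the regularized Poisson problem concrete, here is a minimal sketch on the flat torus $\mathbb{R}^{2}/\mathbb{Z}^{2}$ rather than on the square, so that Fourier series replace the Neumann boundary conditions. It assumes the normalization $P_{t}=e^{t\Delta}$ for the heat semigroup (an assumption, the source only cites [Cha84]), truncates the Fourier series at a hypothetical cutoff `max_freq`, and uses the hypothetical name `dirichlet_energy_torus`. Under these assumptions the Fourier coefficients of $f^{n,t}$ are those of $\mu^{n}$ multiplied by $e^{-t\lambda_{k}}/\lambda_{k}$ with $\lambda_{k}=4\pi^{2}\lvert k\rvert^{2}$, which also exhibits the identity $f^{n,t}=P_{t}f^{n}$ mode by mode.

```python
import cmath
import math

def dirichlet_energy_torus(points, t, max_freq=12):
    """Dirichlet energy of f^{n,t} on the flat torus, using modes with |k_1|, |k_2| <= max_freq."""
    n = len(points)
    energy = 0.0
    for k1 in range(-max_freq, max_freq + 1):
        for k2 in range(-max_freq, max_freq + 1):
            if k1 == 0 and k2 == 0:
                continue  # the zero mode cancels against the constant 1
            eigenvalue = 4.0 * math.pi ** 2 * (k1 * k1 + k2 * k2)
            # Fourier coefficient of the empirical measure mu^n at frequency k.
            mu_hat = sum(
                cmath.exp(-2j * math.pi * (k1 * x + k2 * y)) for x, y in points
            ) / n
            # The coefficient of f^{n,t} is exp(-t * eigenvalue) * mu_hat / eigenvalue,
            # and the Dirichlet energy is the sum of eigenvalue * |coefficient|^2.
            energy += math.exp(-2.0 * t * eigenvalue) * abs(mu_hat) ** 2 / eigenvalue
    return energy
```

Without the factor $e^{-2t\lambda_{k}}$ the summands decay only like $\lvert k\rvert^{-2}$ for typical point configurations, which is consistent with the divergence of $\int\lvert\nabla f^{n}\rvert^{2}\,\mathrm{d}\mathrm{m}$ noted above.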

Once again, we can hope that $\mathds{1}+\nabla f^{n,t}$ approximates very well $T^{n}$ and furthermore that the transport cost from $\mathrm{m}$ to $\mu^{n}$ is well-approximated by $\int{\lvert\nabla f^{n,t}\rvert}^{2}\,\mathrm{d}\mathrm{m}$ .

This time the predictions are sound. Choosing carefully the time $t=t(n)$, we can show that, with high probability, the map $\mathds{1}+\nabla f^{n,t}$ is optimal from $\mathrm{m}$ to $\left(\mathds{1}+\nabla f^{n,t}\right)_{\#}\mathrm{m}$ and the Dirichlet energy of $f^{n,t}$ approximates $W_{2}^{2}(\mathrm{m},\mu^{n})$ very well. Only one part of the conjecture is left unproven by [AG18]: is it true that $\mathds{1}+\nabla f^{n,t}$ approximates, in some adequate sense, the optimal map $T^{n}$? The goal of the present paper is to answer this question positively.

We are going to prove the following.

Theorem 1.3.

Let $(M,g)$ be a $2$ -dimensional closed compact Riemannian manifold (or the square $\left[0,\,1\right]^{2}$ ) whose volume measure $\mathrm{m}$ is a probability. We will denote with $d:M\times M\to\left[0,\,\infty\right)$ the Riemannian distance on $M$ .

Given $n\in\mathbb{N}$ , let $X_{1},X_{2},\dots,X_{n}$ be $n$ independent random points $\mathrm{m}$ -uniformly distributed on $M$ . Let us denote $\mu^{n}\coloneqq\frac{1}{n}\sum_{i}\delta_{X_{i}}$ the empirical measure associated to the random point cloud and let $T^{n}$ be the optimal transport map from $\mathrm{m}$ to $\mu^{n}$ .

For a fixed time $t>0$, let $\mu^{n,t}\coloneqq P_{t}^{*}\mu^{n}\in\mathcal{P}(M)$ and let $f^{n,t}:M\to\mathbb{R}$ be the unique null-mean solution⁴ of the Poisson problem $-\Delta f^{n,t}=\mu^{n,t}-1$.

⁴ If $M=\left[0,\,1\right]^{2}$ we ask also that $f$ satisfies the null Neumann boundary conditions.

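If we set $t=t(n)=\frac{\log(n)^{4}}{n}$, on average $T^{n}$ is very close to $\exp(\nabla f^{n,t})$ in the $L^{2}$-norm, that is

$$\frac{\mathbb{E}\left[\int_{M}d^{2}\left(T^{n},\exp(\nabla f^{n,t})\right)\,\mathrm{d}\mathrm{m}\right]}{\frac{\log(n)}{n}}\ll\frac{\log(\log(n))}{\log(n)}.$$

In particular,

$$\lim_{n\to\infty}\frac{\mathbb{E}\left[\int_{M}d^{2}\left(T^{n},\exp(\nabla f^{n,t})\right)\,\mathrm{d}\mathrm{m}\right]}{\mathbb{E}\left[\int_{M}d^{2}\left(T^{n},\mathds{1}\right)\,\mathrm{d}\mathrm{m}\right]}=0.$$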
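To get a feeling for the scales in the theorem, the regularization time $t(n)=\log(n)^{4}/n$ and the rate $\log(\log(n))/\log(n)$ can be evaluated directly. The function names below are hypothetical, and the second function only evaluates the right-hand side as displayed here, with no claim about the implicit constant.

```python
import math

def regularization_time(n):
    """Time t(n) = log(n)^4 / n at which the empirical measure is regularized."""
    return math.log(n) ** 4 / n

def relative_error_rate(n):
    """The rate log(log(n)) / log(n) bounding the relative error of the approximation."""
    return math.log(math.log(n)) / math.log(n)
```

Both quantities tend to zero, but slowly: $t(n)$ falls below $1$ only somewhere between $n=5000$ and $n=6000$, so the statement is genuinely asymptotic.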

*Remark 1.4.*

To handle the case of the square $M=\left[0,\,1\right]^{2}$ some care is required. Indeed the presence of boundary makes things more delicate. This is the reason why only the square is considered in the theorem and not any $2$ -dimensional compact manifold with boundary.

See [AG18, Subsection 2.1 and Remark 3.10] for some further details on this matter.

*Remark 1.5.*

By McCann’s Theorem [McC01] we can write $T^{n}=\exp(\nabla f^{n})$, hence a natural question is whether 1.3 holds with $|\nabla(f^{n}-f^{n,t})|$ in place of $d(T^{n},\exp(\nabla f^{n,t}))$. Using the fact that the exponential map restricted to a sufficiently small neighbourhood of the null vector field is a global diffeomorphism with its image, it would be sufficient to show that, for every $\varepsilon>0$, $\operatorname{\mathbb{P}}\left(\|d(T^{n},\mathds{1})\|_{\infty}>\varepsilon\right)\ll\log(n)/n$, as $n\to\infty$. We will prove this estimate in Proposition 4.7, which provides the desired approximation at the level of the gradients

$$\lim_{n\to\infty}\frac{\mathbb{E}\left[\lVert\nabla f^{n}-\nabla f^{n,t}\rVert_{L^{2}(M)}^{2}\right]}{\mathbb{E}\left[\lVert\nabla f^{n}\rVert_{L^{2}(M)}^{2}\right]}=0.$$

The strategy of the proof is to show that the information that we already have on $\exp(\nabla f^{n,t})$ (namely that it is an optimal map between $\mathrm{m}$ and some measure $\hat{\mu}^{n,t}$ that is very close to $\mu^{n,t}$ ) is enough to deduce that it must be near to the optimal map $T^{n}$ .

As part of the strategy of proof, we obtain, in Section 3, a new stability result for the optimal transport map on a general compact Riemannian manifold (not only of dimension $2$). This is the natural generalization to Riemannian manifolds of [Gig11]. The said stability result follows rather easily from the study of the short-time behavior of the Hopf-Lax semigroup we perform in Section 2. The Hopf-Lax semigroup comes up in our investigation as, when $t=1$, it becomes the operator of $c$-conjugation and thus produces the second Kantorovich potential once the first is known (see [San15, Section 1.2] for the theory of Kantorovich potentials and $c$-conjugation).

The main theorem is established in Section 4.

*Acknowledgments.* F. Glaudo has received funding from the European Research Council under the Grant Agreement No 721675. L. Ambrosio acknowledges the support of the MIUR PRIN 2015 project.

1.1. Notation for constants

We will use the letters $c$ and $C$ to denote constants, whose dependencies are denoted by $c=c(A,B,\dots)$ . The value of such constants can change from one time to the other.

Moreover we will frequently use the notation $A\lesssim B$ to hide a constant that depends only on the ambient manifold $M$ . This expression means that there exists a constant $C=C(M)$ such that $A\leq C\cdot B$ .

2. Short-time behavior of the Hopf-Lax semigroup with datum in $C^{1,1}$

Let us begin recalling the definition of the Hopf-Lax semigroup (also called Hamilton-Jacobi semigroup).

Definition 2.1 (Hopf-Lax semigroup).

Let $(X,d)$ be a compact length space⁵. For any function $f\in C(X)$ and any $t\geq 0$, let $Q_{t}f:X\to\mathbb{R}$ be defined by

$$Q_{t}f(y)=\min_{x\in X}\frac{1}{2t}d^{2}(x,y)+f(x)\quad(t>0),\qquad Q_{0}f=f.$$

⁵ A metric space is a length space if the distance between any two points is the infimum of the length of the curves between the two points. Let us remark that for the definition we need neither the compactness nor the length property of $X$, but without these assumptions many of the properties of the Hopf-Lax semigroup fail (first of all the fact that it is a semigroup).

Without additional assumptions on $X$ or $f$ it is already possible to deduce many properties of the Hopf-Lax semigroup. Let us give a very short summary of the most important ones.

• When $t\to 0$ the functions $Q_{t}f$ converge uniformly to $f$.

• The Hopf-Lax semigroup is indeed a semigroup, that is $Q_{s+t}f=Q_{s}Q_{t}f$ for any $s,\,t\geq 0$.

• In a suitable weak sense, the Hamilton-Jacobi equation

$$\frac{\mathrm{d}}{\mathrm{d}t}Q_{t}f+\frac{1}{2}\lvert\nabla Q_{t}f\rvert^{2}=0$$

holds. Let us emphasize that the mentioned equation does not make sense if we don’t give an appropriate definition of norm of the gradient as we are working in a metric setting.

See [LV07], in particular Theorem 2.5, for a detailed proof of the mentioned properties.
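The definition and the properties above can be tested on the simplest length space, the interval $[0,1]$ with the Euclidean distance. The sketch below (hypothetical names, brute-force minimization over a finite grid, so only an approximation of $Q_{t}f$) compares the numerical value with the closed form $Q_{t}f(y)=\frac{y^{2}}{2(1+t)}$ for $f(x)=\frac{x^{2}}{2}$, which follows by minimizing the quadratic $x\mapsto\frac{(x-y)^{2}}{2t}+\frac{x^{2}}{2}$ at $x=\frac{y}{1+t}$.

```python
def hopf_lax(f, t, y, grid):
    """Brute-force Q_t f(y) over a finite grid of [0, 1] with d(x, y) = |x - y|."""
    if t == 0:
        return f(y)
    return min((x - y) ** 2 / (2.0 * t) + f(x) for x in grid)

def quadratic(x):
    """Sample initial datum f(x) = x^2 / 2."""
    return 0.5 * x * x

def hopf_lax_quadratic_exact(t, y):
    """Closed form of Q_t f for f(x) = x^2 / 2 on [0, 1]; the minimizer is x = y / (1 + t)."""
    return 0.5 * y * y / (1.0 + t)

# Uniform grid of [0, 1] with spacing 1/2000 (an arbitrary resolution).
GRID = [i / 2000 for i in range(2001)]
```

The closed form makes the first two properties visible: it converges to $f$ as $t\to 0$, and applying $Q_{s}$ to $\frac{y^{2}}{2(1+t)}$ gives $\frac{y^{2}}{2(1+s+t)}$, in agreement with $Q_{s+t}f=Q_{s}Q_{t}f$.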

There is a vast literature investigating the regularity properties of the Hopf-Lax semigroup and its connection with the Hamilton-Jacobi equation, in particular that it is the unique solution in the viscosity sense (see for instance [Lio82, Ben77, BC08]). Nonetheless we could not find a complete reference for the short-time behavior of the Hopf-Lax semigroup on a Riemannian manifold (as the majority of the results are stated on the Euclidean space) with a relatively regular initial datum (namely $C^{1,1}$ ). This is exactly the topic of this section.

What we are going to show, apart from (3), is not new. For instance, in [Fat03, Section 5], the author proves the validity of the method of characteristics in a way very similar to ours. In that paper more general Lagrangians are considered and as a consequence the proofs are more involved and require many more geometric tools and much more notation.

For us, the ambient space is a compact Riemannian manifold $(M,g)$ and the function $f\in C^{1,1}(M)$ is differentiable with Lipschitz continuous gradient. Moreover, either $M$ is closed or it is the square $\left[0,\,1\right]^{2}$. For a general manifold with boundary the results are false; the square is special because its boundary is piecewise geodesic. Handling all manifolds with totally geodesic boundary would be possible, but would require some additional care. In order to simplify the exposition we decided to state the results only for the square. Throughout this section we will often use implicitly that a Lipschitz continuous function is differentiable almost everywhere (see [EG92, Theorem 3.2]).

We will show that, up to a small time that depends on the $C^{1,1}$ -norm of $f$ , the Hopf-Lax semigroup is as good as one might hope. We will describe explicitly the minimizer $x=x_{t}(y)$ of the variational problem that defines $Q_{t}f(y)$ deducing some explicit formulas for $Q_{t}f$ and its gradient and we will show that $Q_{t}f$ solves the Hamilton-Jacobi equation in the classical sense. Finally we will be able to control the $C^{1,1}$ -norm of $Q_{t}f$ and the $C^{0,1}$ -norm of $Q_{t}f-f$ .

How can we achieve these results for short times when $f\in C^{1,1}$? The main ingredient is the possibility to identify the minimizer $x=x_{t}(y)$ in the definition of $Q_{t}f(y)$. Given $x\in M$, let $\gamma:\left[0,\,\infty\right)\to M$ be the unique geodesic with $\gamma(0)=x$ and $\gamma^{\prime}(0)=\nabla f(x)$. If $y=\gamma(t)$, then the minimizer in the definition of $Q_{t}f(y)$ is exactly $x$. This approach is exactly the method of characteristics when applied on a Riemannian manifold (straight lines on a manifold are geodesics).

Let us begin with a technical lemma.

Lemma 2.2.

Let $(M,g)$ be a closed compact Riemannian manifold (or the square $\left[0,\,1\right]^{2}$ ).

There exists a constant $c=c(M)$ such that the following statement holds. Let $X\in\chi(M)$ be a Lipschitz continuous vector field⁶ with ${\lVert X\rVert}_{\infty}\leq c$ and ${\lVert\nabla X\rVert}_{\infty}\leq c$ and, for any $0\leq t\leq 1$, let $\varphi_{t}:M\to M$ be the map defined as $\varphi_{t}(x)\coloneqq\exp(tX(x))$, where $\exp:TM\to M$ denotes the exponential map. For any $0\leq t\leq 1$, the map $\varphi_{t}$ is a homeomorphism such that $\operatorname{Lip}(\varphi_{t})$, $\operatorname{Lip}(\varphi_{t}^{-1})\leq 2$ and the vector field $X_{t}\in\chi(M)$ defined as

$$X_{t}\coloneqq\frac{\partial\varphi_{s}}{\partial s}\Big|_{s=t}$$

is Lipschitz continuous with ${\lVert\nabla X_{t}\rVert}_{\infty}\lesssim{\lVert\nabla X\rVert}_{\infty}$.

⁶ If $M=\left[0,\,1\right]^{2}$ we ask also that $X$ is tangent to the boundary.

Proof.

We will give only a sketch of the proof of the first part of the statement as the argument is well-known.

Let us begin by proving the result when $M$ is closed (in particular we exclude only $M=\left[0,\,1\right]^{2}$ ).

We can deduce the first part of the statement from the fact that $\varphi=\varphi_{1}$ is injective and locally (i.e. on sufficiently small balls) it is a bi-Lipschitz transformation with its image.

Working in a suitably chosen finite atlas (whose existence follows from the compactness of $M$ ), the fact that $\varphi$ is a bi-Lipschitz diffeomorphism is a consequence of the following very well-known lemma about perturbations of the identity (see [Rud+76, Theorem 9.24] or [Fat03, Theorem 5.3]). If $T:\Omega\subseteq\mathbb{R}^{d}\to\mathbb{R}^{d}$ is such that $T-\mathds{1}$ is $L$ -Lipschitz with $L<1$ , then $T$ is locally invertible and $\operatorname{Lip}(T)\leq 1+L,\ \operatorname{Lip}(T^{-1})\leq(1-L)^{-1}$ .
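The lemma on perturbations of the identity can be illustrated in dimension one, where the inverse is computable by a fixed-point iteration. In this sketch the map $T(x)=x+L\sin(x)$ and the value $L=0.5$ are hypothetical choices: $T-\mathds{1}$ is $L$-Lipschitz, so $x\mapsto y-L\sin(x)$ is a contraction whose fixed point is $T^{-1}(y)$.

```python
import math

L = 0.5  # hypothetical Lipschitz constant of T - identity; the lemma needs L < 1

def T(x):
    """A perturbation of the identity on the real line: T - identity is L-Lipschitz."""
    return x + L * math.sin(x)

def T_inverse(y, iterations=200):
    """Invert T by iterating the contraction x -> y - L * sin(x)."""
    x = y
    for _ in range(iterations):
        x = y - L * math.sin(x)
    return x
```

The bounds $\operatorname{Lip}(T)\leq 1+L$ and $\operatorname{Lip}(T^{-1})\leq(1-L)^{-1}$ can then be checked on any pair of points; with $L\leq\frac{1}{2}$ both constants are at most $2$, which is the shape of the bound claimed for $\varphi_{t}$.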

The global injectivity follows directly from the fact that it is locally bi-Lipschitz. Indeed if $\varphi(x_{1})=\varphi(x_{2})$ then $d(x_{1},x_{2})\leq 2{\lVert X\rVert}_{\infty}$ and therefore we can exploit the local injectivity of $\varphi$ .

When $M=\left[0,\,1\right]^{2}$ we need only a simple additional remark. Given that $X$ is tangent to the boundary, the map $\varphi$ is a homeomorphism of the boundary. As a consequence of this fact, it is not difficult to prove (by injectivity) that the interior of the square is mapped by $\varphi$ into itself. From here on we can simply mimic the proof described above for closed manifolds and achieve the result also for the case of the square.

We move our attention to the second part of the statement. By a simple homogeneity argument, it is sufficient to prove that ${\lVert\nabla X_{t}\rVert}_{\infty}\lesssim 1$ .

Once again we work in a chart. Let $\Omega\subseteq\mathbb{R}^{d}$ be the domain of the chart. As usual, $X_{t}$ can be understood as a vector field on $\Omega$ and $\varphi_{t}$ as a map from $\Omega^{\prime}\Subset\Omega$ into $\Omega$. Choosing the chart appropriately, we can assume that the Euclidean distance is bi-Lipschitz equivalent to the distance induced by the metric $g$.

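The Lipschitz continuity of $X_{t}$ with respect to the metric $g$ is equivalent to proving that, for any $x,y\in\Omega$, it holds

$$\lvert X_{t}(x)-X_{t}(y)\rvert\lesssim\lvert x-y\rvert,$$

where all the absolute values are with respect to the standard Euclidean norm. Since $\varphi_{t}$ is surjective, it is sufficient to prove that, for any $x,y\in\Omega^{\prime}$, it holds

$$\lvert X_{t}(\varphi_{t}(x))-X_{t}(\varphi_{t}(y))\rvert\lesssim\lvert\varphi_{t}(x)-\varphi_{t}(y)\rvert.\tag{2.1}$$

Given that $\varphi_{t}^{-1}$ is Lipschitz, we already know

$$\lvert x-y\rvert\lesssim\lvert\varphi_{t}(x)-\varphi_{t}(y)\rvert\quad\text{and}\quad\lvert X(x)-X(y)\rvert\lesssim\lvert\varphi_{t}(x)-\varphi_{t}(y)\rvert.\tag{2.2}$$

Let $\gamma_{x}:\left[0,\,1\right]\to\Omega$ be the unique geodesic, with respect to $g$, such that $\gamma_{x}(0)=x$ and $\gamma_{x}^{\prime}(0)=X(x)$. Let $\gamma_{y}:\left[0,\,1\right]\to\Omega$ be defined analogously. By definition, it holds

$$X_{t}(\varphi_{t}(x))=\gamma_{x}^{\prime}(t)\quad\text{and}\quad X_{t}(\varphi_{t}(y))=\gamma_{y}^{\prime}(t).\tag{2.3}$$

Taking into account 2.1, 2.2 and 2.3, the Lipschitz continuity of $X_{t}$ would follow from the inequality

$$\lvert\gamma_{x}^{\prime}(t)-\gamma_{y}^{\prime}(t)\rvert\lesssim\lvert\gamma_{x}(0)-\gamma_{y}(0)\rvert+\lvert\gamma_{x}^{\prime}(0)-\gamma_{y}^{\prime}(0)\rvert.\tag{2.4}$$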

The curves $\gamma_{x},\gamma_{y}$ are geodesics, hence the vectors $(\gamma_{x},\gamma_{x}^{\prime})$ and $(\gamma_{y},\gamma_{y}^{\prime})$ solve the same autonomous ordinary differential equation with different initial data. Hence 2.4 follows from the well-known Lipschitz dependence of the solution on the initial data (see [Tes12, Theorem 2.6]) and therefore the proof is concluded. ∎

We can now state and prove the main theorem of this section. The technically demanding part of these notes is entirely enclosed in the following theorem.

Theorem 2.3.

Let $(M,g)$ be a closed compact Riemannian manifold (or the square $\left[0,\,1\right]^{2}$ ).

Let $f\in C^{1,1}(M)$ be a scalar function⁷ and, for any positive time $t>0$, let us define the map $\varphi_{t}:M\to M$ as $\varphi_{t}(x)\coloneqq\exp(t\nabla f(x))$.

⁷ If $M=\left[0,\,1\right]^{2}$ we ask also that $f$ satisfies the null Neumann boundary conditions.

There exists a constant $c=c(M)$ such that the following properties hold for any time $0\leq t\leq c\left({\lVert\nabla f\rVert}_{\infty}+{\lVert\nabla^{2}f\rVert}_{\infty}\right)^{-1}$ :

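(1) The map $\varphi_{t}$ is a bi-Lipschitz homeomorphism such that $\operatorname{Lip}(\varphi_{t}),\,\operatorname{Lip}(\varphi_{t}^{-1})\leq 2$.

(2) For any $y\in M$, it holds

$$Q_{t}f(y)=\frac{1}{2t}d^{2}(\varphi_{t}^{-1}(y),y)+f(\varphi_{t}^{-1}(y)).$$

(3) For any $y,\,y^{\prime}\in M$, one has the (strict-convexity-like) estimate

$$\frac{d^{2}(y,y^{\prime})}{t}\lesssim Q_{t}f(y)-Q_{t}f(y^{\prime})+\frac{1}{2t}\left[d^{2}(\varphi_{t}^{-1}(y),y^{\prime})-d^{2}(\varphi_{t}^{-1}(y),y)\right].$$

(4) The function $Q_{t}f$ is Lipschitz continuous in time and $C^{1,1}(M)$ in space. In particular we have ${\lVert\partial_{t}Q_{t}f\rVert}_{\infty}\leq{\lVert\nabla f\rVert}_{\infty}$ and ${\lVert\nabla^{2}Q_{t}f\rVert}_{\infty}\lesssim{\lVert\nabla^{2}f\rVert}_{\infty}$.

(5) The function $Q_{t}f$ is a classical solution of the Hamilton-Jacobi equation

$$\frac{\mathrm{d}}{\mathrm{d}t}Q_{t}f+\frac{1}{2}\lvert\nabla Q_{t}f\rvert^{2}=0.$$

(6) For any $x\in M$, if $\gamma:\left[0,\,1\right]\to M$ is the geodesic such that $\gamma(0)=x$ and $\gamma^{\prime}(0)=\nabla f(x)$, then it holds

$$Q_{t}f(\gamma(t))=f(x)+\frac{t}{2}\lvert\nabla f\rvert^{2}(x)\quad\text{and}\quad\nabla Q_{t}f(\gamma(t))=\gamma^{\prime}(t).$$

(7) One has

$$\operatorname{Lip}(Q_{t}f-f)\leq t\,\lVert\nabla f\rVert_{\infty}\cdot\lVert\nabla^{2}f\rVert_{\infty}.$$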
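Properties (2) and (6) can be checked numerically on the flat circle $\mathbb{R}/\mathbb{Z}$, where geodesics are straight lines modulo $1$. The sketch below is an illustration under stated assumptions, not the general statement: the function $f(x)=A\cos(2\pi x)/(4\pi^{2})$, the amplitude $A=0.5$ and all names are hypothetical, and instead of the non-explicit constant $c(M)$ it relies on the condition $t\,\lVert f^{\prime\prime}\rVert_{\infty}<1$, which in this flat one-dimensional setting makes $x\mapsto\frac{1}{2t}(x-y)^{2}+f(x)$ strictly convex on the real line.

```python
import math

AMPLITUDE = 0.5  # hypothetical; sup|f'| = AMPLITUDE / (2 pi) and sup|f''| = AMPLITUDE

def f(x):
    """Smooth 1-periodic initial datum on the circle R/Z."""
    return AMPLITUDE * math.cos(2.0 * math.pi * x) / (4.0 * math.pi ** 2)

def grad_f(x):
    """Derivative of f."""
    return -AMPLITUDE * math.sin(2.0 * math.pi * x) / (2.0 * math.pi)

def circle_distance(x, y):
    """Geodesic distance on R/Z."""
    gap = abs(x - y) % 1.0
    return min(gap, 1.0 - gap)

def hopf_lax_brute_force(t, y, grid_size=4000):
    """Q_t f(y) approximated by minimizing over a uniform grid of the circle."""
    return min(
        circle_distance(i / grid_size, y) ** 2 / (2.0 * t) + f(i / grid_size)
        for i in range(grid_size)
    )

def hopf_lax_along_geodesic(t, x):
    """Return (gamma(t), f(x) + t/2 |f'(x)|^2), the value predicted by property (6)."""
    end_point = (x + t * grad_f(x)) % 1.0
    return end_point, f(x) + 0.5 * t * grad_f(x) ** 2
```

For instance, with $t=0.5$ one can take any starting point `x`, compute `y, value = hopf_lax_along_geodesic(0.5, x)` and compare `value` with `hopf_lax_brute_force(0.5, y)`; the two should agree up to the grid resolution.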

Proof.

Thanks to the following homogeneity, for any $t>0$ and $\lambda>0$ , of the Hopf-Lax semigroup

$$Q_{t}(\lambda f)(y)=\lambda Q_{\lambda t}f(y),$$

we can assume without loss of generality that ${\lVert\nabla f\rVert}_{\infty}+{\lVert\nabla^{2}f\rVert}_{\infty}\leq c$ and prove that the statements hold up to time $1$ . Thus, we will implicitly assume that the time variable satisfies $0\leq t\leq 1$ . We will choose the value of the constant $c$ during the proof, it should be clear that all constraints we impose depend only on the manifold $M$ and not on the function $f$ .
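The homogeneity used in this reduction is a purely algebraic fact, $\frac{1}{2t}d^{2}+\lambda f=\lambda\left(\frac{1}{2\lambda t}d^{2}+f\right)$, so it already holds on a finite metric space. A minimal sketch, with the hypothetical names `hopf_lax_finite` and `homogeneity_gap`:

```python
def hopf_lax_finite(f_values, dist, t, j):
    """Q_t f at the j-th point of a finite metric space with distance matrix dist."""
    return min(
        dist[i][j] ** 2 / (2.0 * t) + f_values[i] for i in range(len(f_values))
    )

def homogeneity_gap(f_values, dist, t, lam):
    """Largest difference between Q_t(lam * f) and lam * Q_{lam * t} f over all points."""
    scaled = [lam * value for value in f_values]
    return max(
        abs(hopf_lax_finite(scaled, dist, t, j)
            - lam * hopf_lax_finite(f_values, dist, lam * t, j))
        for j in range(len(f_values))
    )
```

Up to floating-point rounding the gap is zero for every choice of points, values, $t>0$ and $\lambda>0$.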

The statement of (1) follows from Lemma 2.2.

To prove (2) we need some preliminary observations. If $c=c(M)$ is sufficiently small (so that the constraint on $f$ is sufficiently strong), thanks to the compactness of $M$ we can find a radius $r=r(M)>0$ such that:

(a) If $p,\,q\in M$ satisfy $d(p,q)\leq r$ then

$$\nabla^{2}d^{2}(\cdot,p)(q)\geq\frac{1}{2}g.$$

(b) For any $y\in M$, to compute $Q_{t}f(y)$ it is sufficient to minimize on $B(y,r)$:

$$Q_{t}f(y)=\inf_{x\in B(y,r)}\frac{1}{2t}d^{2}(x,y)+f(x).$$

(c) For any $y\in M$ it holds the inequality $d(y,\varphi_{t}^{-1}(y))\leq r$. In particular we can assume that $\varphi_{t}^{-1}(y)$ is not in the cut-locus of $y$.

(d) For any $y\in M$ it holds the identity

$$\nabla\left(\frac{1}{2t}d^{2}(\cdot,y)+f(\cdot)\right)(\varphi_{t}^{-1}(y))=0.$$

This identity can be shown computing the gradient of the distance from $y$ squared, since we know that $y=\exp(t\nabla f(x))$ where $x=\varphi_{t}^{-1}(y)$. Indeed, given that $x$ does not belong to the cut-locus of $y$, we know

$$\nabla\left(\frac{1}{2}d^{2}(\cdot,y)\right)(x)=-t\nabla f(x)$$

and the desired identity follows.

With these observations at our disposal, the proof of (2) is straight-forward. Given a time $0\leq t\leq 1$ and a point $y\in M$ , let us consider the function $w_{t,y}:M\to\mathbb{R}$ defined as

$$w_{t,y}(x)=\frac{1}{2t}d^{2}(x,y)+f(x).$$

We know that $Q_{t}f(y)=\min_{x\in B(y,r)}w_{t,y}(x)$ . Moreover $\nabla w_{t,y}(\varphi_{t}^{-1}(y))=0$ and, if the constraint on ${\lVert\nabla^{2}f\rVert}_{\infty}$ is sufficiently small, we also know $\nabla^{2}w_{t,y}\geq\frac{1}{3t}g$ in $B(y,r)$ . Hence, by convexity, we deduce that $\varphi_{t}^{-1}(y)$ is the global minimum point of $w_{t,y}$ and (2) follows.

Let us now move to the proof of (3). Let $x,\,x^{\prime}\in M$ be such that $\varphi_{t}(x)=y$ and $\varphi_{t}(x^{\prime})=y^{\prime}$ . Applying (2) and recalling that $\varphi_{t}$ is a bi-Lipschitz diffeomorphism, we can see that the inequality we want to prove is equivalent to

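$$\frac{1}{t}d^{2}(x,x^{\prime})\lesssim f(x)-f(x^{\prime})+\frac{1}{2t}\left(d^{2}(x,y^{\prime})-d^{2}(x^{\prime},y^{\prime})\right)$$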

and, using the same notation as above, this becomes

$$\frac{1}{t}d^{2}(x,x^{\prime})\lesssim w_{t,y^{\prime}}(x)-w_{t,y^{\prime}}(x^{\prime}).$$

The latter inequality follows from the strict convexity of $w_{t,y^{\prime}}$ that we have already shown while proving (2).

Showing from scratch that $Q_{t}f$ solves the Hamilton-Jacobi equation would not be hard, but for this we refer to [LV07, Theorem 2.5, viii], where the authors show that $Q_{t}f$ is a suitably weak solution of the Hamilton-Jacobi equation. From their statement, we can deduce that if $Q_{t}f$ is differentiable at $x\in M$ , then

[TABLE]

Since we will show that $Q_{t}f$ is $C^{1,1}(M)$ , the validity of (4) and (5) is a consequence of (2.5).

The first part of (6), namely $Q_{t}f(\gamma(t))=f(x)+\frac{t}{2}{\lvert\nabla f\rvert}^{2}(x)$ , is implied by (2). To obtain the identity involving the gradient, let us differentiate the previous equality with respect to the time variable. If $Q_{t}f$ is differentiable at $\gamma(t)$ , it holds

[TABLE]

Applying (5) and the fact that ${\lvert\gamma^{\prime}(t)\rvert}={\lvert\nabla f\rvert}(x)$ , from (2.6) we can deduce

[TABLE]

This does not directly imply (6), since we have shown the identity only if $Q_{t}f$ is differentiable at $\gamma(t)$ . As a byproduct of (2), we know that $Q_{t}f$ is Lipschitz continuous and therefore, from (2.7), we can deduce that, for fixed $t$ , for almost every $x\in M$ it holds

[TABLE]

Since the right-hand side is Lipschitz continuous (see Lemma 2.2) it follows that $Q_{t}f\in C^{1,1}(M)$ and, as anticipated, this concludes the proofs of (4),(5) and (6).

Finally let us tackle (7). Given $y\in M$ , let $x=\varphi_{t}^{-1}(y)$ . Thanks to (6), if we consider the geodesic $\gamma:\left[0,\,1\right]\to M$ such that $\gamma(0)=x$ and $\gamma^{\prime}(0)=\nabla f(x)$ , we know that $\gamma(t)=y$ and $\gamma^{\prime}(t)=\nabla Q_{t}f(y)$ .

Thus we have

[TABLE]

and this is the desired statement. ∎

*Remark 2.4**.*

Let us emphasize that the only statement contained in Theorem 2.3 that we are going to use is (3). Indeed it will be crucial when studying the stability of optimal maps. Furthermore, such a statement should be seen more as a property of the $c$ -conjugate (see [San15, Section 1.2]) than as a property of the Hopf-Lax semigroup.

We have proven all other statements in order to give a complete reference on the short-time behavior of the Hopf-Lax semigroup when the initial datum is in $C^{1,1}(M)$ .

3. Quantitative Stability of the Optimal Map

In this section we will always refer to the optimal transport with respect to the quadratic cost between two probability measures in $\mathcal{P}(M)$ that are absolutely continuous with respect to the volume measure $\mathrm{m}$ of a compact Riemannian manifold $(M,g)$ .

The duality theory of optimal transport can be seen as a tool to bound the optimal transport cost from above and from below. Indeed, simply by producing a transport map we can bound the cost from above, whereas with a pair of potentials we can bound it from below. Estimating the optimal cost is the best one can desire for a generic convex problem, but for the optimal transport problem we know that the optimal map is unique (see [McC01]) and hence we would like to be able to approximate it.

In detail, we want to investigate the following problem.

Problem 3.1.

Let $\nu,\,\mu_{1},\,\mu_{2}\in\mathcal{P}(M)$ be probability measures with $\nu\ll\mathrm{m}$ . Let $S,\,T$ be the optimal transport maps from $\nu$ to $\mu_{1}$ and $\mu_{2}$ respectively. Estimate the $L^{2}(\nu)$ -distance ${\lVert d(S,T)\rVert}^{2}_{L^{2}(\nu)}$ between the two maps.

The approach we are going to adopt builds upon a method suggested by the first author to N. Gigli, who used it in [Gig11, Proposition 3.3 and Corollary 3.4]. In the proofs of the mentioned results, the author obtains (even if it is not stated in this way) exactly the same inequality we are going to obtain. The substantial difference is that those results (and their proofs) work only when the ambient space is Euclidean.

Transporting the proofs from the flat to the curved setting is not straightforward. The proof of Proposition 3.3 of the mentioned paper does not work on a Riemannian manifold, because curvature comes into play when comparing tangent vectors at different points. To overcome this difficulty we have come up with (3) of Theorem 2.3. On the contrary, the proof of Corollary 3.4 is easily adapted to a compact Riemannian manifold.

Let us also mention the recent result [Ber18, Theorem 4.1]. In that theorem the author obtains a quantitative stability of the optimal map when the source measure is changed, instead of the target measure as we are doing. The proof is totally different from ours and is mainly based on complex analytic tools. Also in that paper only the Euclidean setting (and the flat torus) is considered.

We will attack the stability problem only in the perturbative setting, namely when the optimal map from $\nu$ to $\mu_{1}$ is the identity up to the first order. Working only in the perturbative setting might look like an extremely strong assumption that would yield no applications at all. This is not the case: indeed, what we call the perturbative setting is more or less equivalent to requiring only that the optimal transport map $T$ is local (meaning that $T-\mathds{1}$ is uniformly small) and well-behaved. For example, and this is the whole point of [AG18], the optimal map from the reference measure to a random point cloud is (with high probability) a perturbation of the identity.

We don’t need any hypothesis on the optimal map between $\nu$ and $\mu_{2}$ .

Theorem 3.2.

Let $(M,g)$ be a closed compact Riemannian manifold (or the square $\left[0,\,1\right]^{2}$ ) and let us denote by $\mathrm{m}$ its volume measure.

Let $\nu,\mu_{1},\mu_{2}\in\mathcal{P}(M)$ be three probability measures with $\nu\ll\mathrm{m}$ and let $S,T:M\to M$ be the optimal transport maps respectively for the pairs of measures $(\nu,\mu_{1})$ and $(\nu,\mu_{2})$ . We assume that $S=\exp(\nabla f)$ where $f:M\to\mathbb{R}$ is a $C^{1,1}$ -function (if $M=\left[0,\,1\right]^{2}$ we also ask that $f$ satisfies the null Neumann boundary conditions) such that ${\lVert\nabla f\rVert}_{\infty}+{\lVert\nabla^{2}f\rVert}_{\infty}\leq c$ where $c=c(M)$ is the constant considered in the statement of Theorem 2.3.

Then it holds

[TABLE]

Proof.

Let us consider a generic transport map $S^{\prime}:M\to M$ from $\nu$ to $\mu_{1}$ and recall that, according to [Gla19], if $c(M)$ is small enough, then the map $S$ is optimal.

Given $x\in M$ , let us apply (3) of Theorem 2.3 with $y=S(x)$ and $y^{\prime}=S^{\prime}(x)$ and $t=1$

[TABLE]

Integrating this inequality with respect to $\nu$ we obtain

[TABLE]

as the first two terms cancel thanks to the fact that both $S$ and $S^{\prime}$ send $\nu$ into $\mu_{1}$ .

We can now prove the main statement under the additional assumption that there exists an optimal map $R:M\to M$ from $\mu_{2}$ to $\mu_{1}$ . Applying (3.1) with $S^{\prime}=R\circ T$ we get

[TABLE]

Thanks to the triangle inequality, it holds

[TABLE]

Inserting this last inequality into (3.2) yields

[TABLE]

and the desired statement follows from the triangle inequality

[TABLE]

It remains to drop the assumption on the existence of the optimal map $R$ . Given that our ambient manifold is compact, we can apply the nonquantitative strong stability (see [Vil08, Corollary 5.23]). Let us take a sequence of absolutely continuous probability measures $\mu_{2}^{n}$ that weakly converges to $\mu_{2}$ . Thanks to McCann’s Theorem (see [McC01]) the optimal map $R^{n}$ from $\mu_{2}^{n}$ to $\mu_{1}$ exists and thanks to the strong stability we know that the optimal maps $T^{n}$ from $\nu$ to $\mu_{2}^{n}$ converge strongly in $L^{2}(\nu)$ to $T$ . Hence it is readily seen that the result for $\mu_{2}$ can be obtained by passing to the limit in the result for $\mu_{2}^{n}$ . ∎

*Remark 3.3**.*

The first part of the proof of Theorem 3.2 might seem a bit magical. Let us describe what is happening under the hood.

The function $f$ is the Kantorovich potential of the couple $(\nu,\mu_{1})$ and hence, by standard theory in optimal transport, it must be $c$ -concave.

Our hypotheses ensure that it is not only $c$ -concave, but even strictly $c$ -concave. Furthermore, the theory we have developed on the Hopf-Lax semigroup tells us that even the other potential $f^{c}=Q_{1}f$ is strictly $c$ -concave (this is exactly (3)).

The result follows integrating the strict $c$ -concavity inequality with respect to the measure $\nu$ .

*Remark 3.4**.*

The main use of Theorem 3.2 is the following one. Assume that the optimal map from $\nu$ to $\mu_{1}$ is local and well-behaved (this ensures the validity of the hypotheses of the theorem) and furthermore that $\mu_{2}$ is much closer to $\mu_{1}$ than to $\nu$ . In this situation, the theorem tells us

[TABLE]

and this conveys exactly the information that $S$ approximates $T$ very well. Notice also that the improvement from the $C^{0,1/2}$ dependence of [Gig11] to a Lipschitz-type dependence is due to the fact that we are working in a perturbative regime, close to the reference measure.

4. Optimal map in the random matching problem

We want to apply our result on the stability of the optimal map in the perturbative setting to the semi-discrete random matching problem. In this section we will work on a compact closed Riemannian manifold $(M,g)$ of dimension $2$ (or the square $\left[0,\,1\right]^{2}$ ). We will denote by $\mathrm{m}$ the volume measure, with the implicit assumption that it is a probability measure.

In this setting, the semi-discrete random matching problem can be formulated as follows. For a fixed $n\in\mathbb{N}$ , consider $n$ independent random points $X_{1},X_{2},\dots,X_{n}$ $\mathrm{m}$ -uniformly distributed on $M$ . Study the optimal transport map $T^{n}$ (with respect to the quadratic cost) from $\mathrm{m}$ to the empirical measure $\mu^{n}=\frac{1}{n}\sum_{i}\delta_{X_{i}}$ .

Since we want to attack the problem applying Theorem 3.2, first of all we have to choose $\nu,\,\mu_{1}$ and $\mu_{2}$ . The choices of $\nu$ and $\mu_{2}$ are very natural, indeed we set $\nu=\mathrm{m}$ and $\mu_{2}=\mu^{n}$ . This way the map $T$ is $T^{n}$ .

Far less obvious is the choice of $\mu_{1}$ , $S$ and $f$ . As one might expect from the statement of Theorem 1.3 and from the ansatz described in the introduction, our choice is $f=f^{n,t}$ . Thus $S=\exp(\nabla f^{n,t})$ (for some appropriate $t=t(n)$ ). Furthermore, keeping the same notation of [AG18], the measure $\mu_{1}=S_{\#}\mathrm{m}$ will be denoted by $\hat{\mu}^{n,t}$ .

First of all it is crucial to understand whether we are in a position to apply Theorem 3.2. Indeed we need to check whether $\nabla^{2}f^{n,t}$ and $\nabla f^{n,t}$ are sufficiently small. Moreover we have to obtain a strong estimate on $W_{2}^{2}(\mu_{1},\mu_{2})$ . Both these facts are among the main results obtained in [AG18]. Hence let us state them in the following proposition.

Proposition 4.1 (Summary of results from [AG18]).

Let $(M,g)$ be a closed compact $2$ -dimensional Riemannian manifold (or the square $\left[0,\,1\right]^{2}$ ) whose volume measure $\mathrm{m}$ is a probability. Given $n\in\mathbb{N}$ , let $X_{1},\dots,X_{n}$ be $n$ independent random points $\mathrm{m}$ -uniformly distributed on $M$ and denote $\mu^{n}=\frac{1}{n}\sum_{i}\delta_{X_{i}}$ the associated empirical measure.

For a choice of the time $t>0$ , let $\mu^{n,t}=P^{*}_{t}(\mu^{n})$ be the evolution through the heat flow of the empirical measure and let $f^{n,t}:M\to\mathbb{R}$ be the unique null-mean solution (if $M=\left[0,\,1\right]^{2}$ we also ask that $f^{n,t}$ satisfies the null Neumann boundary conditions) to the Poisson equation $-\Delta f^{n,t}=\mu^{n,t}-1$ . Finally, let us define the probability measure $\hat{\mu}^{n,t}$ as the push-forward of $\mathrm{m}$ through the map $\exp(\nabla f^{n,t})$ .

For any $\xi>0$ , let $A^{n,t}_{\xi}$ be the probabilistic event $\{{\lVert\nabla^{2}f^{n,t}\rVert}_{\infty}<\xi\}$ .

If $t=t(n)=\frac{\log^{4}(n)}{n}$ and $\xi=\xi(n)=\frac{1}{\log(n)}$ , the following statements hold. (In [AG18] the time $t(n)$ is chosen as $t(n)=\gamma\frac{\log^{3}(n)}{n}$ , where $\gamma$ is a constant. As we clarify in Remark 4.3, the choice of the exponent of the logarithm in the definition of $t(n)$ is not rigid. We choose the exponent $4$ instead of $3$ since it lets us get some estimates in a cleaner form and makes it possible to avoid inserting a constant in the definition of $t(n)$ .)

• We know the asymptotic behavior of the expected matching cost

[TABLE]

• The probability of the complement of $A^{n,t}_{\xi}$ decays faster than any power. In formulas, for any $k>0$ there exists a constant $C=C(M,k)$ such that

[TABLE]

• One has the refined contractivity estimate (this does not follow from the well-known contractivity property of the heat semigroup: the standard contractivity would yield an estimate of order $t=\gamma\frac{\log^{4}(n)}{n}\gg\frac{\log(n)}{n}$ , and such a magnitude is too large for our purposes)

[TABLE]

• We are able to control the perturbation error with

[TABLE]

• When $n$ is sufficiently large, in the event $A^{n,t}_{\xi}$ the map $\exp(\nabla f^{n,t})$ is optimal from $\mathrm{m}$ to $\hat{\mu}^{n,t}$ .

Proof.

All of these results are contained in [AG18] and thus we will only give a precise reference for them. All references are to propositions contained in [AG18].

The validity of (4.1) is contained in Theorem 1.2. The fact that the event $A^{n,t}_{\xi}$ has overwhelming probability follows from Theorem 3.3. The refined contractivity estimate (4.3) is Theorem 5.2.

The estimate (4.4) follows from Equation 6.2 and Lemma 3.14. More specifically Equation 6.2 tells us that in the event $A^{n,t}_{\xi}$ it holds

[TABLE]

and Lemma 3.14 gives us the expected value of the Dirichlet energy of $f^{n,t}$ . The behavior in the complement of $A^{n,t}_{\xi}$ can be ignored thanks to (4.2).

It remains to show that in the event $A^{n,t}_{\xi}$ , the map $\exp(\nabla f^{n,t})$ is optimal. This follows directly from [Gla19, Theorem 1.1]. ∎
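The objects appearing in Proposition 4.1 can be sketched numerically in a simple model case. The following minimal example assumes that $M$ is the flat torus $[0,1)^{2}$ (so that $\exp_{x}(v)=x+v$ modulo $1$ ), that the heat semigroup is generated by $\Delta$ , and that the Fourier series is truncated at a hypothetical cutoff `kmax`; the function names are ours and the value of `t` is an arbitrary illustrative choice rather than $t(n)$ .

```python
import cmath
import math
import random


def poisson_coeffs(points, t, kmax):
    """Fourier coefficients of the null-mean solution of -Laplace(f) = mu^{n,t} - 1
    on the flat torus [0,1)^2, keeping only the modes with |k_1|, |k_2| <= kmax."""
    n = len(points)
    coeffs = {}
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            if k1 == 0 and k2 == 0:
                continue  # dropping the zero mode enforces the null-mean condition
            lam = 4.0 * math.pi ** 2 * (k1 * k1 + k2 * k2)  # eigenvalue of -Laplace
            mu_hat = sum(cmath.exp(-2j * math.pi * (k1 * x + k2 * y)) for x, y in points) / n
            # the heat flow damps each mode by exp(-t * lam); solving Poisson divides by lam
            coeffs[(k1, k2)] = math.exp(-t * lam) * mu_hat / lam
    return coeffs


def f_value(coeffs, x, y):
    """Value of the truncated series for f^{n,t} at the point (x, y)."""
    return sum((c * cmath.exp(2j * math.pi * (k1 * x + k2 * y))).real
               for (k1, k2), c in coeffs.items())


def grad_f(coeffs, x, y):
    """Gradient of the truncated series, differentiated term by term."""
    gx = gy = 0.0
    for (k1, k2), c in coeffs.items():
        e = c * cmath.exp(2j * math.pi * (k1 * x + k2 * y))
        gx += (2j * math.pi * k1 * e).real
        gy += (2j * math.pi * k2 * e).real
    return gx, gy


def approx_map(coeffs, x, y):
    """The map exp(grad f) on the flat torus: x + grad f(x), taken modulo 1."""
    gx, gy = grad_f(coeffs, x, y)
    return (x + gx) % 1.0, (y + gy) % 1.0


random.seed(0)
points = [(random.random(), random.random()) for _ in range(50)]
coeffs = poisson_coeffs(points, t=0.01, kmax=5)
print(approx_map(coeffs, 0.3, 0.4))
```

The sketch only evaluates the candidate map $\exp(\nabla f^{n,t})$ ; it does not compute the optimal map $T^{n}$ , which would require a semi-discrete optimal transport solver.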

*Remark 4.2**.*

Let us repeat the elementary observation made in [AG18, Remark 5.3], as it will be useful.

Let $X,Y$ be two random variables such that, in an event $E$ , it holds $X\leq Y$ . Then

[TABLE]

In particular, if the infinity norm of $X,Y$ is suitably controlled and the probability of $E^{\mathsf{c}}$ is exceedingly small, we can assume $\operatorname{\mathbb{E}}\left[X\right]\leq\operatorname{\mathbb{E}}\left[Y\right]$ up to a small error.

This observation allows us to restrict our study to the good event $A^{n,t}_{\xi}$ . Indeed all quantities involved in our computations have at most polynomial growth, whereas $\operatorname{\mathbb{P}}\left((A^{n,t}_{\xi})^{\mathsf{c}}\right)$ decays faster than any power.
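For bounded random variables, one way to obtain a bound of this kind is the following elementary computation. It is a sketch under the assumption that $X$ and $Y$ are essentially bounded, and the constant in front of $\operatorname{\mathbb{P}}(E^{\mathsf{c}})$ is the one produced by this particular argument.

```latex
\begin{aligned}
\mathbb{E}[X]
  &= \mathbb{E}\bigl[X\,\mathbf{1}_{E}\bigr] + \mathbb{E}\bigl[X\,\mathbf{1}_{E^{\mathsf{c}}}\bigr] \\
  &\le \mathbb{E}\bigl[Y\,\mathbf{1}_{E}\bigr] + \lVert X\rVert_{\infty}\,\mathbb{P}(E^{\mathsf{c}}) \\
  &= \mathbb{E}[Y] - \mathbb{E}\bigl[Y\,\mathbf{1}_{E^{\mathsf{c}}}\bigr] + \lVert X\rVert_{\infty}\,\mathbb{P}(E^{\mathsf{c}}) \\
  &\le \mathbb{E}[Y] + \bigl(\lVert X\rVert_{\infty} + \lVert Y\rVert_{\infty}\bigr)\,\mathbb{P}(E^{\mathsf{c}}).
\end{aligned}
```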

Once we have these results in our hands, the proof of the main theorem follows rather easily. Indeed we just have to check that all assumptions of our stability result are satisfied.

Proof of Theorem 1.3.

Let us assume that we are in the event $A^{n,t}_{\xi}$ with $\xi=\frac{1}{\log(n)}$ . Hence, thanks to the last item of Proposition 4.1, we can apply Theorem 3.2 to the triple of measures $\nu=\mathrm{m}$ , $\mu_{1}=\hat{\mu}^{n,t}$ and $\mu_{2}=\mu^{n}$ (with $S=\exp(\nabla f^{n,t})$ and $T=T^{n}$ ). We obtain

[TABLE]

Recalling Remark 4.2 and (4.2), if we consider the expected value we can apply the latter inequality as if it were true unconditionally and not only in the event $A^{n,t}_{\xi}$ . Thus, taking the expected value and applying the Cauchy-Schwarz inequality, we get

[TABLE]

The desired statement follows directly by applying (4.1), (4.3) and (4.4). ∎

*Remark 4.3**.*

It might seem that our choice of the time $t=\log^{4}(n)/n$ is a little arbitrary, and indeed it is. Any time $t=t(n)$ of order $\log^{\alpha}(n)/n$ , for some $\alpha>3$ , would have worked flawlessly.
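To get a feeling for the scales involved, the following short sketch tabulates $t(n)=\log^{\alpha}(n)/n$ and $\xi(n)=1/\log(n)$ for a few values of $n$ , together with the ratio between $t(n)$ and $\log(n)/n$ . The function names and the sample values of $n$ are arbitrary choices for this illustration.

```python
import math


def t_of_n(n, alpha=4.0):
    """Regularization time t(n) = log(n)^alpha / n."""
    return math.log(n) ** alpha / n


def xi_of_n(n):
    """Threshold xi(n) = 1 / log(n) used in the event A^{n,t}_xi."""
    return 1.0 / math.log(n)


for n in (10**4, 10**6, 10**8, 10**10):
    ratio = t_of_n(n) / (math.log(n) / n)  # equals log(n)^(alpha - 1)
    print(n, t_of_n(n), xi_of_n(n), ratio)
```

The output shows that $t(n)$ becomes small only for rather large $n$ , while the ratio $t(n)/(\log(n)/n)=\log^{\alpha-1}(n)$ keeps growing, in agreement with the relation $t\gg\frac{\log(n)}{n}$ used above.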

It remains to justify Remark 1.5. As already said, the desired estimate boils down to the validity of

[TABLE]

for any fixed $\varepsilon>0$ . The strategy of the proof is as follows. With Lemma 4.4 (see also [GO17, Lemma 4.1]) we reduce the hard task of controlling the $L^{\infty}$ -distance between $T^{n}$ and $\mathds{1}$ to the easier task of controlling $W_{2}^{2}(\mathrm{m},\mu^{n})$ . This latter estimate is then shown to be a consequence of (4.2).

Lemma 4.4.

Let $(M,g)$ be a $d$ -dimensional compact Riemannian manifold (possibly with Lipschitz boundary) and let $\mathrm{m}$ be the volume measure on $M$ .

If $T:M\to M$ is the optimal map with respect to the quadratic cost from $\mathrm{m}$ to $T_{\#}\mathrm{m}$ , then one has

[TABLE]

Proof.

Since the map $T$ is optimal, its graph is essentially contained in a $c$ -cyclically monotone set (see [San15, Theorem 1.38]). More precisely, there exists a Borel set $C\subseteq M$ such that $\{(x,T(x)):\ x\in C\}$ is $c$ -cyclically monotone and $M\setminus C$ is $\mathrm{m}$ -negligible. We will reduce our considerations to points in $C$ in order to exploit the $c$ -cyclical monotonicity.

Let us fix a point $x_{0}\in C$ and let us define $\alpha\coloneqq\frac{1}{2}d(x_{0},T(x_{0}))$ . Let us define the point $p\in M$ as the middle point between $x_{0}$ and $T(x_{0})$ , that is $d(x_{0},p)=d(p,T(x_{0}))=\alpha$ . Let us consider a point $x\in B(p,\varepsilon\alpha)\cap C$ where $\varepsilon>0$ is a small constant that will be chosen a posteriori. Finally let us define $\beta\coloneqq d(x,T(x))$ . We want to show that $\beta$ cannot be much smaller than $\alpha$ .

Thanks to the $c$ -cyclical monotonicity of $C$ , it holds

[TABLE]

and thus, applying repeatedly the triangle inequality, we deduce

[TABLE]

If $\varepsilon$ is chosen sufficiently small (i.e. $\varepsilon=1/3$ ), the desired estimate $\alpha\lesssim\beta$ follows.
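One possible way to carry out this computation is sketched below. It uses only the two-point case of $c$ -cyclical monotonicity for the quadratic cost, the triangle inequality, and the relations $d(x_{0},T(x_{0}))=2\alpha$ and $d(x,p)<\varepsilon\alpha$ ; the explicit constant $6$ is the one produced by this particular argument with $\varepsilon=1/3$ .

```latex
\begin{aligned}
4\alpha^{2} + \beta^{2}
  &= d^{2}(x_{0},T(x_{0})) + d^{2}(x,T(x)) \\
  &\le d^{2}(x_{0},T(x)) + d^{2}(x,T(x_{0})) \\
  &\le \bigl(d(x_{0},p) + d(p,x) + d(x,T(x))\bigr)^{2} + \bigl(d(x,p) + d(p,T(x_{0}))\bigr)^{2} \\
  &\le \bigl((1+\varepsilon)\alpha + \beta\bigr)^{2} + (1+\varepsilon)^{2}\alpha^{2}.
\end{aligned}
% Expanding the square and cancelling \beta^2:
%   4\alpha^2 \le 2(1+\varepsilon)^2\alpha^2 + 2(1+\varepsilon)\alpha\beta ,
% that is
\bigl(2 - (1+\varepsilon)^{2}\bigr)\,\alpha \le (1+\varepsilon)\,\beta .
% With \varepsilon = 1/3 this reads (2/9)\alpha \le (4/3)\beta, i.e. \alpha \le 6\beta.
```

Any $\varepsilon$ with $(1+\varepsilon)^{2}<2$ gives a bound of the same form.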

Since $x$ can be chosen arbitrarily in $B(p,\varepsilon\alpha)\cap C$ , the estimate $\alpha\lesssim\beta$ implies

[TABLE]

where we have used that a ball with radius $r$ not larger than the diameter of $M$ has measure comparable to $r^{d}$ (follows from the Ahlfors-regularity of compact Riemannian manifolds with Lipschitz boundary). This completes the proof since $x_{0}$ can be chosen arbitrarily in a set with full measure. ∎
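The integration step can be sketched as follows, assuming that the conclusion of the lemma has the form ${\lVert d(T,\mathds{1})\rVert}_{\infty}^{d+2}\lesssim\int_{M}d^{2}(x,T(x))\,\mathrm{d}\mathrm{m}(x)$ and writing $c_{0}$ for the (hypothetical) implicit constant in $\alpha\leq c_{0}\beta$ .

```latex
\begin{aligned}
\int_{M} d^{2}(x,T(x))\,\mathrm{d}\mathrm{m}(x)
  &\ge \int_{B(p,\varepsilon\alpha)\cap C} d^{2}(x,T(x))\,\mathrm{d}\mathrm{m}(x) \\
  &\ge \frac{\alpha^{2}}{c_{0}^{2}}\,\mathrm{m}\bigl(B(p,\varepsilon\alpha)\bigr) \\
  &\gtrsim \alpha^{2}\,(\varepsilon\alpha)^{d}
   \;\gtrsim\; \alpha^{d+2}
   = \Bigl(\tfrac{1}{2}\,d(x_{0},T(x_{0}))\Bigr)^{d+2}.
\end{aligned}
```

Here the second inequality uses that $M\setminus C$ is $\mathrm{m}$ -negligible, and the third one uses that $\varepsilon\alpha$ does not exceed the diameter of $M$ .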

*Remark 4.5**.*

The previous lemma holds, with the same proof, on any Ahlfors-regular metric measure space that is also a length space.

*Remark 4.6**.*

If we apply Lemma 4.4 on a $2$ -dimensional manifold with $T^{n}$ being the optimal map (with respect to the quadratic cost) from $\mathrm{m}$ to the empirical measure $\mu^{n}$ , we obtain

[TABLE]

Since we know (as a consequence of 1.2) that with high probability $W^{2}_{2}(\mathrm{m},\mu^{n})\lesssim n^{-1}\log(n)$ , we deduce that with high probability it holds

[TABLE]

This estimate does not match the asymptotic behavior of the $\infty$ -Wasserstein distance between $\mathrm{m}$ and $\mu^{n}$ . In fact, as proven in [LS89, S+91, TS15], with high probability it holds

[TABLE]
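The gap between the two rates can be made concrete with a short computation. The sketch below assumes that the bound obtained from Lemma 4.4 is of order $(\log(n)/n)^{1/4}$ and that the $\infty$ -Wasserstein distance is of order $\log^{3/4}(n)/n^{1/2}$ ; all multiplicative constants are ignored and the function names are ours.

```python
import math


def lemma_rate(n):
    """Order of the sup-norm bound assumed to follow from Lemma 4.4: (log n / n)^(1/4)."""
    return (math.log(n) / n) ** 0.25


def w_infinity_rate(n):
    """Assumed order of the infinity-Wasserstein distance: (log n)^(3/4) / n^(1/2)."""
    return math.log(n) ** 0.75 / math.sqrt(n)


for n in (10**4, 10**6, 10**8):
    # the ratio equals n^(1/4) / (log n)^(1/2) and grows with n
    print(n, lemma_rate(n), w_infinity_rate(n), lemma_rate(n) / w_infinity_rate(n))
```

Under these assumptions the bound coming from the lemma is larger than the $\infty$ -Wasserstein rate by a factor of order $n^{1/4}/\log^{1/2}(n)$ .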

We are now ready to show (4.5) (to be precise we prove a much stronger estimate).

Proposition 4.7.

Using the same notation and definitions of the statement of Theorem 1.3, for any $\varepsilon>0$ and any $k>0$ there exists a constant $C=C(M,\varepsilon,k)$ such that

[TABLE]

Proof.

We show that for any $\varepsilon>0$ and any $k>0$ there exists a constant $C=C(M,\varepsilon,k)$ such that

[TABLE]

In fact, if we are able to prove (4.7), then the statement of the proposition follows by applying Lemma 4.4 with $T=T^{n}$ (adequately changing $\varepsilon,k$ and the value of the constant $C$ ).

The triangle inequality gives us

[TABLE]

The first term can be bounded using the contractivity property of the heat semigroup, obtaining

[TABLE]

For the second term we employ the transport inequality [AG18, (4.1)] and get

[TABLE]

If we assume that we are in the event $A^{n,t}_{\xi}$ (that is defined in the statement of Proposition 4.1) with $\xi=\xi(n)=\frac{1}{\log(n)}$ , we have

[TABLE]

Joining (4.8), (4.9), (4.10) and (4.11) we deduce that in the event $A^{n,t}_{\xi}$ it holds

[TABLE]

Since $t(n)\to 0$ and $\xi(n)\to 0$ as $n\to\infty$ , this implies (for $n$ sufficiently large) that in the event $A^{n,t}_{\xi}$ it holds $W_{2}(\mathrm{m},\mu^{n})\leq\varepsilon$ . Hence (4.7) is a consequence of (4.2) and this concludes the proof. ∎

Bibliography

[AG18] Luigi Ambrosio and Federico Glaudo “Finer estimates on the 2-dimensional matching problem” In arXiv preprint arXiv:1810.07002, 2018
[AKT84] Miklós Ajtai, János Komlós and Gábor Tusnády “On optimal matchings” In Combinatorica 4.4 Springer, 1984, pp. 259–264
[AST18] Luigi Ambrosio, Federico Stra and Dario Trevisan “A PDE approach to a 2-dimensional matching problem” In Probability Theory and Related Fields Springer, 2018, pp. 1–45
[BC08] Martino Bardi and Italo Capuzzo-Dolcetta “Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations” Springer Science & Business Media, 2008
[Ben77] Stanley H. Benton “The Hamilton-Jacobi equation: a global approach” Elsevier, 1977
[Ber18] Robert J. Berman “Convergence rates for discretized Monge-Ampère equations and quantitative stability of Optimal Transport” In arXiv preprint arXiv:1803.00785, 2018
[Bre91] Yann Brenier “Polar factorization and monotone rearrangement of vector-valued functions” In Comm. Pure Appl. Math. 44.4, 1991, pp. 375–417
[Car+14] Sergio Caracciolo, Carlo Lucibello, Giorgio Parisi and Gabriele Sicuro “Scaling hypothesis for the Euclidean bipartite matching problem” In Physical Review E 90.1 APS, 2014, pp. 012118
