A variational approach to regularity theory in optimal transportation

Michael Goldman (LJLL)

arXiv:1907.05627·math.AP·July 15, 2019

TL;DR

This paper introduces a variational method to analyze the regularity of optimal transport maps, providing quantitative insights and applications to partial regularity and structure predictions in matching problems.

Contribution

It offers a new variational approach to regularity theory in optimal transportation, including a quantitative linearization of the Monge-Ampère equation and applications to existing regularity results.

Findings

1. A quantitative linearization of the Monge-Ampère equation around the identity.

2. A variational proof of the partial regularity theorem by Figalli and Kim.

3. Validation of structure predictions in optimal transport matching problems.

Abstract

This paper describes recent results obtained in collaboration with M. Huesmann and F. Otto on the regularity of optimal transport maps. The main result is a quantitative version of the well-known fact that the linearization of the Monge-Ampère equation around the identity is the Poisson equation. We present two applications of this result. The first one is a variational proof of the partial regularity theorem of Figalli and Kim and the second is the rigorous validation of some predictions made by Caracciolo et al. on the structure of the optimal transport maps in matching problems.

Equations

$$W^{2}(\mu,\lambda)=\inf_{\pi_{1}=\mu,\,\pi_{2}=\lambda}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}|x-y|^{2}\,d\pi,$$

$$\det\nabla^{2}\psi=\frac{\mu}{\lambda\circ\nabla\psi}.$$

$$\Delta\psi=\mu-\lambda.$$

$$W_{2}^{2}(\mu,\lambda)=\inf_{(\rho,j)}\left\{\int_{\mathbb{R}^{d}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}\ :\ \partial_{t}\rho+\nabla\cdot j=0,\ \rho_{0}=\mu,\ \rho_{1}=\lambda\right\}.$$

$$\int_{\mathbb{R}^{d}}\zeta\,d\rho_{t}=\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\zeta((1-t)x+ty)\,d\pi\quad\text{and}\quad\int_{\mathbb{R}^{d}}\xi\cdot dj_{t}=\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\xi((1-t)x+ty)\cdot(y-x)\,d\pi,$$

$$E=\frac{1}{R^{d+2}}\int_{(B_{6R}\times\mathbb{R}^{d})\cup(\mathbb{R}^{d}\times B_{6R})}|x-y|^{2}\,d\pi$$

$$D=\frac{1}{R^{d+2}}W_{B_{6R}}^{2}(\mu,\kappa_{\mu})+\frac{1}{\kappa_{\mu}}(\kappa_{\mu}-1)^{2}+\frac{1}{R^{d+2}}W_{B_{6R}}^{2}(\lambda,\kappa_{\lambda})+\frac{1}{\kappa_{\lambda}}(\kappa_{\lambda}-1)^{2}.$$

$$|x-y|\lesssim R\,(E+D)^{\frac{1}{d+2}}.$$

$$\Delta\Phi=c\ \text{ in }B_{R}\quad\text{and}\quad\nu\cdot\nabla\Phi=\nu\cdot\bar{j}\ \text{ on }\partial B_{R},$$

$$\int_{(B_{1}\times\mathbb{R}^{d})\cup(\mathbb{R}^{d}\times B_{1})}|x-y+\nabla\Phi(x)|^{2}\,d\pi\leq\tau E+CD.$$

$$\int_{B_{2}}\int_{0}^{1}\frac{1}{\rho}|j-\rho\nabla\Phi|^{2}\leq\tau E+CD.$$

$$\int_{B_{R}}|\nabla\Phi|^{2}\lesssim\int_{\partial B_{R}}\bar{f}^{2},$$

$$\Delta\phi=c\ \text{ in }B_{R}\quad\text{and}\quad\nu\cdot\nabla\phi=\hat{g}\ \text{ on }\partial B_{R},$$

$$\int_{\partial B_{R}}\hat{g}^{2}\lesssim E+D\quad\text{and}\quad W^{2}(\bar{f}_{\pm},\hat{g}_{\pm})\lesssim(E+D)^{\frac{d+3}{d+2}}.$$

$$\int_{\partial B_{R}}\int_{0}^{1}f^{2}\lesssim E+D.$$

$$\int_{B_{2}}\int_{0}^{1}\frac{1}{\rho}|j-\rho\nabla\Phi|^{2}\leq\left(\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\right)+\tau E+CD.$$

$$\begin{aligned}\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j-\rho\nabla\Phi|^{2}&=\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\\&\quad+2\int_{B_{R}}\int_{0}^{1}(\nabla\Phi-j)\cdot\nabla\Phi+\int_{B_{R}}(\bar{\rho}-1)|\nabla\Phi|^{2}.\end{aligned}$$

$$\int_{B_{R}}\int_{0}^{1}(\nabla\Phi-j)\cdot\nabla\Phi=-\int_{B_{R}}\int_{0}^{1}\partial_{t}\rho\,\Phi=\int_{B_{R}}\Phi\,d(\mu-1)$$

$$\int_{B_{R}}\Phi\,d(\mu-1)\lesssim\left(\int_{B_{R}}|\nabla\Phi|^{2}\right)^{1/2}W_{B_{R}}(\mu,1)\stackrel{(3.4)\&(3.6)}{\lesssim}(E+D)^{1/2}D^{1/2}\stackrel{\text{Young}}{\leq}\tau E+\frac{C}{\tau}D.$$

$$W_{B_{R}}^{2}(\bar{\rho},1)\lesssim E+D,$$

$$\begin{cases}\partial_{t}\widetilde{\rho}+\nabla\cdot\widetilde{j}=0&\text{in }B_{R}\times(0,1)\\\widetilde{\rho}_{0}=\mu,\ \widetilde{\rho}_{1}=1&\text{in }B_{R}\\\nu\cdot\widetilde{j}=f&\text{on }\partial B_{R}\times(0,1)\end{cases}$$

$$\int_{B_{R}}\int_{0}^{1}\frac{1}{\widetilde{\rho}}|\widetilde{j}|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\leq\tau E+CD.$$

$$\widetilde{\rho}=\begin{cases}1&\text{in }B_{R-r}\times(0,1)\\1+s&\text{in }B_{R}\backslash B_{R-r}\times(0,1),\end{cases}\qquad\widetilde{j}=\begin{cases}\nabla\Phi&\text{in }B_{R-r}\times(0,1)\\\nabla\Phi+q&\text{in }B_{R}\backslash B_{R-r}\times(0,1),\end{cases}$$

$$\int_{B_{R}\backslash B_{R-r}}\int_{0}^{1}|q|^{2}\lesssim r\int_{\partial B_{R}}\int_{0}^{1}(f-\bar{f})^{2}$$

$$\begin{aligned}\int_{B_{R}}\int_{0}^{1}\frac{1}{\widetilde{\rho}}|\widetilde{j}|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}&\lesssim\int_{B_{R}\backslash B_{R-r}}|\nabla\Phi|^{2}+\int_{B_{R}\backslash B_{R-r}}\int_{0}^{1}|q|^{2}\\&\lesssim r\int_{\partial B_{R}}\bar{f}^{2}+r\int_{\partial B_{R}}\int_{0}^{1}(f-\bar{f})^{2}\lesssim r\int_{\partial B_{R}}f^{2},\end{aligned}$$

$$\int_{B_{R}}\int_{0}^{1}\frac{1}{\widetilde{\rho}}|\widetilde{j}|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\lesssim\left(\int_{\partial B_{R}}\int_{0}^{1}f^{2}\right)^{\frac{d+2}{d+1}}\stackrel{(3.6)}{\lesssim}(E+D)^{\frac{d+2}{d+1}},$$

$$\frac{1}{R^{d+2}}\int_{B_{6R}}|T-x|^{2}\leq\varepsilon,$$

$$\min_{A,b}\frac{1}{r^{d+2}}\int_{B_{r}}|T-(Ax+b)|^{2}\lesssim r^{2\alpha}\int_{B_{1}}|T-x|^{2}.$$

Taxonomy

Topics: Geometric Analysis and Curvature Flows · Nonlinear Partial Differential Equations · Advanced Mathematical Modeling in Engineering

Full text

A variational approach to regularity theory in optimal transportation

M. Goldman, Université de Paris, CNRS, Sorbonne-Université, Laboratoire Jacques-Louis Lions (LJLL), F-75005 Paris, France

Abstract

This paper describes recent results obtained in collaboration with M. Huesmann and F. Otto on the regularity of optimal transport maps. The main result is a quantitative version of the well-known fact that the linearization of the Monge-Ampère equation around the identity is the Poisson equation. We present two applications of this result. The first one is a variational proof of the partial regularity theorem of Figalli and Kim and the second is the rigorous validation of some predictions made by Caracciolo et al. on the structure of the optimal transport maps in matching problems.

1 Introduction

Following Caffarelli’s groundbreaking papers [9, 8], the classical approach to regularity theory for solutions of the optimal transport problem goes through maximum principle arguments and the construction of barriers (see the review paper [13]). The aim of this note is to describe a recent alternative approach, more variational in nature and based on the fact that the linearization of the Monge-Ampère equation around the identity is the Poisson equation (see [27]). Our main achievement in this direction is a harmonic approximation result: if at a given scale the transport plan is close to the identity and if at the same scale both the starting and target measures are close (in the Wasserstein metric) to constant densities, then on a slightly smaller scale the transport plan is extremely close to a harmonic gradient field. As in De Giorgi’s approach to the regularity theory for minimal surfaces (see [24]), this allows one to transfer the good regularity properties of harmonic functions to the transport plan and to obtain an “excess improvement by tilting” estimate. This may be used to propagate information from the macroscopic scale down to the microscopic scale through a Campanato iteration.

We give two applications of this result. The first one is a new proof of the partial regularity result of Figalli and Kim [15] (see also [14]). The second one is a validation, down to the microscopic scale, of the prediction by Caracciolo et al. [11] that for the optimal matching problem between a Poisson point process and the Lebesgue measure, the optimal transport plan is well approximated by the gradient of the solution to the corresponding Poisson equation with very high probability.

The plan of this note is the following. In Section 2 we recall some standard results on optimal transportation. The harmonic approximation theorem is stated together with a sketch of proof in Section 3. We then describe the application to the partial regularity result in Section 4 and to the optimal matching problem in Section 5.

2 The optimal transport problem

Optimal transportation is nowadays a very broad and active field. We give here only a very basic and short introduction to the topic and refer the reader to the monographs [25, 27] for many more details. For $\mu$ and $\lambda$ two positive measures on $\mathbb{R}^{d}$ with $\mu(\mathbb{R}^{d})=\lambda(\mathbb{R}^{d})$ the optimal transport problem (in its Lagrangian formulation) is

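$$W^{2}(\mu,\lambda)=\inf_{\pi_{1}=\mu,\,\pi_{2}=\lambda}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}|x-y|^{2}\,d\pi,\tag{2.1}$$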

where for a coupling $\pi$ on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ , $\pi_{1}$ (respectively $\pi_{2}$ ) denotes the first (respectively the second) marginal of $\pi$ . Under very mild assumptions on $\mu$ and $\lambda$ (for instance compact supports), an optimal transference plan $\pi$ exists (see [27]). The optimality conditions are as follows:

Theorem 2.1.

Let $\pi$ be a coupling between $\mu$ and $\lambda$ .

(i)

(Knott-Smith) It is optimal if and only if there exists a convex and lower-semicontinuous function $\psi$ (also called the Kantorovich potential) such that $\textup{spt}\,\pi\subseteq\textrm{Graph}(\partial\psi)$ .

(ii)

(Brenier) Moreover, if $\mu$ does not give mass to Lebesgue negligible sets, then there exists a unique $\nabla\psi$ , gradient of a convex function, with $\nabla\psi\#\mu=\lambda$ and $\pi=(\textrm{id}\times\nabla\psi)\#\mu$ . In this case we let $T=\nabla\psi$ be the optimal transport map.
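As a small illustration of the optimality condition, the following sketch checks, in dimension one and for finitely many equal-mass points, that the monotone (sorted) matching has a monotone support and minimal quadratic cost among all matchings. The point sets and the function names are hypothetical and only serve the example; the brute-force search over permutations is feasible only for very small samples.

```python
from itertools import permutations

def quadratic_cost(xs, ys, perm):
    """Cost sum_i |x_i - y_perm[i]|^2 of the matching i -> perm[i]."""
    return sum((xs[i] - ys[p]) ** 2 for i, p in enumerate(perm))

def monotone_matching(xs, ys):
    """Match the k-th smallest x to the k-th smallest y (1D monotone coupling)."""
    x_order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_order = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [0] * len(xs)
    for i, j in zip(x_order, y_order):
        perm[i] = j
    return perm

def is_monotone(xs, ys, perm):
    """Check (x_i - x_k) * (y_i - y_k) >= 0 for all pairs of matched points."""
    n = len(xs)
    return all((xs[i] - xs[k]) * (ys[perm[i]] - ys[perm[k]]) >= 0
               for i in range(n) for k in range(n))

# Hypothetical sample points (equal masses).
xs = [0.3, -1.2, 2.0, 0.9, -0.4]
ys = [1.5, 0.1, -0.7, 2.4, 0.8]

perm = monotone_matching(xs, ys)
best = min(quadratic_cost(xs, ys, p) for p in permutations(range(len(xs))))
```

Here `perm` plays the role of the optimal map $T$ restricted to the sample, and `is_monotone` is the discrete version of the monotonicity of $\textup{spt}\,\pi$ .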

Let us point out that assuming that $\psi$ is regular and that both $\mu$ and $\lambda$ are smooth densities, the condition $T\#\mu=\lambda$ is nothing else than the Monge-Ampère equation

$$\det\nabla^{2}\psi=\frac{\mu}{\lambda\circ\nabla\psi}.$$

In particular, if both $\mu$ and $\lambda$ are close to (the same) constant density, then the Monge-Ampère equation linearizes to the Poisson equation (see [27, Ex. 4.1])

$$\Delta\psi=\mu-\lambda.\tag{2.2}$$
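The algebraic fact behind this linearization is that $\det(\mathrm{Id}+\varepsilon H)=1+\varepsilon\,\textrm{Tr}\,H+O(\varepsilon^{2})$ , so that the determinant of a Hessian close to the identity is, to first order, one plus a Laplacian. A minimal numerical sketch in dimension two, where the remainder is exactly $\varepsilon^{2}\det H$ (the matrix `H` is an arbitrary symmetric example, not taken from the paper):

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def linearization_error(hessian, eps):
    """|det(Id + eps*H) - (1 + eps*Tr H)| for a 2x2 matrix H."""
    perturbed = [[1.0 + eps * hessian[0][0], eps * hessian[0][1]],
                 [eps * hessian[1][0], 1.0 + eps * hessian[1][1]]]
    trace = hessian[0][0] + hessian[1][1]
    return abs(det2(perturbed) - (1.0 + eps * trace))

# Arbitrary symmetric matrix standing in for the Hessian of the perturbation.
H = [[0.7, -0.3], [-0.3, 0.2]]
eps_values = (1e-1, 1e-2, 1e-3)
errors = [linearization_error(H, eps) for eps in eps_values]
```

The entries of `errors` decrease quadratically in `eps`, which is the sense in which the trace (the Laplacian) captures the first-order behaviour.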

We will also use the Eulerian formulation of the optimal transport problem.

Theorem 2.2 (Benamou-Brenier).

There holds

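$$W_{2}^{2}(\mu,\lambda)=\inf_{(\rho,j)}\left\{\int_{\mathbb{R}^{d}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}\ :\ \partial_{t}\rho+\nabla\cdot j=0,\ \rho_{0}=\mu,\ \rho_{1}=\lambda\right\}.\tag{2.3}$$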

Moreover, if $\pi$ is an optimal transport plan for (2.1), then the density-flux pair $(\rho_{t},j_{t})$ defined for $t\in[0,1]$ by its action on test functions $(\zeta,\xi)\in C^{0}(\mathbb{R}^{d})\times(C^{0}(\mathbb{R}^{d}))^{d}$ as

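$$\int_{\mathbb{R}^{d}}\zeta\,d\rho_{t}=\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\zeta((1-t)x+ty)\,d\pi\quad\text{and}\quad\int_{\mathbb{R}^{d}}\xi\cdot dj_{t}=\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\xi((1-t)x+ty)\cdot(y-x)\,d\pi,\tag{2.4}$$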

is a minimizer of (2.3).
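For a discrete coupling the density-flux pair is a sum of moving Dirac masses, and the continuity equation can be checked in its weak form $\frac{d}{dt}\int\zeta\,d\rho_{t}=\int\nabla\zeta\cdot dj_{t}$ . The sketch below does this in dimension one for a hypothetical three-point coupling; the test function and the finite-difference step are illustrative choices.

```python
def rho_action(zeta, coupling, t):
    """Integral of zeta against rho_t: sum of m * zeta((1 - t) x + t y)."""
    return sum(m * zeta((1 - t) * x + t * y) for x, y, m in coupling)

def flux_action(xi, coupling, t):
    """Integral of xi against j_t: sum of m * xi((1 - t) x + t y) * (y - x)."""
    return sum(m * xi((1 - t) * x + t * y) * (y - x) for x, y, m in coupling)

def kinetic_energy(coupling):
    """For non-colliding particles, int (1/rho)|j|^2 equals sum of m * |y - x|^2."""
    return sum(m * (y - x) ** 2 for x, y, m in coupling)

def zeta(z):
    return z ** 3

def dzeta(z):
    return 3 * z ** 2

# Hypothetical monotone coupling in 1D, given as (x, y, mass) triples.
coupling = [(-1.0, -0.5, 0.25), (0.0, 0.4, 0.5), (1.0, 1.7, 0.25)]

t, h = 0.3, 1e-5
# Central finite difference of t -> int zeta d(rho_t).
lhs = (rho_action(zeta, coupling, t + h) - rho_action(zeta, coupling, t - h)) / (2 * h)
rhs = flux_action(dzeta, coupling, t)
energy = kinetic_energy(coupling)
```

Since each particle moves at constant velocity $y-x$ , `energy` coincides with the transport cost $\int|x-y|^{2}\,d\pi$ of the coupling, which is the content of the equality between the Lagrangian and Eulerian formulations in this simple case.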

Let us introduce some further notation. If $(\rho,j)$ is a minimizer of (2.3), we define $(\bar{\rho},\bar{j})$ as the density-flux pair obtained by integrating in time (for instance $\bar{\rho}=\int_{0}^{1}\rho_{t}$ ). For $R>0$ and $\mu$ a positive measure on $\mathbb{R}^{d}$ , we denote by $W_{B_{R}}(\mu,\kappa)=W(\mu\llcorner B_{R},\kappa\chi_{B_{R}}dx)$ the Wasserstein distance between the restriction of $\mu$ to the ball $B_{R}$ and the corresponding constant density $\kappa=\frac{\mu(B_{R})}{|B_{R}|}$ .

In order to obtain a local version of the equivalence between (2.1) and (2.3), we will need an $L^{\infty}$ bound on the displacement (see [20, 19]).

Lemma 2.3.

Let $\pi$ be a coupling between two measures $\mu$ and $\lambda$ . Assume that $\textup{spt}\,\pi$ is monotone (meaning that for every $(x_{1},y_{1})$ and $(x_{2},y_{2})$ in $\textup{spt}\,\pi$ , $(x_{1}-x_{2})\cdot(y_{1}-y_{2})\geq 0$ ) and that for some $R>0$ , $E+D\ll 1$ (we use the short-hand notation $A\ll 1$ to indicate that there exists $\varepsilon>0$ depending only on the dimension such that $A\leq\varepsilon$ ; similarly, $A\lesssim B$ means that there exists a constant $C>0$ depending on the dimension such that $A\leq CB$ ), where

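$$E=\frac{1}{R^{d+2}}\int_{(B_{6R}\times\mathbb{R}^{d})\cup(\mathbb{R}^{d}\times B_{6R})}|x-y|^{2}\,d\pi\tag{2.5}$$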

and

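$$D=\frac{1}{R^{d+2}}W_{B_{6R}}^{2}(\mu,\kappa_{\mu})+\frac{1}{\kappa_{\mu}}(\kappa_{\mu}-1)^{2}+\frac{1}{R^{d+2}}W_{B_{6R}}^{2}(\lambda,\kappa_{\lambda})+\frac{1}{\kappa_{\lambda}}(\kappa_{\lambda}-1)^{2}.\tag{2.6}$$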

Then, for every $(x,y)\in\textup{spt}\,\pi\cap\left((B_{5R}\times\mathbb{R}^{d})\cup(\mathbb{R}^{d}\times B_{5R})\right)$

$$|x-y|\lesssim R\,(E+D)^{\frac{1}{d+2}}.\tag{2.7}$$

3 The harmonic approximation theorem

We now state the harmonic approximation theorem. By scaling invariance, it is enough to state it at the unit scale $R=1$ . For $\mu$ , $\lambda$ two positive measures and $\pi$ an optimal coupling between them, we define the “excess” energy $E$ as in (2.5) and the distance to the data $D$ as in (2.6).

Theorem 3.1.

([19, Th. 1.4]) For every $0<\tau\ll 1$ , there exist $\varepsilon(\tau)>0$ and $C(\tau)>0$ such that provided $E+D\leq\varepsilon$ , there exists a radius $R\in(3,4)$ such that if $\Phi$ is a solution of ( $\nu$ denotes here the external normal to $\partial B_{R}$ )

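$$\Delta\Phi=c\ \text{ in }B_{R}\quad\text{and}\quad\nu\cdot\nabla\Phi=\nu\cdot\bar{j}\ \text{ on }\partial B_{R},\tag{3.1}$$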

where $c$ is the generic constant for which this equation is solvable, then

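$$\int_{(B_{1}\times\mathbb{R}^{d})\cup(\mathbb{R}^{d}\times B_{1})}|x-y+\nabla\Phi(x)|^{2}\,d\pi\leq\tau E+CD.\tag{3.2}$$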

The proof of Theorem 3.1 is actually performed at the Eulerian level. Thanks to Lemma 2.3, it is indeed enough to prove:

Theorem 3.2.

For every $0<\tau\ll 1$ , there exist $\varepsilon(\tau)>0$ and $C(\tau)>0$ such that provided $E+D\leq\varepsilon$ , there exists a radius $R\in(3,4)$ such that if $\Phi$ solves (3.1), then

$$\int_{B_{2}}\int_{0}^{1}\frac{1}{\rho}|j-\rho\nabla\Phi|^{2}\leq\tau E+CD.\tag{3.3}$$

To simplify the discussion a bit, we will assume from now on that $\lambda=\kappa_{\mu}=1$ , so that $D=W^{2}_{B_{6}}(\mu,1)$ .

The proof of Theorem 3.2 is based on three ingredients. The first of them is the choice of a ’good’ radius $R$ . Indeed, as will become apparent in the discussion below, we need a control on various quantities and this seems to be possible only for generic radii. The second ingredient is an almost orthogonality property. The last one is the construction of a competitor for (2.3).

We define the measure $f=\nu\cdot j$ on $\partial B_{R}\times(0,1)$ and then let $\bar{f}=\int_{0}^{1}f=\nu\cdot\bar{j}$ . Before discussing the almost orthogonality property and the construction, let us point out that for our estimates we would need to control the Dirichlet energy $\int_{B_{R}}|\nabla\Phi|^{2}$ by $E+D$ . Since by elliptic regularity,

$$\int_{B_{R}}|\nabla\Phi|^{2}\lesssim\int_{\partial B_{R}}\bar{f}^{2},\tag{3.4}$$

this is only possible if $\bar{f}$ is controlled in $L^{2}$ (or at least in $H^{-1/2}$ ). In order to solve this issue, (3.3) is first proven with $\phi$ instead of $\Phi$ where $\phi$ solves

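$$\Delta\phi=c\ \text{ in }B_{R}\quad\text{and}\quad\nu\cdot\nabla\phi=\hat{g}\ \text{ on }\partial B_{R},$$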

where $\hat{g}$ is a regularized version of $\bar{f}$ in the sense that (for a measure $\mu$ , we write $\mu_{\pm}$ for its positive/negative part)

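$$\int_{\partial B_{R}}\hat{g}^{2}\lesssim E+D\quad\text{and}\quad W^{2}(\bar{f}_{\pm},\hat{g}_{\pm})\lesssim(E+D)^{\frac{d+3}{d+2}}.\tag{3.5}$$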

The density $\hat{g}$ is obtained by projection on $\partial B_{R}$ , using the fact that for ’good’ radii, thanks to (2.7), the number of particles crossing $\partial B_{R}$ is controlled by $E+D$ . We will however forget here about this difficulty and assume that we may choose $\hat{g}=\bar{f}$ (and thus $\phi=\Phi$ ). In particular, in view of (3.5), we will assume that we have the bound

$$\int_{\partial B_{R}}\int_{0}^{1}f^{2}\lesssim E+D.\tag{3.6}$$

We may now state the almost-orthogonality property:

Lemma 3.3.

(Orthogonality) For every $0<\tau\ll 1$ , there exist $\varepsilon(\tau)>0$ and $C(\tau)>0$ such that if $E+D\leq\varepsilon$ ,

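$$\int_{B_{2}}\int_{0}^{1}\frac{1}{\rho}|j-\rho\nabla\Phi|^{2}\leq\left(\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\right)+\tau E+CD.\tag{3.7}$$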

Sketch of proof.

Expanding the squares we have

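$$\begin{aligned}\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j-\rho\nabla\Phi|^{2}&=\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\\&\quad+2\int_{B_{R}}\int_{0}^{1}(\nabla\Phi-j)\cdot\nabla\Phi+\int_{B_{R}}(\bar{\rho}-1)|\nabla\Phi|^{2}.\end{aligned}$$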
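In more detail, the expansion rests on the pointwise identity for the integrand, followed by an integration in time in which $\Phi$ does not depend on $t$ and $\bar{\rho}=\int_{0}^{1}\rho_{t}$ . The regrouping in the second step (adding and subtracting $2\int_{B_{R}}|\nabla\Phi|^{2}$ ) is a sketch of how the two error terms are isolated:

$$\begin{aligned}\frac{1}{\rho}|j-\rho\nabla\Phi|^{2}&=\frac{1}{\rho}|j|^{2}-2\,j\cdot\nabla\Phi+\rho|\nabla\Phi|^{2},\\-2\int_{B_{R}}\int_{0}^{1}j\cdot\nabla\Phi+\int_{B_{R}}\bar{\rho}\,|\nabla\Phi|^{2}&=-\int_{B_{R}}|\nabla\Phi|^{2}+2\int_{B_{R}}\int_{0}^{1}(\nabla\Phi-j)\cdot\nabla\Phi+\int_{B_{R}}(\bar{\rho}-1)|\nabla\Phi|^{2}.\end{aligned}$$

The first term on the right-hand side of the second line combines with $\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}$ into the energy gap appearing in (3.7); the remaining two terms are the error terms.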

Let us estimate the two error terms. Using integration by parts we have (assuming without loss of generality that $\int_{B_{R}}\Phi=0$ )

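$$\int_{B_{R}}\int_{0}^{1}(\nabla\Phi-j)\cdot\nabla\Phi=-\int_{B_{R}}\int_{0}^{1}\partial_{t}\rho\,\Phi=\int_{B_{R}}\Phi\,d(\mu-1).$$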

Forgetting higher order terms (and assuming that $\frac{\mu(B_{R})}{|B_{R}|}=1$ ), we have (recall that the Wasserstein distance is homogeneous to the $H^{-1}$ norm)

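$$\int_{B_{R}}\Phi\,d(\mu-1)\lesssim\left(\int_{B_{R}}|\nabla\Phi|^{2}\right)^{1/2}W_{B_{R}}(\mu,1)\stackrel{(3.4)\&(3.6)}{\lesssim}(E+D)^{1/2}D^{1/2}\stackrel{\text{Young}}{\leq}\tau E+\frac{C}{\tau}D.$$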
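The last step is Young's inequality $ab\leq\tau a^{2}+\frac{1}{4\tau}b^{2}$ applied with $a=(E+D)^{1/2}$ and $b=D^{1/2}$ . Spelled out for $0<\tau\leq 1$ (the explicit constant $\frac{5}{4}$ is only one admissible choice of $C$ ):

$$(E+D)^{1/2}D^{1/2}\leq\tau(E+D)+\frac{1}{4\tau}D=\tau E+\left(\tau+\frac{1}{4\tau}\right)D\leq\tau E+\frac{5}{4\tau}D.$$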

Regarding the second term, in the case when $\mu=\chi_{\Omega}$ for some set $B_{6}\subseteq\Omega$ , we may argue as in [20, Lem. 3.2] and obtain that by McCann’s displacement convexity, $\bar{\rho}\leq 1$ and thus $\int_{B_{R}}(\bar{\rho}-1)|\nabla\Phi|^{2}\leq 0$ . For generic measures $\mu$ the argument is more subtle and requires a combination of elliptic estimates for (a regularized version of) $\Phi$ together with the bound

$$W_{B_{R}}^{2}(\bar{\rho},1)\lesssim E+D,$$

which holds for ’good’ radii.

∎

As explained above, the last ingredient is the construction of a competitor:

Lemma 3.4.

For every $0<\tau\ll 1$ , there exist $\varepsilon(\tau)>0$ and $C(\tau)>0$ such that if $E+D\leq\varepsilon$ , there exists a density-flux pair $(\widetilde{\rho},\widetilde{j})$ such that

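$$\begin{cases}\partial_{t}\widetilde{\rho}+\nabla\cdot\widetilde{j}=0&\text{in }B_{R}\times(0,1)\\\widetilde{\rho}_{0}=\mu,\ \widetilde{\rho}_{1}=1&\text{in }B_{R}\\\nu\cdot\widetilde{j}=f&\text{on }\partial B_{R}\times(0,1)\end{cases}\tag{3.9}$$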

and

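$$\int_{B_{R}}\int_{0}^{1}\frac{1}{\widetilde{\rho}}|\widetilde{j}|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\leq\tau E+CD.\tag{3.10}$$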

Sketch of proof.

We may assume for simplicity that also $\mu=1$ in $B_{R}$ . Indeed, otherwise we can connect in the time interval $(0,\tau)$ , the measure $\mu$ (in $B_{R}$ ) to the constant density $1$ at a cost of order $\frac{1}{\tau}W_{B_{R}}^{2}(\mu,1)=\frac{1}{\tau}D$ .
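The factor $\frac{1}{\tau}$ comes from a time rescaling. As a sketch, suppose $(\rho,j)$ connects $\mu$ to $1$ in $B_{R}$ over the time interval $(0,1)$ with cost $W_{B_{R}}^{2}(\mu,1)$ , and set (the superscript notation is introduced here only for this computation)

$$\rho^{\tau}_{t}=\rho_{t/\tau},\qquad j^{\tau}_{t}=\frac{1}{\tau}\,j_{t/\tau},\qquad t\in(0,\tau).$$

Then $\partial_{t}\rho^{\tau}+\nabla\cdot j^{\tau}=\frac{1}{\tau}\left(\partial_{s}\rho+\nabla\cdot j\right)\big|_{s=t/\tau}=0$ , and the change of variables $s=t/\tau$ gives

$$\int_{B_{R}}\int_{0}^{\tau}\frac{1}{\rho^{\tau}}|j^{\tau}|^{2}\,dt=\frac{1}{\tau^{2}}\int_{B_{R}}\int_{0}^{\tau}\frac{1}{\rho_{t/\tau}}|j_{t/\tau}|^{2}\,dt=\frac{1}{\tau}\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho_{s}}|j_{s}|^{2}\,ds=\frac{1}{\tau}W_{B_{R}}^{2}(\mu,1).$$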

Let $0<r\ll 1$ be a small parameter to be chosen later on. We make the construction separately in the bulk $B_{R-r}\times(0,1)$ and in the boundary layer $B_{R}\backslash B_{R-r}\times(0,1)$ and set

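$$\widetilde{\rho}=\begin{cases}1&\text{in }B_{R-r}\times(0,1)\\1+s&\text{in }B_{R}\backslash B_{R-r}\times(0,1),\end{cases}\qquad\widetilde{j}=\begin{cases}\nabla\Phi&\text{in }B_{R-r}\times(0,1)\\\nabla\Phi+q&\text{in }B_{R}\backslash B_{R-r}\times(0,1),\end{cases}$$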

and require that $|s|\leq 1/2$ , $\partial_{t}s+\nabla\cdot q=0$ in $B_{R}\backslash B_{R-r}\times(0,1)$ , $s_{0}=s_{1}=0$ in $B_{R}$ and $\nu\cdot q=f-\bar{f}$ on $\partial B_{R}\times(0,1)$ , so that (3.9) is satisfied. The existence of an admissible pair $(s,q)$ satisfying the energy bound

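$$\int_{B_{R}\backslash B_{R-r}}\int_{0}^{1}|q|^{2}\lesssim r\int_{\partial B_{R}}\int_{0}^{1}(f-\bar{f})^{2}$$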

as long as $r\gg\left(\int_{\partial B_{R}}\int_{0}^{1}(f-\bar{f})^{2}\right)^{1/(d+1)}$ is obtained arguing by duality, in the same spirit as [2, Lem.3.3] (see [20, Lem. 2.4]).

We may now estimate

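$$\begin{aligned}\int_{B_{R}}\int_{0}^{1}\frac{1}{\widetilde{\rho}}|\widetilde{j}|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}&\lesssim\int_{B_{R}\backslash B_{R-r}}|\nabla\Phi|^{2}+\int_{B_{R}\backslash B_{R-r}}\int_{0}^{1}|q|^{2}\\&\lesssim r\int_{\partial B_{R}}\bar{f}^{2}+r\int_{\partial B_{R}}\int_{0}^{1}(f-\bar{f})^{2}\lesssim r\int_{\partial B_{R}}f^{2},\end{aligned}$$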

where we used that by elliptic regularity, $\int_{B_{R}\backslash B_{R-r}}|\nabla\Phi|^{2}\lesssim r\int_{\partial B_{R}}\bar{f}^{2}$ . Choosing $r$ to be a large multiple of $\left(\int_{\partial B_{R}}\int_{0}^{1}(f-\bar{f})^{2}\right)^{1/(d+1)}$ yields

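$$\int_{B_{R}}\int_{0}^{1}\frac{1}{\widetilde{\rho}}|\widetilde{j}|^{2}-\int_{B_{R}}|\nabla\Phi|^{2}\lesssim\left(\int_{\partial B_{R}}\int_{0}^{1}f^{2}\right)^{\frac{d+2}{d+1}}\stackrel{(3.6)}{\lesssim}(E+D)^{\frac{d+2}{d+1}},$$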

which concludes the proof of (3.10) since $\frac{d+2}{d+1}>1$ and $E+D\ll 1$ . ∎
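The exponent bookkeeping in the last two steps can be made explicit. Write $F=\int_{\partial B_{R}}\int_{0}^{1}f^{2}$ (a shorthand used only here) and note that $\int_{0}^{1}(f-\bar{f})^{2}=\int_{0}^{1}f^{2}-\bar{f}^{2}\leq\int_{0}^{1}f^{2}$ , so that a large multiple $r$ of $\left(\int_{\partial B_{R}}\int_{0}^{1}(f-\bar{f})^{2}\right)^{1/(d+1)}$ satisfies $r\lesssim F^{1/(d+1)}$ . A sketch of the resulting chain, with $C_{0}$ denoting the implicit dimensional constant:

$$r\,F\lesssim F^{\frac{1}{d+1}}\,F=F^{\frac{d+2}{d+1}},\qquad C_{0}(E+D)^{\frac{d+2}{d+1}}=C_{0}(E+D)^{\frac{1}{d+1}}(E+D)\leq C_{0}\,\varepsilon^{\frac{1}{d+1}}(E+D)\leq\tau(E+D),$$

where the last inequality holds as soon as $\varepsilon\leq(\tau/C_{0})^{d+1}$ . Since $\tau(E+D)\leq\tau E+CD$ , this is of the form required in (3.10). The smallness of $F$ , which follows from (3.6), also guarantees $r\ll 1$ .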

Proof of Theorem 3.2.

By (local) minimality of $(\rho,j)$ , we have $\int_{B_{R}}\int_{0}^{1}\frac{1}{\rho}|j|^{2}\leq\int_{B_{R}}\int_{0}^{1}\frac{1}{\widetilde{\rho}}|\widetilde{j}|^{2}$ so that combining (3.7) and (3.10) together gives the desired estimate (3.3). ∎

4 Application to partial regularity

We now turn to applications of Theorem 3.1 and start with a partial regularity result. Here we are interested in the behavior at small scales.

Let us first recall the main regularity result for optimal transport maps due to Caffarelli [8, 9].

Theorem 4.1.

If $\mu$ and $\lambda$ have compact supports, are absolutely continuous with respect to the Lebesgue measure with densities bounded from above and below on their support and if $\textup{spt}\,\lambda$ is convex, then the optimal transport map $T$ from $\mu$ to $\lambda$ is $C^{0,\alpha}$ .

The hypothesis that $\textup{spt}\,\lambda$ is convex is not merely technical. Indeed, considering for instance the optimal transport map between one ball and two disjoint balls, it is easy to construct examples where the optimal transport map is discontinuous. However, building on the ideas of Caffarelli to prove Theorem 4.1, Figalli and Kim proved in [15] that even without the convexity assumption on $\textup{spt}\,\lambda$ , the singular set of $T$ cannot be too big (see also [14] for a generalization to arbitrary non-degenerate cost functions).

Theorem 4.2.

Let $\mu$ and $\lambda$ be probability measures with compact supports, both absolutely continuous with respect to the Lebesgue measure with densities bounded from above and below on their support. Then, there exist open sets $\Omega\subseteq\textup{spt}\,\mu$ and $\Omega^{\prime}\subseteq\textup{spt}\,\lambda$ with $|\textup{spt}\,\mu\backslash\Omega|=|\textup{spt}\,\lambda\backslash\Omega^{\prime}|=0$ and such that the optimal transport map $T$ from $\mu$ to $\lambda$ is a $C^{0,\alpha}$ homeomorphism between $\Omega$ and $\Omega^{\prime}$ .

Let us point out that it is actually conjectured that the singular set is much smaller and has the same structure as the singular set of gradients of convex functions i.e. that it is $n-1$ -rectifiable (see [22] for a result in this direction).

A first application of Theorem 3.1 is a new proof of Theorem 4.2 (under the additional hypothesis that $\mu$ and $\lambda$ are Hölder continuous). For the sake of simplicity, we will assume from now on that $\mu=\chi_{\Omega_{1}}$ and $\lambda=\chi_{\Omega_{2}}$ for some bounded open sets $\Omega_{i}$ (so that in particular with the notation of Section 3, $D=0$ ). As in [14], we derive Theorem 4.2 by combining Alexandrov’s Theorem (see [27]), which states that $T$ is differentiable a.e., with an $\varepsilon$-regularity theorem.

Theorem 4.3.

([20, Th.1.2]) Let $T$ be the optimal transport map from $\Omega_{1}$ to $\Omega_{2}$ . For every $\alpha\in(0,1)$ , there exists $\varepsilon(\alpha)$ such that if $R>0$ is such that $B_{6R}\subseteq\Omega_{1}\cap\Omega_{2}$ and

$$\frac{1}{R^{d+2}}\int_{B_{6R}}|T-x|^{2}\leq\varepsilon,$$

then $T\in C^{1,\alpha}(B_{R})$ .

By scaling invariance, we may assume that $R=1$ . As already alluded to, the proof goes through a Campanato iteration. Indeed, by Campanato’s characterization of $C^{1,\alpha}$ spaces (see [10]), it is enough to prove that for every $0<r\leq\frac{1}{6}$ ,

$$\min_{A,b}\frac{1}{r^{d+2}}\int_{B_{r}}|T-(Ax+b)|^{2}\lesssim r^{2\alpha}\int_{B_{1}}|T-x|^{2}.$$
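To see what such a decay encodes, the following one-dimensional sketch evaluates the left-hand side of the Campanato criterion for a hypothetical map `T` equal to the identity plus a term of homogeneity $1+\alpha$ . By scaling, the best affine approximation error (normalized by $r^{d+2}=r^{3}$ ) is then exactly proportional to $r^{2\alpha}$ . The quadrature rule, the sample map and the radii are illustrative assumptions.

```python
def affine_excess(T, r, n=2000):
    """min over (a, b) of r^-3 * int_{-r}^{r} |T(x) - (a x + b)|^2 dx (d = 1),
    approximated with the midpoint rule on n cells."""
    h = 2.0 * r / n
    xs = [-r + (k + 0.5) * h for k in range(n)]
    vals = [T(x) for x in xs]
    # The nodes are symmetric about 0, so the least-squares fit decouples:
    # b is the mean of the values and a is the slope of the odd part.
    b = sum(vals) / n
    a = sum(x * v for x, v in zip(xs, vals)) / sum(x * x for x in xs)
    residual = sum((v - a * x - b) ** 2 for x, v in zip(xs, vals)) * h
    return residual / r ** 3

alpha = 0.5

def T(x):
    # Hypothetical C^{1,alpha} map: identity plus a (1 + alpha)-homogeneous term.
    return x + abs(x) ** (1.0 + alpha)

# Halving the radius should divide the excess by 2^(2 * alpha).
ratio = affine_excess(T, 0.1) / affine_excess(T, 0.05)
```

For this example `ratio` is close to $2^{2\alpha}$ , which is the geometric decay that the iteration is designed to produce for the optimal transport map.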

Defining

[TABLE]

this is in turn obtained by using iteratively the following proposition.

Proposition 4.4.

For every $\alpha\in(0,1)$ , there exist $\theta(\alpha)\in(0,1)$ and $\varepsilon(\alpha)$ such that if $B_{6R}\subseteq\Omega_{1}\cap\Omega_{2}$ and $E(T,R)\leq\varepsilon$ , there exist a symmetric matrix $B$ with $\det B=1$ and a vector $b$ such that letting $\hat{T}(x)=B(T(Bx)-b)$ ,

[TABLE]

and $\hat{T}$ is the optimal transport map between $\hat{\Omega}_{1}=B^{-1}\Omega_{1}$ and $\hat{\Omega}_{2}=B(\Omega_{2}-b)$ .

Sketch of proof.

By scaling we may assume that $R=1$ .

Let $\tau\ll\frac{\theta^{2\alpha}}{\theta^{d+2}}$ be fixed. Applying Theorem 3.1, we find the existence of a function $\Phi$ which is harmonic in $B_{2}$ (under our assumptions $c=0$ in (3.1)) and such that (since $D=0$ )

[TABLE]

and (recall (3.4) and (3.6))

[TABLE]

We then define $b=\nabla\Phi(0)$ and $B=\exp(-A/2)$ where $A=\nabla^{2}\Phi(0)$ . Since $\Phi$ is harmonic, $\textrm{Tr}A=0$ and thus $\det B=1$ . Notice that if $T=\nabla\psi$ for some convex function $\psi$ (by Theorem 2.1), then, since $B$ is symmetric, $\hat{T}=\nabla\hat{\psi}$ with $\hat{\psi}(x)=\psi(Bx)-b\cdot Bx$ , which is also a convex function. Therefore $\hat{T}$ is the optimal transport map between $\hat{\Omega}_{1}$ and $\hat{\Omega}_{2}$ . We may now estimate

[TABLE]

This concludes the proof since we chose $\tau\ll\frac{\theta^{2\alpha}}{\theta^{d+2}}$ and since $E(T,1)\ll 1$ . ∎
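The normalization $\det B=1$ used in the proof is an instance of the identity $\det\exp(M)=\exp(\textrm{Tr}\,M)$ . A minimal numerical sketch in dimension two, with a truncated Taylor series for the matrix exponential and a hypothetical symmetric trace-free matrix `A` standing in for $\nabla^{2}\Phi(0)$ :

```python
def mat_mul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(m, terms=30):
    """Truncated Taylor series sum_{k < terms} m^k / k! for a 2x2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]  # holds m^k / k!
    for k in range(1, terms):
        power = mat_mul(power, m)
        power = [[entry / k for entry in row] for row in power]
        result = [[result[i][j] + power[i][j] for j in range(2)]
                  for i in range(2)]
    return result

# Hypothetical Hessian of a harmonic function at 0: symmetric with zero trace.
A = [[0.3, 0.1], [0.1, -0.3]]
B = mat_exp([[-0.5 * entry for entry in row] for row in A])
det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
```

The resulting `B` is symmetric with `det_B` equal to one up to rounding, so the change of variables $x\mapsto Bx$ preserves Lebesgue measure.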

5 Application to the optimal matching problem

We now present an application to the optimal matching problem. As opposed to the previous section, we are interested here in large scales.

Over the last thirty years, optimal matching problems have been the subject of intensive work. We refer for instance to the monograph [26]. One of the simplest examples is the problem of matching the empirical measure of a Poisson point process to the corresponding Lebesgue measure. More specifically, we consider for $L\gg 1$ a Poisson point process $\mu$ on the torus $Q_{L}=[-L/2,L/2)^{d}\simeq(\mathbb{R}/L\mathbb{Z})^{d}$ i.e.

$$\mu=\sum_{i=1}^{n}\delta_{X_{i}}$$

with $X_{i}$ iid random variables uniformly distributed in $Q_{L}$ and $n$ a random variable with Poisson distribution with parameter $L^{d}$ . The problem is to estimate the random variable

[TABLE]

where $W_{\textrm{per}}$ indicates the Wasserstein distance on the torus $Q_{L}$ , and to understand the structure of the corresponding optimal transport plans. It is well-known since [1] that (we use the notation $\log$ for the natural logarithm)

[TABLE]

and thus $d=2$ is a critical dimension. Recently, Caracciolo et al. used the ansatz that the optimal displacement should be well approximated by $\nabla\varphi_{L}$ , where $\varphi_{L}$ solves the Poisson equation (recall (2.2))

[TABLE]

to make numerous predictions about the optimal prefactor in (5.1) as well as the correlations (see [11, 12]). At the macroscopic scale, this ansatz has been partially rigorously justified by Ambrosio et al. (see [4, 3] and also [23] for a result about the fluctuations) in dimension $2$ . To state their result, let us introduce some notation (the results of [4, 3] are stated on the unit cube with a deterministic number of points $n\to\infty$ ; however, they may easily be transposed into our setting by scaling). For $t>0$ , denote the heat kernel on $Q_{L}$ by $P_{t}$ and let $\varphi_{L,t}=P_{t}\ast\varphi_{L}$ , so that $\varphi_{L,t}$ solves

[TABLE]
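For numerical experiments with this matching problem, the Poisson point process $\mu$ introduced at the beginning of this section can be sampled directly from its description: a Poisson number $n$ of points with parameter $L^{d}$ , placed independently and uniformly in $Q_{L}$ . The sketch below uses only the standard library; the function names and the way the Poisson variable is generated (counting unit-rate exponential arrivals) are illustrative choices, not taken from the paper.

```python
import random

def sample_poisson(mean, rng):
    """Count rate-1 exponential arrivals in [0, mean]; the count is Poisson(mean)."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def sample_point_process(L, d, rng):
    """Points of a unit-intensity Poisson point process on Q_L = [-L/2, L/2)^d."""
    n = sample_poisson(L ** d, rng)
    # rng.random() lies in [0, 1), so each coordinate lies in [-L/2, L/2).
    return [tuple((rng.random() - 0.5) * L for _ in range(d)) for _ in range(n)]

rng = random.Random(0)
points = sample_point_process(L=4.0, d=2, rng=rng)

# Empirical check of the intensity: the mean number of points should be near L^d.
counts = [len(sample_point_process(4.0, 2, rng)) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
```

The running time of `sample_poisson` grows linearly with $L^{d}$ , which is acceptable for a sketch but would call for a dedicated sampler at very large scales.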

Theorem 5.1.

Let $d=2$ , then

[TABLE]

Moreover, if $\pi_{L}$ is the optimal transport plan between $\mu$ and $\kappa$ , then setting $t_{L}=\log^{4}L$ , for $L\gg 1$ there holds

[TABLE]

Since by (5.3), the displacement $y-x$ is on average of the order of $\log^{\frac{1}{2}}L$ , (5.4) shows that $\nabla\varphi_{L,t_{L}}$ indeed coincides with the displacement to leading order. This leaves open the description of the optimal transport plan $\pi_{L}$ at the microscopic scale. To state our main result, fix a smooth cut-off function (which plays a similar role as the heat kernel in Theorem 5.1)

[TABLE]

In [17], we prove the following result (see also [19, Th. 1.2] and [18, Th. 1.1]):

Theorem 5.2.

There exists a stationary random variable $r_{*}\geq 1$ on $Q_{L}$ with exponential moments such that if $\bar{x}\in Q_{L}$ is such that $r_{*}(\bar{x})\ll L$ , then

[TABLE]

Moreover, there exists $R=R(\bar{x})\sim r_{*}(\bar{x})$ such that defining the shift $h$ by $h(\bar{x})=\frac{1}{|B_{R}|}\int_{\partial B_{R}(\bar{x})}(x-\bar{x})\nu\cdot\nabla\varphi_{L}$ , we have

[TABLE]

and

[TABLE]

With respect to (5.4), (5.7) proves that (circular) averages of $\nabla\varphi_{L}$ coincide with the displacement $y-x$ up to an error which is of order one. Moreover, (5.5) shows that after averaging, the displacement is actually extremely close to averages of $\nabla\varphi_{L}$ (notice that the error term $\log R/R$ improves as $R$ increases).

By stationarity, it is enough to prove Theorem 5.2 for $\bar{x}=0$ . The proof is based on the following deterministic result (which is a small post-processing of [19, Th. 1.2]):

Theorem 5.3.

Let $\mu$ be a measure on $Q_{L}$ . If for some $L\gg r\gg 1$ ,

[TABLE]

then

[TABLE]

Moreover, there exists $R\sim r$ such that letting $h=\frac{1}{|B_{R}|}\int_{\partial B_{R}}x\nu\cdot\nabla\varphi_{L}$ , we have

[TABLE]

Notice that (5.7) follows from (5.10) and the $L^{\infty}$ bound (2.7) of Lemma 2.3. In order to obtain Theorem 5.2, Theorem 5.3 is combined with a stochastic argument based on (5.3) and a concentration-of-measure argument which ensures that (5.8) is satisfied for the Poisson point process $\mu$ .

The main ingredient for the proof of Theorem 5.3 is a Campanato iteration scheme similar to the one leading to Theorem 4.3 (and mainly based on Theorem 3.1), which allows one to transfer the information that (5.10) holds at scale $L$ by (5.8) down to the microscopic scale $r$ . This is inspired by the approach developed by Armstrong and Smart in [6] (and further refined in [16], see also [5]) for quantitative stochastic homogenization. The main ideas of [6] are themselves rooted in previous works of Avellaneda and Lin (see [7]) on periodic homogenization. The outcome of the Campanato scheme may be stated as follows (see [19, Prop. 1.9]):

Proposition 5.4.

There exists a sequence of approximately geometric radii $R_{k}$ i.e. $L\geq R_{0}\geq\cdots\geq R_{K}\gtrsim 1$ with $R_{k-1}\geq 2R_{k}\gtrsim R_{k-1}$ , $R_{0}\sim L$ and $R_{K}\sim r$ such that defining recursively the couplings $\pi_{k}$ by $\pi_{0}=\pi$ and

[TABLE]

where $\Phi_{k}$ solves

[TABLE]

with $j_{k}$ defined as in (2.4) with $\pi_{k}$ playing the role of $\pi$ , we have for $k\in[0,K]$ ,

[TABLE]

and

[TABLE]

Let us point out that by invariance of the Lebesgue measure under translations, $\pi_{k}$ is the optimal transport plan between $\mu$ and the Lebesgue measure for every $k$ (this is the reason why we make the translation in the target space).

Letting $\widetilde{h}=\sum_{k=0}^{K-1}\nabla\Phi_{k}(0)$ and undoing the iterative definition of $\pi_{k}$ , we see that (5.11) directly leads to (5.10) with $h$ replaced by $\widetilde{h}$ . The proof of (5.10) is concluded by the estimate (see [19, Prop. 1.10])

[TABLE]

This estimate is also crucial for the proof of (5.9). Let us point out that a naive estimate using (5.12) leads to

[TABLE]

which is suboptimal. In order to obtain a shift with the optimal estimate (5.6) it is therefore important to take into account cancellations and replace $\widetilde{h}$ by $h$ .

Let us close this note by pointing out that in dimension $d\geq 3$ , the optimal transport plans corresponding to a very closely related optimal matching problem, have been used in [21] to construct in the limit $L\to\infty$ , a stationary and locally optimal coupling between the Poisson point process on $\mathbb{R}^{d}$ and the Lebesgue measure. For $d=2$ , such a coupling is expected not to exist. However, using (5.7) and passing to the limit $L\to\infty$ , it is possible to construct (at least in the sense of Young measures) a coupling between the Poisson point process on $\mathbb{R}^{2}$ and the Lebesgue measure, which is locally optimal and has stationary increments (see [18, Th.1.2]).

Acknowledgements

This research has been partially supported by the ANR project SHAPO.

Bibliography

[1] M. Ajtai, J. Komlós, and G. Tusnády, On optimal matchings, Combinatorica 4 (1984), 259–264.
[2] G. Alberti, R. Choksi, and F. Otto, Uniform energy distribution for an isoperimetric problem with long-range interactions, J. Amer. Math. Soc. 22 (2009), no. 2, 569–605.
[3] L. Ambrosio, F. Glaudo, and D. Trevisan, On the optimal map in the 2-dimensional random matching problem, 2019, p. 20.
[4] L. Ambrosio, F. Stra, and D. Trevisan, A PDE approach to a 2-dimensional matching problem, Probab. Theory Related Fields 173 (2019), no. 1-2, 433–477.
[5] S. Armstrong, T. Kuusi, and J.-C. Mourrat, Quantitative stochastic homogenization and large-scale regularity, ArXiv e-prints (2017).
[6] S. N. Armstrong and C. K. Smart, Quantitative stochastic homogenization of convex integral functionals, Ann. Sci. Éc. Norm. Supér. (4) 49 (2016), no. 2, 423–481.
[7] M. Avellaneda and F.-H. Lin, Compactness methods in the theory of homogenization, Comm. Pure Appl. Math. 40 (1987), no. 6, 803–847.
[8] L. A. Caffarelli, A localization property of viscosity solutions to the Monge-Ampère equation and their strict convexity, Ann. of Math. (2) 131 (1990), no. 1, 129–134.
