Entropic approximation of $\infty$-optimal transport problems

Guillaume Carlier; Camilla Brizzi; Luigi De Pascale

arXiv:2302.11896·math.AP·February 24, 2023


TL;DR

This paper introduces an entropic approximation method for supremal cost optimal transport problems, proving convergence to $\infty$-cyclically monotone plans and demonstrating numerical results with Sinkhorn's algorithm.

Contribution

It develops a novel entropic penalization approach for $\infty$-optimal transport problems, establishing $\Gamma$-convergence and plan selection properties.

Findings

1. Proves $\Gamma$-convergence of the entropic approximation.
2. Shows the method selects $\infty$-cyclically monotone plans.
3. Provides numerical illustrations using Sinkhorn's algorithm.

Abstract

We propose an entropic approximation approach for optimal transportation problems with a supremal cost. We establish $Γ$ -convergence for suitably chosen parameters for the entropic penalization and that this procedure selects $\infty$ -cyclically monotone plans at the limit. We also present some numerical illustrations performed with Sinkhorn's algorithm.

Equations

$$\gamma(A\times\mathbb{R}^{d})=\mu(A)\quad\text{and}\quad\gamma(\mathbb{R}^{d}\times A)=\nu(A),$$

$$\inf_{\gamma\in\Pi(\mu,\nu)}\ \gamma\text{-}\operatorname{ess\,sup}\,c,\qquad\gamma\text{-}\operatorname{ess\,sup}\,c=\|c\|_{L^{\infty}(\gamma)},$$

$$\inf_{\gamma\in\Pi(\mu,\nu)}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}c(x,y)\,\text{d}\gamma,$$

$$\inf_{\gamma\in\Pi(\mu,\nu)}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}c(x,y)\,\text{d}\gamma+\varepsilon H(\gamma|\mu\otimes\nu),$$

$$J_{p,\varepsilon}(\gamma):=\begin{cases}\Big(\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}c(x,y)^{p}\,\text{d}\gamma(x,y)+\varepsilon H(\gamma|\mu\otimes\nu)\Big)^{\frac{1}{p}}&\mbox{ if }\gamma\in\Pi(\mu,\nu),\\ +\infty&\mbox{ otherwise,}\end{cases}$$

$$H(\gamma|\mu\otimes\nu)=\begin{cases}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\log\Big(\frac{\text{d}\gamma}{\text{d}\mu\otimes\nu}\Big)\mbox{d}\gamma&\mbox{ if $\gamma\ll\mu\otimes\nu$},\\ +\infty&\mbox{ otherwise.}\end{cases}$$

$$J_{\infty}(\gamma):=\begin{cases}\gamma\text{-}\operatorname{ess\,sup}\,c&\mbox{ if }\gamma\in\Pi(\mu,\nu),\\ +\infty&\mbox{ otherwise.}\end{cases}$$

$$J_{p}:=J_{p,1}.$$

$$H(\gamma_{p}|\mu\otimes\nu)\leq p\,2^{-p}$$

$$\lim_{p\to\infty}\frac{1}{p}\log\Big(1+\varepsilon_{p}\frac{\log(p)}{(1+\lambda)^{p}}\Big)=0.$$

$$\liminf_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\geq\liminf_{p}\|c\|_{L^{p}(\gamma_{p})}.$$

$$\liminf_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\geq\liminf_{p}\|c\|_{L^{q}(\gamma_{p})}=\|c\|_{L^{q}(\gamma)}$$

$$\liminf_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\geq\|c\|_{L^{\infty}(\gamma)}=J_{\infty}(\gamma).$$

$$J_{p,\varepsilon_{p}}(\gamma^{\delta})\leq\|c\|_{L^{p}(\gamma^{\delta})}+\varepsilon_{p}^{\frac{1}{p}}H(\gamma^{\delta}|\mu\otimes\nu)^{\frac{1}{p}}\leq\|c\|_{L^{\infty}(\gamma^{\delta})}+\varepsilon_{p}^{\frac{1}{p}}H(\gamma^{\delta}|\mu\otimes\nu)^{\frac{1}{p}}.$$

$$\|c\|_{L^{\infty}(\gamma^{\delta})}\leq\|c\|_{L^{\infty}(\gamma)}+\omega(\sqrt{2d}\,\delta),$$

$$H(\gamma^{\delta}|\mu\otimes\nu)^{\frac{1}{p}}\leq d^{\frac{1}{p}}\log\Big(\frac{L}{\delta}\Big)^{\frac{1}{p}}$$

$$\limsup_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\leq J_{\infty}(\gamma)+\limsup_{p}\Big(\omega\Big(\frac{\sqrt{2d}}{p}\Big)+d^{\frac{1}{p}}\varepsilon_{p}^{\frac{1}{p}}\log(Lp)^{\frac{1}{p}}\Big)=J_{\infty}(\gamma),$$

$$J_{p,\varepsilon_{p}}(\gamma_{p})\leq\Big(J_{\infty}(\gamma)+\omega\Big(\frac{\sqrt{2d}}{p}\Big)\Big)\Big(1+\frac{d\varepsilon_{p}\log(Lp)}{(1+\lambda)^{p}}\Big)^{\frac{1}{p}}$$

$$\limsup_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\leq J_{\infty}(\gamma).$$

$$\gamma^{\delta}:=\sum_{\substack{k,l\in\mathbb{Z}^{d}:\\ \mu(Q_{k}^{\delta})>0,\,\nu(Q_{l}^{\delta})>0}}\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})\,\mu_{k}^{\delta}\otimes\nu_{l}^{\delta}$$

$$\mu_{k}^{\delta}(A)=\frac{\mu(Q_{k}^{\delta}\cap A)}{\mu(Q_{k}^{\delta})},\qquad\nu_{l}^{\delta}(A)=\frac{\nu(Q_{l}^{\delta}\cap A)}{\nu(Q_{l}^{\delta})}$$

$$W_{\infty}(\gamma^{\delta},\gamma)\leq\sqrt{2d}\,\delta,\qquad H(\gamma^{\delta}|\mu\otimes\nu)\leq d\log\Big(\frac{L}{\delta}\Big),$$

$$\frac{\text{d}\gamma^{\delta}}{\text{d}\mu\otimes\nu}(x,y)=\begin{cases}\dfrac{\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})}{\mu(Q_{k}^{\delta})\nu(Q_{l}^{\delta})}&\mbox{ if }(x,y)\in Q_{k}^{\delta}\times Q_{l}^{\delta}\mbox{ and }\mu(Q_{k}^{\delta}),\nu(Q_{l}^{\delta})>0,\\ 0&\mbox{ otherwise.}\end{cases}$$

$$\begin{aligned}H(\gamma^{\delta}|\mu\otimes\nu)&=\sum_{\substack{k,l\in\mathbb{Z}^{d}:\\ \mu(Q_{k}^{\delta})>0,\,\nu(Q_{l}^{\delta})>0}}\int_{Q_{k}^{\delta}\times Q_{l}^{\delta}}\log\Big(\frac{\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})}{\mu(Q_{k}^{\delta})\nu(Q_{l}^{\delta})}\Big)\,\text{d}\gamma^{\delta}\\&\leq\sum_{\substack{k,l\in\mathbb{Z}^{d}:\\ \mu(Q_{k}^{\delta})>0,\,\nu(Q_{l}^{\delta})>0}}\int_{Q_{k}^{\delta}\times Q_{l}^{\delta}}\log\Big(\frac{1}{\mu(Q_{k}^{\delta})}\Big)\,\text{d}\gamma^{\delta}=\sum_{k\in\mathbb{Z}^{d}:\,\mu(Q_{k}^{\delta})>0}\mu(Q_{k}^{\delta})\log\Big(\frac{1}{\mu(Q_{k}^{\delta})}\Big),\end{aligned}$$

$$\begin{aligned}H(\gamma^{\delta}|\mu\otimes\nu)&\leq\sum_{k=1}^{N_{\delta}}\mu(Q_{k}^{\delta})\log\Big(\frac{1}{\mu(Q_{k}^{\delta})}\Big)\leq N_{\delta}\Big(\sum_{k=1}^{N_{\delta}}\frac{\mu(Q_{k}^{\delta})}{N_{\delta}}\Big)\log\Big(\frac{1}{\sum_{k=1}^{N_{\delta}}\frac{1}{N_{\delta}}\mu(Q_{k}^{\delta})}\Big)\\&=\log(N_{\delta})=d\log(L)-d\log(\delta),\end{aligned}$$

$$\eta^{\delta}:=\sum_{j:\,\gamma(\overline{Q}_{j})>0}\gamma(\overline{Q}_{j})\,\gamma_{j}\otimes\gamma_{j}^{\delta},$$

$$W_{\infty}(\gamma,\gamma^{\delta})\leq\big\||x-y|\big\|_{L^{\infty}(\eta^{\delta})}\leq\operatorname{diam}(\overline{Q}_{j})=\sqrt{2d}\,\delta.$$

$$\max_{i=1,\dots,k}\{c(x_{i},y_{i})\}\leq\max_{i=1,\dots,k}\{c(x_{i},y_{i+1})\},$$

$$\max_{i=1,\dots,k}\{c(x_{i},y_{i})\}\leq\max_{i=1,\dots,k}\{c(x_{i},y_{\sigma(i)})\}.$$

$$\sum_{i=1}^{k}c(x_{i},y_{i})\leq\sum_{i=1}^{k}c(x_{i},y_{\sigma(i)}).$$


Taxonomy

Topics: Optimization and Variational Analysis · Differential Equations and Boundary Problems · Nonlinear Differential Equations Analysis

Full text


Camilla Brizzi, Dipartimento di Matematica e Informatica, Università di Firenze, Viale Morgagni 67/a, 50134 Firenze, Italy

Guillaume Carlier, CEREMADE, UMR CNRS 7534, Université Paris Dauphine, PSL, Pl. de Lattre de Tassigny, 75775 Paris Cedex 16, France and INRIA-Paris, MOKAPLAN

Luigi De Pascale, Dipartimento di Matematica e Informatica, Università di Firenze, Viale Morgagni 67/a, 50134 Firenze, Italy


Keywords: $\infty$ -optimal transport, $\infty$ -cyclical monotonicity, entropic approximation.

MS Classification: 49Q22, 65K10.

1 Introduction

The usual Monge-Kantorovich optimal transport problem consists, given a transportation cost and distributions of sources and targets, in finding a transport plan that minimizes the average transport cost. It has attracted considerable attention in the last three decades, as can be seen from the textbooks of Villani [17], [18] and Santambrogio [15]. In optimal transport problems with a supremal cost (also called $L^{\infty}$ optimal transport), one instead looks for transport plans which minimize the essential supremum of the cost. Whereas the usual Monge-Kantorovich problem is a linear program, $L^{\infty}$ optimal transport leads to non-convex optimization (even though the supremal cost has convex sublevel sets). To a large extent, this explains why there are fewer theoretical results and numerical methods to address it, with the notable exception of the recent combinatorial approach of Bansil and Kitagawa [1]. As in the Calculus of Variations with a supremal functional, $L^{\infty}$ optimal transport may admit many minimizers, and selecting special ones which satisfy tractable optimality conditions is an important issue, first studied by Champion, De Pascale and Juutinen in [6]. In the classical Monge-Kantorovich problem, restrictions of optimal plans remain optimal between their marginals, but this may fail for $L^{\infty}$ optimal transport. This is why the authors of [6] introduced the notion of restrictable optimal plans and showed that such restrictable solutions are characterized by a remarkable property of their support: $\infty$-cyclical monotonicity. This was the starting point for proving the existence of optimal maps for $L^{\infty}$ optimal transport under various conditions on the cost and the marginals, see [6], [10], [3].

Among numerical methods for optimal transport (see Cuturi and Peyré [14], Benamou [2], Mérigot and Thibert [12]), the entropic penalization approach and the Sinkhorn algorithm have gained a lot of popularity since Cuturi's paper [7]. Entropic optimal transport, which has connections with large deviations and the so-called Schrödinger bridge problem (see Léonard [11]), has also stimulated an intensive stream of recent theoretical research; see the lecture notes of Nutz [13] and the references therein. A recent breakthrough in the field is the work of Bernton, Ghosal and Nutz [8], where a large deviations principle related to cyclical monotonicity is established for entropic optimal plans.

The goal of the present paper is to investigate, theoretically and numerically, whether the entropic approximation strategy can be used for $L^{\infty}$ optimal transport as well. We will in particular see how the results of [8] can be used to show that this approximation selects at the limit the distinguished restrictable $\infty$ -cyclically monotone minimizers introduced in [6].

The article is organized as follows. The setting is introduced in Section 2. Section 3 is devoted to $\Gamma$ -convergence towards the supremal cost functional. In Section 4, we study how the entropic penalization selects $\infty$ -cyclically monotone plans in the limit. In Section 5, we give some quantitative convergence estimates and a large deviations upper bound in the spirit of [8]. Finally, we present some numerical illustrations in Section 6.

2 Assumptions and notations

In the sequel, we will always assume that the transportation cost is continuous and nonnegative, $c\in C(\mathbb{R}^{d}\times\mathbb{R}^{d},\mathbb{R}_{+})$ , and that the fixed marginals of the problem, $\mu,\nu$ are two Borel probability measures on $\mathbb{R}^{d}$ , $\mu,\nu\in\mathcal{P}(\mathbb{R}^{d})$ , with compact support. Let $\Pi(\mu,\nu)$ be the set of transport plans between $\mu$ and $\nu$ i.e. the set of Borel probability measures on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ having $\mu$ and $\nu$ as marginals. More precisely, a Borel probability measure $\gamma$ on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ belongs to $\Pi(\mu,\nu)$ when

$$\gamma(A\times\mathbb{R}^{d})=\mu(A)\quad\text{and}\quad\gamma(\mathbb{R}^{d}\times A)=\nu(A)$$

for every Borel subset $A$ of $\mathbb{R}^{d}$ . Note that every $\gamma$ in $\Pi(\mu,\nu)$ has its support in $\operatorname{spt}(\mu)\times\operatorname{spt}(\nu)$ and that $c$ is uniformly continuous on $\operatorname{spt}(\mu)\times\operatorname{spt}(\nu)$ . We are interested in the following $\infty$ -optimal transport problem (see [6]):

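$$\inf_{\gamma\in\Pi(\mu,\nu)}\ \gamma\text{-}\operatorname{ess\,sup}\,c,\qquad\gamma\text{-}\operatorname{ess\,sup}\,c=\|c\|_{L^{\infty}(\gamma)}.\tag{$\infty$-OT}$$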

In contrast with classical optimal transport where one minimizes an integral cost,

$$\inf_{\gamma\in\Pi(\mu,\nu)}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}c(x,y)\,\text{d}\gamma,$$

( $\infty$ -OT) is a non-convex and presumably harder problem.

Due to the success of entropic approximation of optimal transport with regularization parameter $\varepsilon>0$

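$$\inf_{\gamma\in\Pi(\mu,\nu)}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}c(x,y)\,\text{d}\gamma+\varepsilon H(\gamma|\mu\otimes\nu),\tag{$\varepsilon$-EOT}$$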

recalled in the introduction, it seems natural to introduce, for $\varepsilon>0$ and exponent $p\geq 1$ the following functional $J_{p,\varepsilon}:\mathcal{P}(\mathbb{R}^{d}\times\mathbb{R}^{d})\to\mathbb{R}\cup\{+\infty\}$

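$$J_{p,\varepsilon}(\gamma):=\begin{cases}\Big(\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}c(x,y)^{p}\,\text{d}\gamma(x,y)+\varepsilon H(\gamma|\mu\otimes\nu)\Big)^{\frac{1}{p}}&\mbox{ if }\gamma\in\Pi(\mu,\nu),\\ +\infty&\mbox{ otherwise,}\end{cases}$$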

where $H$ stands for relative entropy:

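$$H(\gamma|\mu\otimes\nu)=\begin{cases}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\log\Big(\frac{\text{d}\gamma}{\text{d}\mu\otimes\nu}\Big)\mbox{d}\gamma&\mbox{ if $\gamma\ll\mu\otimes\nu$},\\ +\infty&\mbox{ otherwise.}\end{cases}$$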

Note that due to strict convexity of the entropy, for every $\varepsilon>0$ and $p\geq 1$ , $J_{p,\varepsilon}$ admits a unique minimizer. We now denote by $J_{\infty}:\mathcal{P}(\mathbb{R}^{d}\times\mathbb{R}^{d})\to\mathbb{R}\cup\{+\infty\}$ , the supremal functional

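$$J_{\infty}(\gamma):=\begin{cases}\gamma\text{-}\operatorname{ess\,sup}\,c&\mbox{ if }\gamma\in\Pi(\mu,\nu),\\ +\infty&\mbox{ otherwise.}\end{cases}$$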

Finally, let us set

$$J_{p}:=J_{p,1}.$$

Since $H(\gamma|\mu\otimes\nu)\geq 0$, with equality exactly when $\gamma=\mu\otimes\nu$, we have $J_{p,\varepsilon}(\gamma)\geq\|c\|_{L^{p}(\gamma)}$. On the other hand, $\|c\|_{L^{p}(\gamma)}\leq J_{\infty}(\gamma)$, since $\gamma$ is a probability measure. So, roughly speaking, the two approximations act in opposite directions: adding the entropic term approximates from above, while replacing $\|c\|_{L^{\infty}(\gamma)}$ by $\|c\|_{L^{p}(\gamma)}$ approximates from below.
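In the discrete setting these three quantities can be compared directly. The following is a minimal sketch. It assumes NumPy and discrete marginals with strictly positive weights. The data, the shifted cost $1+|x-y|$ and the helper names `sinkhorn_log`, `rel_entropy`, `J_p_eps`, `Lp_norm`, `J_inf` are illustrative choices, not taken from the paper. Since $t\mapsto t^{1/p}$ is increasing, the minimizer of $J_{p,\varepsilon}$ over $\Pi(\mu,\nu)$ is the entropic optimal plan for the cost $c^{p}$ with parameter $\varepsilon$. We approximate that plan with a log-domain Sinkhorn iteration.

```python
import numpy as np


def logsumexp(a, axis):
    # numerically stable log(sum(exp(a))) along one axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_log(mu, nu, cost, eps, n_iter=2000):
    # log-domain Sinkhorn: approximately minimizes sum(cost * gamma) + eps * H(gamma | mu x nu)
    f, g = np.zeros_like(mu), np.zeros_like(nu)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - cost) / eps + np.log(nu)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps + np.log(mu)[:, None], axis=0)
    return np.exp((f[:, None] + g[None, :] - cost) / eps) * np.outer(mu, nu)


def rel_entropy(gamma, mu, nu):
    # H(gamma | mu x nu) with the convention 0 log 0 = 0
    ref = np.outer(mu, nu)
    mask = gamma > 0
    return float(np.sum(gamma[mask] * np.log(gamma[mask] / ref[mask])))


def J_p_eps(gamma, mu, nu, C, p, eps):
    return (np.sum(C**p * gamma) + eps * rel_entropy(gamma, mu, nu)) ** (1.0 / p)


def Lp_norm(gamma, C, p):
    return np.sum(C**p * gamma) ** (1.0 / p)


def J_inf(gamma, C):
    # gamma-essential supremum of c: largest cost on the support of the discrete plan
    return float(C[gamma > 0].max())


rng = np.random.default_rng(0)
x, y = rng.random(5), rng.random(6)
mu, nu = np.full(5, 1 / 5), np.full(6, 1 / 6)
C = 1.0 + np.abs(x[:, None] - y[None, :])  # cost shifted so that c >= 1
p, eps = 2, 1.0
gamma = sinkhorn_log(mu, nu, C**p, eps)  # minimizer of J_{p,eps}: entropic OT with cost c^p
print(J_p_eps(gamma, mu, nu, C, p, eps), Lp_norm(gamma, C, p), J_inf(gamma, C))
```

The log-domain form of the updates is a design choice. It avoids underflow of $e^{-c^{p}/\varepsilon}$ when $c^{p}$ is large compared with $\varepsilon$, which is precisely the regime of interest here.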

We also observe that letting $p\to\infty$ and $\varepsilon\to 0$ is not enough to ensure that minimizers of $J_{p,\varepsilon}$ converge to a minimizer of $J_{\infty}$ (i.e. a solution of $\infty$ -OT). Indeed, if $\|c\|_{\infty}\leq\frac{1}{2}$ and $\varepsilon=\frac{1}{p}$ the minimizer $\gamma_{p}$ of $J_{p,\frac{1}{p}}$ satisfies

$$H(\gamma_{p}|\mu\otimes\nu)\leq p\,2^{-p},$$

hence $\gamma_{p}$ converges to $\mu\otimes\nu$, which in general is not a minimizer of $J_{\infty}$ (the convergence is actually strong, by Pinsker's inequality, see e.g. Lemma 2.5 in [16]). On the one hand, this suggests that $\Gamma$-convergence of the regularizations above to $J_{\infty}$ requires conditions relating $\varepsilon$ to $p$. On the other hand, the previous example shows that the range of $c^{p}$, compared to the size of the entropic penalization $\varepsilon$, is crucial. But the solutions of the $\infty$-optimal transport problem are invariant when $c$ is replaced by a strictly increasing function of $c$. In particular, one can replace $c$ by $c+2$ (say), so that $c^{p}$ will typically dominate the entropic term, and one can expect $\Gamma$-convergence as $p\to\infty$ for a fixed (or even large) value of $\varepsilon$ (see the next section for more details).
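The example above can be reproduced numerically. This sketch assumes NumPy, uniform weights on eight random points of $[0,1]$ and the cost $c(x,y)=|x-y|/2$, so that $\|c\|_{\infty}\leq\frac{1}{2}$; all of these are illustrative choices. For each $p$ it does three things:

- computes the minimizer $\gamma_{p}$ of $J_{p,\frac{1}{p}}$ with a log-domain Sinkhorn iteration for the cost $c^{p}$;
- checks that $H(\gamma_{p}|\mu\otimes\nu)\leq p2^{-p}$;
- reports the total variation distance to $\mu\otimes\nu$, which Pinsker's inequality bounds by $\sqrt{H/2}$.

```python
import numpy as np


def logsumexp(a, axis):
    # numerically stable log(sum(exp(a))) along one axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_log(mu, nu, cost, eps, n_iter=2000):
    # log-domain Sinkhorn for min sum(cost * gamma) + eps * H(gamma | mu x nu)
    f, g = np.zeros_like(mu), np.zeros_like(nu)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - cost) / eps + np.log(nu)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps + np.log(mu)[:, None], axis=0)
    return np.exp((f[:, None] + g[None, :] - cost) / eps) * np.outer(mu, nu)


def rel_entropy(gamma, mu, nu):
    # H(gamma | mu x nu) with the convention 0 log 0 = 0
    ref = np.outer(mu, nu)
    mask = gamma > 0
    return float(np.sum(gamma[mask] * np.log(gamma[mask] / ref[mask])))


rng = np.random.default_rng(1)
x, y = rng.random(8), rng.random(8)
mu = nu = np.full(8, 1 / 8)
C = np.abs(x[:, None] - y[None, :]) / 2  # ||c||_inf <= 1/2

results = []
for p in (2, 4, 8, 16):
    gamma_p = sinkhorn_log(mu, nu, C**p, eps=1.0 / p)  # minimizer of J_{p,1/p}
    H = rel_entropy(gamma_p, mu, nu)
    tv = 0.5 * np.abs(gamma_p - np.outer(mu, nu)).sum()  # total variation distance to mu x nu
    results.append((p, H, tv))
    print(p, H, p * 2.0**-p, tv)
```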

3 $\Gamma$ -convergence

Our first result concerns the $\Gamma$ -convergence of $J_{p,\varepsilon}$ to $J_{\infty}$ :

Theorem 3.1.

Under the general assumptions of Section 2 we have:

1. $J_{p,\varepsilon_{p}}$ $\Gamma$-converges (for the weak star topology of $\mathcal{P}(\operatorname{spt}(\mu)\times\operatorname{spt}(\nu))$) to $J_{\infty}$ as $p\to\infty$ provided $\varepsilon_{p}^{\frac{1}{p}}\to 0$ as $p\to\infty$;

2. if, in addition, $c\geq 1+\lambda$ with $\lambda\geq 0$, then $J_{p,\varepsilon_{p}}$ $\Gamma$-converges to $J_{\infty}$ as $p\to\infty$ provided

$$\lim_{p\to\infty}\frac{1}{p}\log\Big(1+\varepsilon_{p}\frac{\log(p)}{(1+\lambda)^{p}}\Big)=0.\tag{3.1}$$

In particular, in this case, $J_{p,1}$ and $J_{p,p}$ $\Gamma$ -converge to $J_{\infty}$ as $p\to\infty$ .

Proof.

Let $\gamma_{p}\in\Pi(\mu,\nu)$ converge weakly star to $\gamma$ . By nonnegativity of $H(\gamma_{p}|\mu\otimes\nu)$ , we have

$$\liminf_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\geq\liminf_{p}\|c\|_{L^{p}(\gamma_{p})}.$$

Hence, for fixed $q$ , since $\|c\|_{L^{p}(\gamma_{p})}\geq\|c\|_{L^{q}(\gamma_{p})}$ for $p\geq q$ , we have

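$$\liminf_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\geq\liminf_{p}\|c\|_{L^{q}(\gamma_{p})}=\|c\|_{L^{q}(\gamma)},$$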

taking the supremum with respect to $q$ thus yields the desired $\Gamma$ -liminf inequality

$$\liminf_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\geq\|c\|_{L^{\infty}(\gamma)}=J_{\infty}(\gamma).$$

Let us now prove the $\Gamma$-limsup inequality. For any $\gamma\in\Pi(\mu,\nu)$ we consider $\gamma^{\delta}$, the block approximation of $\gamma$ at scale $\delta\in(0,1)$ defined by (3.3) below. Its convergence to $\gamma$ is guaranteed by the first inequality in (3.4). By subadditivity of the concave function $t\mapsto t^{\frac{1}{p}}$ on $\mathbb{R}_{+}$, we first have, for $p\geq 1$,

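$$J_{p,\varepsilon_{p}}(\gamma^{\delta})\leq\|c\|_{L^{p}(\gamma^{\delta})}+\varepsilon_{p}^{\frac{1}{p}}H(\gamma^{\delta}|\mu\otimes\nu)^{\frac{1}{p}}\leq\|c\|_{L^{\infty}(\gamma^{\delta})}+\varepsilon_{p}^{\frac{1}{p}}H(\gamma^{\delta}|\mu\otimes\nu)^{\frac{1}{p}}.$$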

Denoting by $\omega$ a modulus of continuity of $c$ on $\operatorname{spt}(\mu)\times\operatorname{spt}(\nu)$ , thanks to the first inequality in (3.4), we have

$$\|c\|_{L^{\infty}(\gamma^{\delta})}\leq\|c\|_{L^{\infty}(\gamma)}+\omega(\sqrt{2d}\,\delta),$$

since $\sqrt{2d}\,\delta$ is the diameter of the products of cubes $Q_{k}^{\delta}\times Q_{l}^{\delta}$ used in the approximation. Moreover, by the second inequality in (3.4), we have

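$$H(\gamma^{\delta}|\mu\otimes\nu)^{\frac{1}{p}}\leq d^{\frac{1}{p}}\log\Big(\frac{L}{\delta}\Big)^{\frac{1}{p}},$$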

so if we define $\gamma_{p}$ as the block approximation of $\gamma$ at scale $\delta=\frac{1}{p}$ (say), we obtain

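$$\limsup_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\leq J_{\infty}(\gamma)+\limsup_{p}\Big(\omega\Big(\frac{\sqrt{2d}}{p}\Big)+d^{\frac{1}{p}}\varepsilon_{p}^{\frac{1}{p}}\log(Lp)^{\frac{1}{p}}\Big)=J_{\infty}(\gamma),$$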

since we have assumed that $\varepsilon_{p}^{\frac{1}{p}}\to 0$ as $p\to+\infty$ .

Let us now assume that $c\geq 1+\lambda$. The proof of the $\Gamma$-liminf inequality for $J_{p,\varepsilon_{p}}$ is exactly as above. For $\gamma\in\Pi(\mu,\nu)$ and $\gamma_{p}$ the block approximation of $\gamma$ at scale $\frac{1}{p}$, we have

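$$J_{p,\varepsilon_{p}}(\gamma_{p})\leq\|c\|_{L^{\infty}(\gamma_{p})}\Big(1+\frac{\varepsilon_{p}H(\gamma_{p}|\mu\otimes\nu)}{\|c\|_{L^{\infty}(\gamma_{p})}^{p}}\Big)^{\frac{1}{p}}\leq\Big(J_{\infty}(\gamma)+\omega\Big(\frac{\sqrt{2d}}{p}\Big)\Big)\Big(1+\frac{d\varepsilon_{p}\log(Lp)}{(1+\lambda)^{p}}\Big)^{\frac{1}{p}}\tag{3.2}$$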

so that, as soon as (3.1) holds, one has

$$\limsup_{p}J_{p,\varepsilon_{p}}(\gamma_{p})\leq J_{\infty}(\gamma).$$

∎

*Remark 3.2.*

Notice that if $c\geq 1+\lambda$ for some $\lambda>0$, $\Gamma$-convergence of $J_{p,\varepsilon_{p}}$ to $J_{\infty}$ is guaranteed even for rapidly increasing $\varepsilon_{p}$ such as $\varepsilon_{p}=p^{m}(1+\lambda)^{p}$ with $m\geq 0$. By contrast, in the general case, the condition $\varepsilon_{p}^{\frac{1}{p}}\to 0$ requires values of $\varepsilon$ far too small to be used in numerical computations. This suggests rescaling the cost in practice so that it is bounded from below by $1$.
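Remark 3.2 can be made concrete by evaluating the left-hand side of (3.1) for a few schedules $\varepsilon_{p}$. Alongside it we track $\varepsilon_{p}^{1/p}$, the quantity that must vanish in the general case of Theorem 3.1. The sketch below assumes NumPy; the values $\lambda=0.5$, $m=2$ and the grid of $p$ are illustrative. Everything is computed in log scale via `np.logaddexp`, so that $(1+\lambda)^{p}$ never overflows.

```python
import numpy as np


def condition_31(log_eps_p, p, lam):
    # (1/p) * log(1 + eps_p * log(p) / (1 + lam)**p), evaluated in log scale to avoid overflow
    t = log_eps_p + np.log(np.log(p)) - p * np.log1p(lam)
    return np.logaddexp(0.0, t) / p


lam, m = 0.5, 2
ps = np.array([10.0, 100.0, 1000.0, 10000.0])
schedules = {  # log(eps_p) for a few illustrative choices of eps_p
    "eps_p = 1": np.zeros_like(ps),
    "eps_p = p": np.log(ps),
    "eps_p = p^m (1+lam)^p": m * np.log(ps) + ps * np.log1p(lam),
}
for name, log_eps in schedules.items():
    # second array: eps_p^(1/p), which would have to tend to 0 without the bound c >= 1 + lam
    print(name, condition_31(log_eps, ps, lam), np.exp(log_eps / ps))
```

For all three schedules the left-hand side of (3.1) decreases to $0$. Meanwhile $\varepsilon_{p}^{1/p}$ stays equal to $1$, tends to $1$, or tends to $1+\lambda$, respectively. So only the lower bound $c\geq 1+\lambda$ makes these choices admissible.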

*Remark 3.3.*

We observe that in (3.2) it is sufficient that $||c||_{L^{\infty}(\gamma_{p})}\geq 1+\lambda$ , therefore the conclusion of case 2. in Theorem 3.1 remains valid under the weaker assumption that $v_{\infty}=\min_{\Pi(\mu,\nu)}J_{\infty}\geq 1+\lambda$ .

For the $\Gamma$ -limsup inequality, we have used the block approximation introduced in [4], which is defined as follows:

Definition 3.4.

Let $\gamma\in\Pi(\mu,\nu)$ . For $\delta>0$ and $k\in{\mathbb{Z}}^{d}$ , we denote by $Q_{k}^{\delta}$ the cube $\delta(k+[0,1)^{d})$ . The block approximation of $\gamma$ at scale $\delta\in(0,1)$ is then defined by

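$$\gamma^{\delta}:=\sum_{\substack{k,l\in\mathbb{Z}^{d}:\\ \mu(Q_{k}^{\delta})>0,\,\nu(Q_{l}^{\delta})>0}}\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})\,\mu_{k}^{\delta}\otimes\nu_{l}^{\delta}\tag{3.3}$$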

where $\mu_{k}^{\delta}$ and $\nu_{l}^{\delta}$ are defined by

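$$\mu_{k}^{\delta}(A)=\frac{\mu(Q_{k}^{\delta}\cap A)}{\mu(Q_{k}^{\delta})},\qquad\nu_{l}^{\delta}(A)=\frac{\nu(Q_{l}^{\delta}\cap A)}{\nu(Q_{l}^{\delta})}$$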

for every Borel subset $A$ of $\mathbb{R}^{d}$ .
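Consider discrete marginals $\mu=\sum_{i}\mu_{i}\delta_{x_{i}}$ and $\nu=\sum_{j}\nu_{j}\delta_{y_{j}}$. The block approximation then keeps the mass $\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})$ of each product of cubes and spreads it proportionally to $\mu\otimes\nu$ inside that product. Here is a minimal NumPy sketch. The function name `block_approximation`, the array layout (atoms stored as rows of arrays of shape `(n, d)`) and the random permutation plan used as input are assumptions made for illustration.

```python
import numpy as np


def block_approximation(gamma, x, y, mu, nu, delta):
    """Block approximation (3.3) of a discrete plan gamma of shape (n, m) between
    mu = sum_i mu[i] delta_{x[i]} and nu = sum_j nu[j] delta_{y[j]}; x: (n, d), y: (m, d)."""
    # index of the cube Q_k = delta * (k + [0,1)^d) containing each atom, relabelled 0..K-1
    _, kx = np.unique(np.floor(x / delta).astype(int), axis=0, return_inverse=True)
    _, ly = np.unique(np.floor(y / delta).astype(int), axis=0, return_inverse=True)
    kx, ly = kx.ravel(), ly.ravel()
    Px = np.eye(kx.max() + 1)[kx]  # Px[i, k] = 1 if x_i lies in Q_k
    Py = np.eye(ly.max() + 1)[ly]  # Py[j, l] = 1 if y_j lies in Q_l
    mass = Px.T @ gamma @ Py  # gamma(Q_k x Q_l)
    # density w.r.t. mu x nu: gamma(Q_k x Q_l) / (mu(Q_k) nu(Q_l)) on Q_k x Q_l
    density = mass / np.outer(Px.T @ mu, Py.T @ nu)
    return density[kx][:, ly] * np.outer(mu, nu)


rng = np.random.default_rng(2)
n, d, delta = 12, 2, 0.25
x, y = rng.random((n, d)), rng.random((n, d))
mu = nu = np.full(n, 1 / n)
gamma = np.eye(n)[rng.permutation(n)] / n  # a permutation plan in Pi(mu, nu)
gamma_delta = block_approximation(gamma, x, y, mu, nu, delta)
print(gamma_delta.sum(axis=1), gamma_delta.sum(axis=0))
```

Mass never leaves a product of cubes. Hence every point of the support of $\gamma^{\delta}$ lies in some $Q_{k}^{\delta}\times Q_{l}^{\delta}$ already charged by $\gamma$, which is the mechanism behind the bound on $W_{\infty}(\gamma^{\delta},\gamma)$.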

For the sake of completeness, we give a short proof of the properties of the block approximation that we have used in the proof of Theorem 3.1 (see [4] and [5] for related results):

Lemma 3.5.

Let $\gamma\in\Pi(\mu,\nu)$ and $\gamma^{\delta}$ be the block approximation of $\gamma$ at scale $\delta\in(0,1)$ , then $\gamma^{\delta}\in\Pi(\mu,\nu)$ and

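$$W_{\infty}(\gamma^{\delta},\gamma)\leq\sqrt{2d}\,\delta,\qquad H(\gamma^{\delta}|\mu\otimes\nu)\leq d\log\Big(\frac{L}{\delta}\Big),\tag{3.4}$$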

where $L$ is a constant depending only on $\operatorname{spt}(\mu)$ (actually on its diameter).

Proof.

The fact that $\gamma^{\delta}\in\Pi(\mu,\nu)$ is easy to check by construction (see [4]). Now observe that by (3.3) the density of $\gamma^{\delta}$ with respect to $\mu\otimes\nu$ is

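$$\frac{\text{d}\gamma^{\delta}}{\text{d}\mu\otimes\nu}(x,y)=\begin{cases}\dfrac{\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})}{\mu(Q_{k}^{\delta})\nu(Q_{l}^{\delta})}&\mbox{ if }(x,y)\in Q_{k}^{\delta}\times Q_{l}^{\delta}\mbox{ and }\mu(Q_{k}^{\delta}),\nu(Q_{l}^{\delta})>0,\\ 0&\mbox{ otherwise.}\end{cases}$$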

Therefore

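$$\begin{aligned}H(\gamma^{\delta}|\mu\otimes\nu)&=\sum_{\substack{k,l\in\mathbb{Z}^{d}:\\ \mu(Q_{k}^{\delta})>0,\,\nu(Q_{l}^{\delta})>0}}\int_{Q_{k}^{\delta}\times Q_{l}^{\delta}}\log\Big(\frac{\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})}{\mu(Q_{k}^{\delta})\nu(Q_{l}^{\delta})}\Big)\,\text{d}\gamma^{\delta}\\&\leq\sum_{\substack{k,l\in\mathbb{Z}^{d}:\\ \mu(Q_{k}^{\delta})>0,\,\nu(Q_{l}^{\delta})>0}}\int_{Q_{k}^{\delta}\times Q_{l}^{\delta}}\log\Big(\frac{1}{\mu(Q_{k}^{\delta})}\Big)\,\text{d}\gamma^{\delta}=\sum_{k\in\mathbb{Z}^{d}:\,\mu(Q_{k}^{\delta})>0}\mu(Q_{k}^{\delta})\log\Big(\frac{1}{\mu(Q_{k}^{\delta})}\Big),\end{aligned}$$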

where the inequality is due to the fact that $\frac{\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})}{\nu(Q_{l}^{\delta})}\leq 1$, while the last equality is obtained by summing over $l$. If $L\geq 1$ is such that $\operatorname{spt}\mu$ is contained in a cube of side $L-1$, the number of cubes $Q_{k}^{\delta}$ with positive $\mu$-measure is not greater than $N_{\delta}:=\left(\frac{L}{\delta}\right)^{d}$. Therefore, applying Jensen's inequality to the concave function $f(z)=z\log(\frac{1}{z})$, we have

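$$\begin{aligned}H(\gamma^{\delta}|\mu\otimes\nu)&\leq\sum_{k=1}^{N_{\delta}}\mu(Q_{k}^{\delta})\log\Big(\frac{1}{\mu(Q_{k}^{\delta})}\Big)\leq N_{\delta}\Big(\sum_{k=1}^{N_{\delta}}\frac{\mu(Q_{k}^{\delta})}{N_{\delta}}\Big)\log\Big(\frac{1}{\sum_{k=1}^{N_{\delta}}\frac{1}{N_{\delta}}\mu(Q_{k}^{\delta})}\Big)\\&=\log(N_{\delta})=d\log(L)-d\log(\delta),\end{aligned}$$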

which proves the second inequality in (3.4).

By construction $\gamma(Q_{k}^{\delta}\times Q_{l}^{\delta})=\gamma^{\delta}(Q_{k}^{\delta}\times Q_{l}^{\delta})$ , for any $k,l$ . Let $J$ be the set of pairs of indices $(k,l)$ such that $\gamma^{\delta}(Q_{k}^{\delta}\times Q_{l}^{\delta})>0$ and set $\overline{Q}_{j}=Q_{k}^{\delta}\times Q_{l}^{\delta}$ , for any $j=(k,l)\in J$ . We define

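$$\eta^{\delta}:=\sum_{j:\,\gamma(\overline{Q}_{j})>0}\gamma(\overline{Q}_{j})\,\gamma_{j}\otimes\gamma_{j}^{\delta},$$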

where $\gamma_{j}(A):=\frac{\gamma(A\cap\overline{Q}_{j})}{\gamma(\overline{Q}_{j})}$ and $\gamma^{\delta}_{j}(A):=\frac{\gamma^{\delta}(A\cap\overline{Q}_{j})}{\gamma^{\delta}(\overline{Q}_{j})}$ . By construction $\eta^{\delta}\in\Pi(\gamma,\gamma^{\delta})$ , thus

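$$W_{\infty}(\gamma,\gamma^{\delta})\leq\big\||x-y|\big\|_{L^{\infty}(\eta^{\delta})}\leq\operatorname{diam}(\overline{Q}_{j})=\sqrt{2d}\,\delta.$$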

∎
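The two inequalities behind the entropy estimate in the proof of Lemma 3.5 can be checked numerically. First, $H(\gamma^{\delta}|\mu\otimes\nu)$ is at most the Shannon entropy $\sum_{k}\mu(Q_{k}^{\delta})\log\frac{1}{\mu(Q_{k}^{\delta})}$ of the cell masses. Second, that entropy is at most $d\log(L/\delta)$. The sketch assumes NumPy, $d=1$, atoms in $[0,1)$ (so that $L=2$ is admissible) and a random permutation plan; the helper names are hypothetical.

```python
import numpy as np


def rel_entropy(gamma, mu, nu):
    # H(gamma | mu x nu) with the convention 0 log 0 = 0
    ref = np.outer(mu, nu)
    mask = gamma > 0
    return float(np.sum(gamma[mask] * np.log(gamma[mask] / ref[mask])))


def block_approximation_1d(gamma, x, y, mu, nu, delta):
    # one-dimensional version of (3.3); also returns the masses mu(Q_k) of occupied cells
    _, kx = np.unique(np.floor(x / delta).astype(int), return_inverse=True)
    _, ly = np.unique(np.floor(y / delta).astype(int), return_inverse=True)
    kx, ly = kx.ravel(), ly.ravel()
    Px, Py = np.eye(kx.max() + 1)[kx], np.eye(ly.max() + 1)[ly]
    mu_cells, nu_cells = Px.T @ mu, Py.T @ nu
    density = (Px.T @ gamma @ Py) / np.outer(mu_cells, nu_cells)
    return density[kx][:, ly] * np.outer(mu, nu), mu_cells


rng = np.random.default_rng(3)
n, L = 40, 2.0  # atoms in [0, 1), i.e. in a cube of side L - 1
x, y = rng.random(n), rng.random(n)
mu = nu = np.full(n, 1 / n)
gamma = np.eye(n)[rng.permutation(n)] / n

rows = []
for delta in (0.5, 0.2, 0.05):
    gamma_delta, mu_cells = block_approximation_1d(gamma, x, y, mu, nu, delta)
    H = rel_entropy(gamma_delta, mu, nu)
    shannon = float(-np.sum(mu_cells * np.log(mu_cells)))  # sum_k mu(Q_k) log(1 / mu(Q_k))
    rows.append((delta, H, shannon, np.log(L / delta)))  # d = 1: d log(L/delta) = log(L/delta)
    print(rows[-1])
```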

4 Selection of plans with $\infty$ -cyclically monotone support

As shown in [6] and [10], restrictable minimizers of $J_{\infty}$ are supported on $\infty$-cyclically monotone sets, which are defined as follows:

Definition 4.1.

A set $\Gamma\subset\mathbb{R}^{d}\times\mathbb{R}^{d}$ is said to be $\infty$ -cyclically monotone if we have that

$$\max_{i=1,\dots,k}\{c(x_{i},y_{i})\}\leq\max_{i=1,\dots,k}\{c(x_{i},y_{i+1})\},$$

for all $k\in{\mathbb{N}}^{*}$ and $\left\{(x_{i},y_{i})\right\}_{i=1}^{k}\subset\Gamma$ , where $y_{k+1}=y_{1}$ . A transport plan $\gamma$ is said to be $\infty$ -cyclically monotone if $\operatorname{spt}\gamma$ is an $\infty$ -cyclically monotone set.

Since every permutation can be obtained as composition of cycles on disjoint sets and trivial cycles on fixed points, one can see that $\infty$ -cyclical monotonicity of a set $\Gamma\subset\mathbb{R}^{d}\times\mathbb{R}^{d}$ is equivalent to the fact that for every $k\in{\mathbb{N}}^{*}$ , every $\left\{(x_{i},y_{i})\right\}_{i=1}^{k}\subset\Gamma$ and every $\sigma\in\Sigma(k)$ (where $\Sigma(k)$ is the permutation group of $\{1,\ldots,k\}$ ), one has

$$\max_{i=1,\dots,k}\{c(x_{i},y_{i})\}\leq\max_{i=1,\dots,k}\{c(x_{i},y_{\sigma(i)})\}.$$

In the literature, the previous notion is usually called $\infty$-$c$-cyclical monotonicity; to keep notations simple, we have omitted the dependence on the cost $c$. Let us remark that $\infty$-cyclical monotonicity is invariant when $c$ is replaced by a strictly increasing transformation of $c$ (like $c^{p}$ with $p>0$), contrary to the usual notion of $c$-cyclical monotonicity. We recall that a nonempty subset $\Gamma$ of $\mathbb{R}^{d}\times\mathbb{R}^{d}$ is called $c$-cyclically monotone when for every $k\in{\mathbb{N}}^{*}$, every $(x_{i},y_{i})_{i=1}^{k}\subset\Gamma$ and every permutation $\sigma\in\Sigma(k)$, one has

$$\sum_{i=1}^{k}c(x_{i},y_{i})\leq\sum_{i=1}^{k}c(x_{i},y_{\sigma(i)}).$$
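For a finite set $\Gamma=\{(x_{i},y_{i})\}_{i=1}^{n}$, both notions can be tested by brute force over subfamilies and permutations, which also illustrates the invariance remark above. In this Python sketch (function names and cost values are hypothetical), the set is encoded by the matrix `C[i][j]` $=c(x_{i},y_{j})$. Subfamilies of distinct points suffice, since a cycle through a repeated point splits into two shorter cycles. In the $2\times 2$ example, the max condition holds both for $c$ and for $c^{2}$. The sum condition fails for $c$ but holds for $c^{2}$.

```python
from itertools import combinations, permutations

import numpy as np


def is_inf_cyclically_monotone(C):
    # max_i c(x_i, y_i) <= max_i c(x_i, y_sigma(i)) for all subfamilies and permutations
    n = len(C)
    for k in range(2, n + 1):
        for idx in combinations(range(n), k):
            diag = max(C[i][i] for i in idx)
            for sigma in permutations(idx):
                if diag > max(C[i][s] for i, s in zip(idx, sigma)):
                    return False
    return True


def is_c_cyclically_monotone(C, tol=1e-12):
    # sum_i c(x_i, y_i) <= sum_i c(x_i, y_sigma(i)) for all subfamilies and permutations
    n = len(C)
    for k in range(2, n + 1):
        for idx in combinations(range(n), k):
            diag = sum(C[i][i] for i in idx)
            for sigma in permutations(idx):
                if diag > sum(C[i][s] for i, s in zip(idx, sigma)) + tol:
                    return False
    return True


C = np.array([[1.0, 2.5],   # c(x1, y1), c(x1, y2)
              [0.1, 2.0]])  # c(x2, y1), c(x2, y2)
for p in (1, 2):
    print(p, is_inf_cyclically_monotone(C**p), is_c_cyclically_monotone(C**p))
# prints:
# 1 True False
# 2 True True
```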

Our goal in this section is to investigate the convergence of the entropic approximation to $\infty$-cyclically monotone plans. We shall make use of the analysis of the landmark recent article [8]. Let us first recall the notion of $(c,\varepsilon)$-cyclical invariance introduced in [8]:

Definition 4.2.

Let $c:\mathbb{R}^{d}\times\mathbb{R}^{d}\to(0,\infty)$ be a measurable function. A coupling $\gamma\in\Pi(\mu,\nu)$ is called $(c,\varepsilon)$ -cyclically invariant if $\gamma\ll\mu\otimes\nu$ and its density admits a representative $\frac{\text{d}\gamma}{\text{d}\mu\otimes\nu}:\mathbb{R}^{d}\times\mathbb{R}^{d}\to(0,\infty)$ such that

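$$\prod_{i=1}^{k}\frac{\text{d}\gamma}{\text{d}\mu\otimes\nu}(x_{i},y_{i})=\exp\Big(-\frac{1}{\varepsilon}\Big[\sum_{i=1}^{k}c(x_{i},y_{i})-\sum_{i=1}^{k}c(x_{i},y_{i+1})\Big]\Big)\prod_{i=1}^{k}\frac{\text{d}\gamma}{\text{d}\mu\otimes\nu}(x_{i},y_{i+1})$$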

for all $k\in{\mathbb{N}}^{*}$ and $\left\{(x_{i},y_{i})\right\}_{i=1}^{k}\subset\mathbb{R}^{d}\times\mathbb{R}^{d}$ , where $y_{k+1}=y_{1}$ .
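In the discrete case this invariance is easy to observe. Sinkhorn's algorithm produces a plan whose density with respect to $\mu\otimes\nu$ is $\exp\big((f_{i}+g_{j}-c_{ij})/\varepsilon\big)$ for some potentials $f,g$, and the potentials cancel along any cycle. The NumPy sketch below checks the identity of Definition 4.2, in logarithmic form, on random cycles. The random data, the helper names and the choice $k=3$ are illustrative assumptions.

```python
import numpy as np


def logsumexp(a, axis):
    # numerically stable log(sum(exp(a))) along one axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_potentials(mu, nu, C, eps, n_iter=500):
    # log-domain Sinkhorn; the plan is exp((f_i + g_j - C_ij) / eps) * mu_i * nu_j
    f, g = np.zeros_like(mu), np.zeros_like(nu)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps + np.log(nu)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + np.log(mu)[:, None], axis=0)
    return f, g


rng = np.random.default_rng(4)
n, m, eps = 6, 7, 0.3
C = rng.random((n, m))
mu = rng.random(n) + 0.1
mu /= mu.sum()
nu = rng.random(m) + 0.1
nu /= nu.sum()
f, g = sinkhorn_potentials(mu, nu, C, eps)
log_density = (f[:, None] + g[None, :] - C) / eps  # log of d gamma / d(mu x nu)


def cycle_defect(xs, ys):
    # log(LHS) - log(RHS) of the identity in Definition 4.2, with y_{k+1} = y_1
    ys_next = np.roll(ys, -1)
    lhs = log_density[xs, ys].sum()
    rhs = -(C[xs, ys].sum() - C[xs, ys_next].sum()) / eps + log_density[xs, ys_next].sum()
    return lhs - rhs


defects = [cycle_defect(rng.integers(0, n, 3), rng.integers(0, m, 3)) for _ in range(100)]
print(max(abs(d) for d in defects))
```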

In [8] (Proposition 2.2), it is shown that whenever ($\varepsilon$-EOT) is finite, the (unique) solution $\gamma_{\varepsilon}$ of ($\varepsilon$-EOT) is characterized by being $(c,\varepsilon)$-cyclically invariant. The next lemma, which is part of Lemma 3.1 in [8], provides an estimate for $(c,\varepsilon)$-cyclically invariant couplings which will be useful for our purpose. For the reader's convenience, we also provide its proof here.

Lemma 4.3.

Let $\varepsilon>0$ and let $\gamma_{\varepsilon}\in\Pi(\mu,\nu)$ be $(c,\varepsilon)$-cyclically invariant. For every fixed $k\geq 2$, $k\in{\mathbb{N}}$, and $\delta\geq 0$, let $A_{k,c}(\delta)$ be the set defined by

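$$A_{k,c}(\delta):=\Big\{(x_{i},y_{i})_{i=1}^{k}\in(\mathbb{R}^{d}\times\mathbb{R}^{d})^{k}\,:\,\sum_{i=1}^{k}c(x_{i},y_{i})-\sum_{i=1}^{k}c(x_{i},y_{i+1})\geq\delta\Big\},\tag{4.2}$$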

where $y_{k+1}=y_{1}$ . Let $A\subset A_{k,c}(\delta)$ be Borel. Then $\gamma_{\varepsilon}^{k}:=\prod_{i=1}^{k}(\gamma_{\varepsilon})(\text{d}x_{i},\text{d}y_{i})$ satisfies

$$\gamma_{\varepsilon}^{k}(A)\leq e^{-\frac{\delta}{\varepsilon}}.$$

Proof.

By Definition 4.2 of $(c,\varepsilon)$ -cyclical invariance, for $\gamma_{\varepsilon}^{k}$ a.e. $(x_{i},y_{i})_{i=1}^{k}\in A$ we have that

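$$\prod_{i=1}^{k}\frac{\text{d}\gamma_{\varepsilon}}{\text{d}\mu\otimes\nu}(x_{i},y_{i})=e^{-\frac{1}{\varepsilon}\big(\sum_{i=1}^{k}c(x_{i},y_{i})-\sum_{i=1}^{k}c(x_{i},y_{i+1})\big)}\prod_{i=1}^{k}\frac{\text{d}\gamma_{\varepsilon}}{\text{d}\mu\otimes\nu}(x_{i},y_{i+1})\leq e^{-\frac{\delta}{\varepsilon}}\prod_{i=1}^{k}\frac{\text{d}\gamma_{\varepsilon}}{\text{d}\mu\otimes\nu}(x_{i},y_{i+1}).$$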

If one defines the set $\overline{A}:=\{(x_{i},y_{i+1})_{i=1}^{k}\,:\,(x_{i},y_{i})_{i=1}^{k}\in A\}$, then, integrating over $A$ with respect to $(\mu\otimes\nu)^{k}=\prod_{i=1}^{k}(\mu\otimes\nu)(\text{d}x_{i},\text{d}y_{i})=\prod_{i=1}^{k}(\mu\otimes\nu)(\text{d}x_{i},\text{d}y_{i+1})$, we obtain

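$$\gamma_{\varepsilon}^{k}(A)=\int_{A}\prod_{i=1}^{k}\frac{\text{d}\gamma_{\varepsilon}}{\text{d}\mu\otimes\nu}(x_{i},y_{i})\,\text{d}(\mu\otimes\nu)^{k}\leq e^{-\frac{\delta}{\varepsilon}}\int_{A}\prod_{i=1}^{k}\frac{\text{d}\gamma_{\varepsilon}}{\text{d}\mu\otimes\nu}(x_{i},y_{i+1})\,\text{d}(\mu\otimes\nu)^{k}=e^{-\frac{\delta}{\varepsilon}}\gamma_{\varepsilon}^{k}(\overline{A})\leq e^{-\frac{\delta}{\varepsilon}}.$$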

∎
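For $k=2$ and discrete marginals, the estimate $\gamma_{\varepsilon}^{2}(A)\leq e^{-\delta/\varepsilon}$ produced by the proof can be checked with $A$ equal to the whole set $A_{2,c}(\delta)$. That set consists of all pairs $((x_{i},y_{j}),(x_{a},y_{b}))$ with $c_{ij}+c_{ab}-c_{ib}-c_{aj}\geq\delta$. The sketch assumes NumPy, the quadratic cost, uniform weights and $\varepsilon=0.1$ (illustrative choices). It uses a log-domain Sinkhorn plan as the $(c,\varepsilon)$-cyclically invariant coupling.

```python
import numpy as np


def logsumexp(a, axis):
    # numerically stable log(sum(exp(a))) along one axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_log(mu, nu, C, eps, n_iter=2000):
    # log-domain Sinkhorn for min sum(C * gamma) + eps * H(gamma | mu x nu)
    f, g = np.zeros_like(mu), np.zeros_like(nu)
    for _ in range(n_iter):
        f = -eps * logsumexp((g[None, :] - C) / eps + np.log(nu)[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + np.log(mu)[:, None], axis=0)
    return np.exp((f[:, None] + g[None, :] - C) / eps) * np.outer(mu, nu)


rng = np.random.default_rng(5)
n, eps = 8, 0.1
x, y = rng.random(n), rng.random(n)
mu = nu = np.full(n, 1 / n)
C = (x[:, None] - y[None, :]) ** 2
gamma = sinkhorn_log(mu, nu, C, eps)

# defect[i, j, a, b] = c(x_i, y_j) + c(x_a, y_b) - c(x_i, y_b) - c(x_a, y_j)  (2-cycles)
defect = C[:, :, None, None] + C[None, None, :, :] - C[:, None, None, :] - C.T[None, :, :, None]
gamma2 = gamma[:, :, None, None] * gamma[None, None, :, :]  # product measure gamma x gamma

checks = []
for delta in (0.05, 0.1, 0.2, 0.5):
    mass = gamma2[defect >= delta].sum()  # gamma^2 of the set A_{2,c}(delta)
    checks.append((delta, mass, np.exp(-delta / eps)))
    print(checks[-1])
```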

The fact that the entropic approximation procedure selects $\infty$ -cyclically monotone plans is then ensured by the following:

Theorem 4.4.

Under the general assumptions of Section 2, further assume that $c>0$ everywhere, and let $\gamma_{p,\varepsilon_{p}}$ be the minimizer of $J_{p,\varepsilon_{p}}$ . Then, any weak star cluster point $\gamma_{\infty}$ as $p\to\infty$ of the family $\{\gamma_{p,\varepsilon_{p}}\}_{p\geq 1}$ is $\infty$ -cyclically monotone, provided

1. $\varepsilon_{p}^{\frac{1}{p}}\to 0$ as $p\to\infty$;

2. $\varepsilon_{p}=o(p(1+\lambda)^{p})$ if, in addition, $c\geq 1+\lambda$ with $\lambda\geq 0$.

Proof.

Up to extracting a subsequence, let us assume that $\gamma_{p,\varepsilon_{p}}$ weakly star converges to $\gamma_{\infty}$ . We proceed by contradiction assuming that there exist $\delta>0$ and a finite sequence of points $\left(x_{i},y_{i}\right)_{i=1}^{k}$ contained in $\operatorname{spt}\gamma_{\infty}$ , such that

[TABLE]

By the continuity of the cost function $c$ and by the uniform convergence of $\left(\sum_{i=1}^{k}c(x^{\prime}_{i},y^{\prime}_{i})^{p}\right)^{\frac{1}{p}}$ to $\max_{i=1,\dots,k}\{c(x^{\prime}_{i},y^{\prime}_{i})\}$ , as $p\to+\infty$ , we deduce that for every $i=1,\dots,k$ there exists an open neighborhood $U_{i}$ of $(x_{i},y_{i})$ and $p(\delta)>0$ , such that

[TABLE]

for every $(x^{\prime}_{i},y^{\prime}_{i})\in U_{i}$ (again with the convention that $y^{\prime}_{k+1}=y^{\prime}_{1}$ ) and $p\geq p(\delta)$ . We now observe that

[TABLE]

where the last inequality follows from the convexity of $t\mapsto t^{p}$ , with $p>1$ . Since $c>0$ there exists some $b>0$ such that $c\geq b$ on each $U_{i}$ , $i=1,\dots,k$ , hence, for every $(x^{\prime}_{i},y^{\prime}_{i})\in U_{i}$ and $p\geq p(\delta)$

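$$\sum_{i=1}^{k}c(x^{\prime}_{i},y^{\prime}_{i})^{p}-\sum_{i=1}^{k}c(x^{\prime}_{i},y^{\prime}_{i+1})^{p}\geq p\,\delta\,b^{p-1}.$$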

We thus have $U_{1}\times\dots\times U_{k}\subset A_{k,c^{p}}(p\delta b^{p-1})$ , where $A_{k,c^{p}}(p\delta b^{p-1})$ is defined as in (4.2) with $c$ replaced by $c^{p}$ . Applying Lemma 4.3, we thus get:

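$$\gamma_{p,\varepsilon_{p}}^{k}(U_{1}\times\dots\times U_{k})\leq\exp\Big(-\frac{p\,\delta\,b^{p-1}}{\varepsilon_{p}}\Big),$$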

so that if $\varepsilon_{p}^{\frac{1}{p}}\to 0$ as $p\to\infty$ , for large enough $p$ one has $\varepsilon_{p}\leq b^{p}$ , which yields

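$$\gamma_{p,\varepsilon_{p}}^{k}(U_{1}\times\dots\times U_{k})\leq e^{-\frac{p\delta}{b}}\longrightarrow 0\quad\text{as }p\to\infty.$$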

On the other hand, since the points $(x_{i},y_{i})$ belong to $\operatorname{spt}\gamma_{\infty}$ , we have that $\gamma_{\infty}^{k}(U_{1}\times\dots\times U_{k})>0$ , which yields the desired contradiction. This shows the first assertion. Now, if $c\geq(1+\lambda)$ with $\lambda\geq 0$ , we can replace $b$ by $(1+\lambda)$ in (4.5) and the same conclusion will be reached as soon as $\varepsilon_{p}=o(p(1+\lambda)^{p})$ , proving the second assertion.

∎
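The convexity step used in this proof is the tangent-line inequality for $t\mapsto t^{p}$. The following worked inequality is a standard fact, written with generic numbers $s\geq t\geq 0$ and $\beta>0$ rather than with the exact quantities of the proof:

$$s^{p}-t^{p}\geq p\,t^{p-1}(s-t)\quad\text{for }s\geq t\geq 0,\ p>1,\qquad\text{hence}\qquad s-t\geq\delta\ \text{ and }\ t\geq\beta\ \Longrightarrow\ s^{p}-t^{p}\geq p\,\delta\,\beta^{p-1}.$$

Taking for $s$ and $t$ the two $\ell^{p}$-type quantities compared in the proof, and $\beta=b$, yields a lower bound of the form $p\,\delta\,b^{p-1}$, as in the set $A_{k,c^{p}}(p\delta b^{p-1})$.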

*Remark 4.5.*

In contrast with what we observed in Remark 3.3 regarding Theorem 3.1, in the proof of the second assertion of Theorem 4.4 it does not seem that the condition $c(x,y)\geq 1$ for every $(x,y)$ can be weakened to $J_{\infty}\geq 1$. Note also that the condition $\varepsilon_{p}=o(p(1+\lambda)^{p})$ is stronger than condition (3.1), which guarantees $\Gamma$-convergence when $c\geq 1+\lambda$.

5 Some estimates on the speed of convergence

Our aim in this Section is to give some error estimates for $v_{p}-v_{\infty}$ where

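$$v_{p}:=\min_{\gamma\in\Pi(\mu,\nu)}J_{p}(\gamma),\qquad v_{\infty}:=\min_{\gamma\in\Pi(\mu,\nu)}J_{\infty}(\gamma),$$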

where $J_{p}:=J_{p,1}$ (i.e. for the sake of simplicity we take $\varepsilon_{p}=1$ as entropic penalization parameter).

5.1 Upper bounds

Proposition 5.1 (Upper bounds on the speed of convergence).

Let $c\in C^{0,\alpha}(\mathbb{R}^{d}\times\mathbb{R}^{d})$ , with $\alpha\in(0,1]$ and let us assume that $v_{\infty}\geq 1+\lambda$ for some $\lambda\geq 0$ . Then we have

[TABLE]

Proof.

Let $\gamma_{\infty}$ be a minimizer of $J_{\infty}$ and $\gamma^{\delta}$ be the block approximation of $\gamma_{\infty}$ at scale $\delta\in(0,1)$ , as defined in (3.3). We observe that, by construction and by the Hölder condition on $c$ , denoting by $A$ the $C^{0,\alpha}$ semi-norm of $c$ , we first have

[TABLE]

Then

[TABLE]

where the last inequality follows from Lemma 3.5. For $\lambda>0$ , choosing $\delta:=e^{-p}$ , (5.2) becomes (setting $C=d\log(L)$ )

[TABLE]

then, we observe that for large $p$ , one has

[TABLE]

Therefore, for $p$ large enough,

[TABLE]

for some $B>0$ and $\beta=\min\{\alpha,\log(1+\lambda)\}$ .

Now if $\lambda=0$ , we choose $\delta=p^{-1/\alpha}$ in (5.2) which gives

[TABLE]

which ends the proof. ∎

5.2 Upper and lower bounds in the discrete case

Let us now consider the discrete case where there exist $x_{1},\dots,x_{N}$ and $y_{1},\dots,y_{M}$ points in $\mathbb{R}^{d}$ such that

$$\mu=\sum_{i=1}^{N}\mu_{i}\delta_{x_{i}},\qquad\nu=\sum_{j=1}^{M}\nu_{j}\delta_{y_{j}},\tag{5.3}$$

with (strictly, without loss of generality) positive weights $\mu_{i}$ and $\nu_{j}$ summing to $1$ . To shorten notations let us set $c_{ij}=c(x_{i},y_{j})\geq 0$ . In this setting, transport plans $\gamma$ will simply be denoted as $N\times M$ matrices with entries $\gamma^{ij}$ . We also recall that in the discrete setting $\Pi(\mu,\nu)$ is a convex polytope and the constraint $\gamma\in\Pi(\mu,\nu)$ is equivalent to

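$$\gamma^{ij}\geq 0,\qquad\sum_{j=1}^{M}\gamma^{ij}=\mu_{i},\qquad\sum_{i=1}^{N}\gamma^{ij}=\nu_{j},\qquad i=1,\dots,N,\ j=1,\dots,M.$$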

In the discrete setting transport plans have a finite entropy with respect to $\mu\otimes\nu$ , with the (crude) bound

[TABLE]

for every $\gamma\in\Pi(\mu,\nu)$ . So if $v_{\infty}\geq 1+\lambda$ with $\lambda\geq 0$ , taking $\gamma_{\infty}$ a minimizer of $J_{\infty}$ , we obtain

[TABLE]

which gives (in a straightforward way, i.e. without using the block approximation) an exponentially decaying upper bound for $v_{p}-v_{\infty}$ when $\lambda>0$, and an algebraic upper bound $v_{p}-v_{\infty}\leq O(1/p)$ when $\lambda=0$. The condition $v_{\infty}\geq 1$ therefore ensures that $p(v_{p}-v_{\infty})$ is bounded from above. It turns out that, in the discrete setting, this condition also guarantees an algebraically decaying lower bound for the error. To see this, we first need the following:

Lemma 5.2.

Let $\mu$ and $\nu$ be discrete measures i.e. of the form (5.3) and define

[TABLE]

and for every $\gamma\in F_{\infty}$ ,

[TABLE]

then there is some $\theta>0$ such that $m(\gamma)\geq\theta$ , for every $\gamma\in F_{\infty}$ .

Proof.

Since $v_{\infty}$ is the minimum of $J_{\infty}$ over $\Pi(\mu,\nu)$ , one can write $F_{\infty}$ as the set of transport plans for which

[TABLE]

or equivalently

[TABLE]

In other words, $F_{\infty}$ is the face of $\Pi(\mu,\nu)$ on which the linear form $l$ (which is nonnegative on $\Pi(\mu,\nu)$) attains its minimum; it is therefore a convex polytope whose extreme points belong to the (finite) set of extreme points of $\Pi(\mu,\nu)$. Let us then denote by $\{\gamma_{a},\,a\in A\}$, with $A$ a finite index set, the set of extreme points of $F_{\infty}$. Thanks to Minkowski’s theorem, we can write any $\gamma\in F_{\infty}$ as

[TABLE]

for some weights $\alpha_{a}\geq 0$ summing to $1$. In particular, we may pick $a_{0}\in A$ with $\alpha_{a_{0}}\geq\frac{1}{|A|}$ (where $|A|$ denotes the cardinality of $A$). Then we have

[TABLE]

where the strict positivity of $\theta$ then follows from the fact that $A$ is finite and $m(\gamma_{a})>0$ for every $a\in A$ . ∎

We are now ready to prove the announced lower bound.

Proposition 5.3 (Lower bound on the speed of convergence, discrete case).

Assume that $\mu$ and $\nu$ are discrete measures, i.e. of the form (5.3), and that $v_{\infty}\geq 1$. Then $p(v_{p}-v_{\infty})$ is bounded from below. Hence

[TABLE]

Proof.

Let us argue by contradiction and assume that $p(v_{p}-v_{\infty})$ is unbounded from below. Then there is a sequence $p_{n}\to\infty$ as $n\to\infty$ such that

[TABLE]

Letting $\gamma_{n}$ be the minimizer of $J_{p_{n}}$ and passing to a subsequence if necessary, we may assume that $\gamma_{n}$ converges to some $\gamma_{\infty}$, which belongs to $F_{\infty}$ (as defined in Lemma 5.2) since $v_{\infty}\geq 1$. In particular, there exist $i_{0},j_{0}$ such that

[TABLE]

where $\theta$ is the lower bound from Lemma 5.2. Since $\gamma_{n}^{i_{0}j_{0}}$ converges to $\gamma_{\infty}^{i_{0}j_{0}}$ we have, for large enough $n$ , $\gamma_{n}^{i_{0}j_{0}}\geq\frac{\theta}{2}$ , hence, using the fact that $c_{i_{0}j_{0}}=v_{\infty}$ and again the nonnegativity of the entropy

[TABLE]

which is the desired contradiction to (5.4).

∎
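The upper bound obtained at the beginning of this subsection relies on the fact that, for discrete marginals, every transport plan has finite entropy with respect to $\mu\otimes\nu$. The exact (crude) bound used in the paper did not survive extraction; as a minimal sketch, assuming the standard relative entropy $H(\gamma\mid\mu\otimes\nu)=\sum_{ij}\gamma^{ij}\log\frac{\gamma^{ij}}{\mu_i\nu_j}$, one elementary bound of this type follows from $\gamma^{ij}\le\nu_j$: it gives $\gamma^{ij}/(\mu_i\nu_j)\le 1/\mu_i$, hence $H(\gamma\mid\mu\otimes\nu)\le-\sum_i\mu_i\log\mu_i$, and symmetrically with $\nu$, so that $H\le\min\{\mathrm{Ent}(\mu),\mathrm{Ent}(\nu)\}\le\log\min(N,M)$, where $\mathrm{Ent}$ denotes the Shannon entropy. The snippet checks this on an extreme point of $\Pi(\mu,\nu)$ built by the north-west corner rule; the helper names and the random weights are hypothetical.

```python
import numpy as np

def relative_entropy(gamma, mu, nu):
    """H(gamma | mu x nu) = sum_ij gamma_ij log(gamma_ij / (mu_i nu_j)), with 0 log 0 = 0."""
    ref = np.outer(mu, nu)
    pos = gamma > 0
    return float(np.sum(gamma[pos] * np.log(gamma[pos] / ref[pos])))

def shannon(w):
    """Shannon entropy -sum_i w_i log w_i of a probability vector."""
    w = w[w > 0]
    return float(-np.sum(w * np.log(w)))

def north_west_corner(mu, nu, tol=1e-12):
    """North-west corner rule: returns an extreme point of Pi(mu, nu)."""
    a, b = mu.astype(float), nu.astype(float)  # astype copies, so inputs are untouched
    gamma = np.zeros((len(a), len(b)))
    i = j = 0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])                    # ship as much mass as possible from x_i to y_j
        gamma[i, j] = t
        a[i] -= t
        b[j] -= t
        if a[i] <= tol:
            i += 1                             # row i exhausted
        else:
            j += 1                             # column j exhausted
    return gamma

rng = np.random.default_rng(0)
mu = rng.random(5)
mu /= mu.sum()                                 # hypothetical weights, N = 5
nu = rng.random(7)
nu /= nu.sum()                                 # hypothetical weights, M = 7
gamma = north_west_corner(mu, nu)
H = relative_entropy(gamma, mu, nu)
bound = min(shannon(mu), shannon(nu))          # always <= log(min(N, M))
print(H, bound, np.log(min(len(mu), len(nu))))
```

For the uniform case $N=M$ with $\gamma$ a permutation plan, the bound is attained ($H=\log N$), while the product plan $\mu\otimes\nu$ has $H=0$.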

5.3 A large deviations upper bound

In this (somewhat independent) paragraph, our goal is to discuss a (partial) extension of the large deviations results of [8] to the $L^{\infty}$-optimal transport framework. For the Monge-Kantorovich problem (OT), it is well known (see [9], [15]) that the optimality for (OT) of a plan $\gamma\in\Pi(\mu,\nu)$ is characterized by the $c$-cyclical monotonicity of its support $\Gamma:=\operatorname{spt}(\gamma)$, where $c$-cyclical monotonicity is defined by (4.1). To analyze fine convergence properties of the entropic approximation of (OT) defined by ($\varepsilon$-EOT), assume that the minimizer $\gamma_{\varepsilon}$ of ($\varepsilon$-EOT) converges (up to a subsequence) to some $\gamma$ as $\varepsilon\to 0^{+}$, and denote by $\Gamma$ the $c$-cyclically monotone set $\operatorname{spt}(\gamma)$; the authors of [8] introduced

[TABLE]

with $(x_{1},y_{1})=(x,y)$. They proved that $I$ is a good rate function for the family $\{\gamma_{\varepsilon}\}_{\varepsilon>0}$ of optimal entropic plans, in the sense that, under very general conditions, it satisfies the large deviations principle

[TABLE]

for every compact $C$ and every open $U$ included in $\operatorname{spt}(\mu)\times\operatorname{spt}(\nu)$. Denoting by $\gamma_{p,\varepsilon}$ the minimizer of $J_{p,\varepsilon}$, the results of [8] (with $c^{p}$ in place of $c$) of course apply to the convergence of $\gamma_{p,\varepsilon}$ as $\varepsilon\to 0^{+}$ for a fixed exponent $p$. For $L^{\infty}$-optimal transport, it is more natural to consider the situation where $\varepsilon>0$ is fixed and $p$ tends to $\infty$. More precisely, we know from Theorem 4.4 that, if $c\geq 1$ and $\varepsilon>0$ is fixed, the family $\{\gamma_{p,\varepsilon}\}_{p\geq 1}$ weakly star converges (again possibly after extraction) to some $\gamma_{\infty}$ as $p\to\infty$, and that $\Gamma_{\infty}:=\operatorname{spt}(\gamma_{\infty})$ is $\infty$-cyclically monotone. In addition to the general assumptions of Section 2, we shall further assume throughout this paragraph that

•

$c\geq 1$ ,

•

$\varepsilon>0$ being fixed, the sequence of minimizers $\{\gamma_{p,\varepsilon}\}_{p\geq 1}$ weakly star converges as $p\to\infty$ to some $\gamma_{\infty}$ , with ( $\infty$ -cyclically monotone) support $\Gamma_{\infty}$ .

Let us define for every $(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}$

[TABLE]

where $(x_{1},y_{1})=(x,y)$ . Also define

[TABLE]

where $(x_{1},y_{1})=(x,y)$ and $y_{k+1}=y_{1}$. In our supremal optimal transport setting, we cannot really expect $I_{\infty}$ to be a good rate function for $\{\gamma_{p,\varepsilon}\}_{p\geq 1}$; indeed, $\operatorname{argmin}_{\Pi(\mu,\nu)}J_{\infty}$ is unchanged when $c$ is replaced by a strictly increasing function of $c$, whereas $I_{\infty}$ is not. However, it is worth understanding $I_{\infty}$ better, since it still provides an upper bound for the family $\{\gamma_{p,\varepsilon}\}$ (see Proposition 5.6).

Lemma 5.4.

Let $I_{\infty}$ and $\widetilde{I}_{\infty}$ be defined as above, then

•

$I_{\infty}$ and $\widetilde{I}_{\infty}$ are related by $I_{\infty}=\max(0,\widetilde{I}_{\infty})$,

•

$I_{\infty}$ and $\widetilde{I}_{\infty}$ are lower semicontinuous, $I_{\infty}\geq 0$ and $I_{\infty}=0$ on $\Gamma_{\infty}$,

•

$I_{\infty}$ and $\widetilde{I}_{\infty}$ coincide on $(\operatorname{spt}(\mu)\times\mathbb{R}^{d})\cup(\mathbb{R}^{d}\times\operatorname{spt}(\nu))$.

Proof.

The inequality $I_{\infty}\geq\max(0,\widetilde{I}_{\infty})$ is obvious, as is the fact that $\widetilde{I}_{\infty}=0$ on $\Gamma_{\infty}$.

We now prove the converse inequality. Fix $(x,y)=(x_{1},y_{1})\in\mathbb{R}^{d}\times\mathbb{R}^{d}$, $k\geq 2$, $(x_{2},y_{2}),\ldots,(x_{k},y_{k})$ in $\Gamma_{\infty}$ and $\sigma\in\Sigma(k)$. We can then partition $\{1,\ldots,k\}$ into $I_{0}$, the (possibly empty) set of fixed points of $\sigma$, and disjoint orbits $I_{1},\ldots,I_{l}$ (there are none if $\sigma$ is the identity), on each of which $\sigma$ acts as a cycle. This means that, for $j=1,\ldots,l$, we may denote $(x_{i},y_{i})_{i\in I_{j}}$ as $(\widetilde{x}^{j}_{r},\widetilde{y}^{j}_{r})_{r=1,\ldots,|I_{j}|}$ and $(x_{i},y_{\sigma(i)})_{i\in I_{j}}$ as $(\widetilde{x}^{j}_{r},\widetilde{y}^{j}_{r+1})_{r=1,\ldots,|I_{j}|}$, with the convention $\widetilde{y}^{j}_{|I_{j}|+1}=\widetilde{y}^{j}_{1}$. We now observe that

[TABLE]

where the max with respect to $j$ is taken over the indices for which $I_{j}$ is nonempty. To shorten notation, for such a $j$, let us set

[TABLE]

Of course, if $I_{0}$ is nonempty, $\beta_{0}=0$; now, if $j\geq 1$ and $I_{j}$ is nonempty,

[TABLE]

So, if $(\widetilde{x}_{1}^{j},\widetilde{y}_{1}^{j})=(x_{1},y_{1})$, then $\beta_{j}\leq\widetilde{I}_{\infty}(x,y)$, and if $(\widetilde{x}_{1}^{j},\widetilde{y}_{1}^{j})\neq(x_{1},y_{1})$, then $(\widetilde{x}_{1}^{j},\widetilde{y}_{1}^{j})\in\Gamma_{\infty}$, hence $\widetilde{I}_{\infty}(\widetilde{x}_{1}^{j},\widetilde{y}_{1}^{j})=0$ by the definition of $\widetilde{I}_{\infty}$ and the fact that $\Gamma_{\infty}$ is $\infty$-cyclically monotone. In other words, we can bound each $\beta_{j}$ from above by $\max(0,\widetilde{I}_{\infty}(x,y))$. Taking suprema with respect to $k$, $(x_{2},y_{2}),\ldots,(x_{k},y_{k})$ in $\Gamma_{\infty}$ and $\sigma\in\Sigma(k)$, we thus get $I_{\infty}\leq\max(0,\widetilde{I}_{\infty})$. Moreover, since $\widetilde{I}_{\infty}\leq 0$ on $\Gamma_{\infty}$, we have $I_{\infty}=\max(0,\widetilde{I}_{\infty})=0$ on $\Gamma_{\infty}$.

Lower semicontinuity of $I_{\infty}$ and $\widetilde{I}_{\infty}$ follows from the continuity of $c$, since both are suprema of functions of $(x,y)$ that are continuous. Finally, assume that $x\in\operatorname{spt}(\mu)$ and $y\in\mathbb{R}^{d}$; since $\Gamma_{\infty}=\operatorname{spt}(\gamma_{\infty})$ is compact and $\gamma_{\infty}\in\Pi(\mu,\nu)$, there exists $y^{\prime}\in\mathbb{R}^{d}$ such that $(x,y^{\prime})\in\Gamma_{\infty}$. Taking $(x_{1},y_{1})=(x,y)$, $(x_{2},y_{2})=(x,y^{\prime})$ as a competitor in the definition of $\widetilde{I}_{\infty}(x,y)$, we see that $\widetilde{I}_{\infty}(x,y)\geq 0$, hence $I_{\infty}(x,y)=\widetilde{I}_{\infty}(x,y)$. The same argument shows that $I_{\infty}$ and $\widetilde{I}_{\infty}$ coincide on $\mathbb{R}^{d}\times\operatorname{spt}(\nu)$. ∎
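The partition of $\{1,\ldots,k\}$ into the fixed points $I_0$ of $\sigma$ and its nontrivial orbits $I_1,\ldots,I_l$, together with the relabelling used in the proof of Lemma 5.4, is the usual cycle decomposition of a permutation. A small illustrative sketch in Python (0-based indices; `cycle_decomposition` is a hypothetical helper name): listing each orbit as $i,\sigma(i),\sigma^2(i),\ldots$ makes the $r$-th pair $(x_i,y_{\sigma(i)})$ of the orbit equal to $(\widetilde x^j_r,\widetilde y^j_{r+1})$, with the wrap-around convention $\widetilde y^j_{|I_j|+1}=\widetilde y^j_1$.

```python
def cycle_decomposition(sigma):
    """Split {0, ..., k-1} into the fixed points of sigma and its nontrivial orbits.

    Each orbit is listed as i, sigma(i), sigma^2(i), ..., so if orbit[r] = i then
    (x_i, y_i) plays the role of (x~_r, y~_r) and y_{sigma(i)} that of y~_{r+1}.
    """
    k = len(sigma)
    seen = [False] * k
    fixed, cycles = [], []
    for start in range(k):
        if seen[start]:
            continue
        orbit, i = [], start
        while not seen[i]:          # follow start -> sigma(start) -> ... until we loop back
            seen[i] = True
            orbit.append(i)
            i = sigma[i]
        if len(orbit) == 1:
            fixed.append(start)     # element of I_0
        else:
            cycles.append(orbit)    # one of the orbits I_1, ..., I_l
    return fixed, cycles

sigma = [0, 2, 3, 1, 5, 4]          # hypothetical permutation in Sigma(6), 0-based
fixed, cycles = cycle_decomposition(sigma)
print(fixed, cycles)                # [0] [[1, 2, 3], [4, 5]]
```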

Lemma 5.5.

Let us fix $(x,y)\in\mathbb{R}^{d}\times\mathbb{R}^{d}$ . Suppose that for some $\delta\in\mathbb{R}$ , $k\in{\mathbb{N}}$ , $k\geq 2$ and $(x_{i},y_{i})_{i=2}^{k}\subset\operatorname{spt}\gamma_{\infty}$ , we have

[TABLE]

Then there exist $\alpha>0$ , $r>0$ and $p_{0}\geq 1$ such that

[TABLE]

where $\gamma_{p,\varepsilon}$ is the minimizer of $J_{p,\varepsilon}$ .

Proof.

Of course, if $\delta\leq 0$, one can simply take $\alpha=1$, so we may assume that $\delta>0$. Reasoning as in the proof of Theorem 4.4 (recall that we have assumed $c\geq 1$), we know that there exist $p_{0}$ and $r>0$ such that

[TABLE]

for every $p\geq p_{0}$ and $(x^{\prime}_{i},y^{\prime}_{i})_{i=1}^{k}\in B_{r}(x_{1},y_{1})\times\cdots\times B_{r}(x_{k},y_{k})$. Then $B_{r}(x_{1},y_{1})\times\cdots\times B_{r}(x_{k},y_{k})\subset A_{k,c^{p}}(p\delta)$, so, thanks to Lemma 4.3,

[TABLE]

Moreover, since $(x_{i},y_{i})_{i=2}^{k}\subset\operatorname{spt}\gamma_{\infty}$, there is some $\beta>0$ such that $\liminf_{p\to\infty}\gamma_{p,\varepsilon}(B_{r}(x_{i},y_{i}))\geq\gamma_{\infty}(B_{r}(x_{i},y_{i}))>\beta$ for all $2\leq i\leq k$; then

[TABLE]

for all $p\geq p_{0}$ (possibly replacing $p_{0}$ with a larger one). ∎

Proposition 5.6.

Under the assumptions of this paragraph, for any compact set $C\subset\mathbb{R}^{d}\times\mathbb{R}^{d}$ , one has

[TABLE]

Proof.

First note that since $\gamma_{p,\varepsilon}$ is supported on $\operatorname{spt}(\mu)\times\operatorname{spt}(\nu)$ ,

[TABLE]

and there is nothing to prove if $C$ is disjoint from $\operatorname{spt}(\mu)\times\operatorname{spt}(\nu)$. Therefore, we can assume that $C\cap(\operatorname{spt}(\mu)\times\operatorname{spt}(\nu))\neq\emptyset$. It then follows from Lemma 5.4 that

[TABLE]

Now let $\eta>0$ and $(x,y)\in C\cap(\operatorname{spt}(\mu)\times\operatorname{spt}(\nu))$ . By definition of $\widetilde{I}_{\infty}(x,y)$ there exist $k\geq 2$ and $(x_{i},y_{i})_{i=2}^{k}\subset\Gamma_{\infty}$ , such that (setting as usual $(x_{1},y_{1})=(x,y)$ and $y_{k+1}=y$ )

[TABLE]

Note that the truncation is used to handle the case where $\widetilde{I}_{\infty}(x,y)=+\infty$ . By Lemma 5.5 we know that there exist $\alpha,r>0$ such that

[TABLE]

Then

[TABLE]

and, by compactness of $C$ ,

[TABLE]

which, letting $\eta\to 0^{+}$ , yields the desired upper bound. ∎

6 Numerical results

In this section, we present several numerical examples illustrating the discussion and the theoretical analysis of the previous sections. We consider discrete marginals: given $N,M\in{\mathbb{N}}$, with a slight abuse of notation, we denote by $\mu$ and $\nu$ both the measures and the vectors of weights $(\mu_{i})_{i=1}^{N}$ and $(\nu_{j})_{j=1}^{M}$, and by $\gamma$ both the transport plan and the $N\times M$ matrix $(\gamma^{ij})$. For fixed $p,\varepsilon>0$, in this discrete setting, the minimization of $J_{p,\varepsilon}$ reads

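$$\min_{\gamma\in\Pi(\mu,\nu)}\Big(\sum_{i=1}^{N}\sum_{j=1}^{M}c_{ij}^{p}\,\gamma^{ij}+\varepsilon H(\gamma\mid\mu\otimes\nu)\Big)^{\frac{1}{p}}.$$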

Raising the above cost to the power $p$, which does not change the minimizer, leads to a standard entropic transport problem. In all our examples, we used Sinkhorn’s algorithm (see for instance Chapter 4 in [14]) to compute a good approximation of its solution (with error smaller than $10^{-5}$).
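The Sinkhorn step described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the code used for the figures: `sinkhorn` is a hypothetical helper, the point clouds are random, and $H$ is taken to be the standard relative entropy. Since $H(\cdot\mid\mu\otimes\nu)$ differs from the entropy relative to the counting measure only by a constant on $\Pi(\mu,\nu)$, the minimizer of $\sum_{ij}c_{ij}^p\gamma^{ij}+\varepsilon H(\gamma\mid\mu\otimes\nu)$ has the form $\gamma^{ij}=u_iK_{ij}v_j$ with Gibbs kernel $K_{ij}=e^{-c_{ij}^p/\varepsilon}$, and the iterations alternately rescale $u$ and $v$ to match the two marginals. The stopping rule ($\ell^1$ error on the first marginal below $10^{-5}$) mirrors the accuracy quoted in the text.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, tol=1e-5, max_iter=10_000):
    """Primal Sinkhorn for min_{gamma in Pi(mu, nu)} <C, gamma> + eps * H(gamma | mu x nu).

    Returns the plan gamma_ij = u_i K_ij v_j, the l1 error on the first
    marginal and the number of iterations performed.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel (may underflow for large C / eps)
    u, v = np.ones_like(mu), np.ones_like(nu)
    for it in range(1, max_iter + 1):
        u = mu / (K @ v)                  # match the first marginal
        v = nu / (K.T @ u)                # match the second marginal (exact after this step)
        gamma = u[:, None] * K * v[None, :]
        err = np.abs(gamma.sum(axis=1) - mu).sum()
        if err < tol:
            break
    return gamma, err, it

# Hypothetical data: 8 uniform points for mu and 6 uniform points for nu in the unit square.
rng = np.random.default_rng(0)
X, Y = rng.random((8, 2)), rng.random((6, 2))
mu, nu = np.full(8, 1 / 8), np.full(6, 1 / 6)
c = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)   # c(x, y) = |x - y|
p, eps = 3, 1.0
gamma, err, n_iter = sinkhorn(mu, nu, c**p, eps)             # cost raised to the power p
print(n_iter, err)
```

After each $v$-update the second marginal is matched exactly, so monitoring the first marginal alone is enough; for large $p$ the kernel underflows, which is the issue discussed in Remark 6.2.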

If $v_{\infty}\geq 1$, in light of Theorem 3.1, we expect the output $\gamma$ of Sinkhorn’s algorithm to be, for suitable $p$ and $\varepsilon$, also a good approximation of an optimal plan for the discretized $L^{\infty}$-optimal transport problem

$$\min_{\gamma\in\Pi(\mu,\nu)}\ \max\{c_{ij}\,:\,\gamma^{ij}>0\}.$$

Furthermore, if $c\geq 1$ , thanks to Theorem 4.4, we expect to find a plan close to an $\infty$ -cyclically monotone one.
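When the output is compared with the discretized $L^\infty$ problem, note that an entropic plan has full support (all $\gamma^{ij}>0$ in exact arithmetic), so evaluating $\max\{c_{ij}:\gamma^{ij}>0\}$ literally would just return $\max_{ij}c_{ij}$. A minimal sketch, assuming a hypothetical absolute threshold `tol` below which entries are treated as zero; the choice of threshold is a heuristic and is not prescribed by the paper.

```python
import numpy as np

def J_inf(gamma, c, tol=1e-8):
    """Discrete L^infty cost max{c_ij : gamma_ij > tol} of a numerically computed plan.

    Raises ValueError if tol is so large that no entry survives.
    """
    support = gamma > tol              # entries treated as genuinely positive
    return float(c[support].max())

gamma = np.array([[0.5, 1e-12], [1e-12, 0.5]])   # hypothetical near-diagonal entropic output
c = np.array([[1.0, 3.0], [3.0, 2.0]])
print(J_inf(gamma, c))                 # 2.0 (thresholded support: diagonal only)
print(J_inf(gamma, c, tol=0.0))        # 3.0 (full support: returns max c_ij)
```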

Remark 6.1.

As the set of transport plans $\Pi(\mu,\nu)$ is a convex polytope, for any $\gamma\in\Pi(\mu,\nu)$ there exists a finite set of indices $S$ such that $\gamma=\sum_{s\in S}a_{s}\gamma_{s}$, with $a_{s}>0$, $\sum a_{s}=1$ and $\gamma_{s}$ an extreme point of $\Pi(\mu,\nu)$. If $N=M$ and $\mu_{i}=\nu_{j}=\frac{1}{N}$, the set $\Pi(\mu,\nu)$ is, up to the factor $\frac{1}{N}$, the set of so-called bistochastic matrices, whose extreme points, by Birkhoff’s theorem, are the permutation matrices. We observe that, by definition of $\gamma\text{-}\operatorname{ess\,sup}$, $J_{\infty}(\gamma)=\max_{s\in S}J_{\infty}(\gamma_{s})$, since the support of $\gamma$ is the union of the supports of the $\gamma_{s}$; thus the minimum of $J_{\infty}$ is attained at ($\frac{1}{N}$ times) some permutation matrix. Therefore, if $N=M$ and $\mu_{i}=\nu_{j}=\frac{1}{N}$,

$$v_{\infty}=\min_{\sigma\in\Sigma(N)}\ \max_{1\leq i\leq N}c(x_{i},y_{\sigma(i)}).$$

In principle, this can be used to compute $v_{\infty}$ exactly. However, it is not particularly useful in practice: for instance, in the example at the bottom of Figure 4, where $\mu$ and $\nu$ are supported on the same number of points, computing $v_{\infty}$ exactly in this way would require $100!$ evaluations, which is infeasible.
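Remark 6.1 translates directly into code: `v_inf_bruteforce` below enumerates all $N!$ permutations, which is exactly why this route breaks down for $N=100$. As an aside that goes beyond the paper, the same min-max is the classical bottleneck assignment problem: $v_\infty$ is the smallest value $t$ among the $c_{ij}$ for which the bipartite graph with edges $\{c_{ij}\le t\}$ admits a perfect matching, which can be found in polynomial time by binary search over the sorted costs combined with a matching test (here Kuhn's augmenting-path algorithm). Both helper names are hypothetical and the cost matrix is random.

```python
import itertools
import numpy as np

def v_inf_bruteforce(c):
    """Remark 6.1 literally: min over permutations sigma of max_i c[i, sigma(i)] (O(N!) work)."""
    n = c.shape[0]
    return min(max(c[i, s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))

def has_perfect_matching(allowed):
    """Kuhn's augmenting-path algorithm on a boolean N x N bipartite adjacency matrix."""
    n = allowed.shape[0]
    match_col = [-1] * n                      # match_col[j] = row currently matched to column j

    def augment(i, seen):
        for j in range(n):
            if allowed[i, j] and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or augment(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    # a row that cannot be augmented when processed can never be matched later
    return all(augment(i, [False] * n) for i in range(n))

def v_inf_bottleneck(c):
    """Smallest threshold t such that the graph {c_ij <= t} admits a perfect matching."""
    values = np.unique(c)                     # sorted candidate thresholds
    lo, hi = 0, len(values) - 1
    while lo < hi:                            # feasibility is monotone in t: binary search
        mid = (lo + hi) // 2
        if has_perfect_matching(c <= values[mid]):
            hi = mid
        else:
            lo = mid + 1
    return values[lo]

rng = np.random.default_rng(1)
c = rng.random((7, 7))                        # hypothetical 7 x 7 cost matrix
print(v_inf_bruteforce(c), v_inf_bottleneck(c))
```

Both functions return an entry of $c$, so their results can be compared exactly.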

All the examples in this section are in dimension $d=2$; $\mu$ is represented by blue points, $\nu$ by red points and the plan by arrows: black arrows indicate that a blue point is sent to a red point with high probability, while gray ones indicate that a blue point is sent to a red point with lower (but still non-negligible) probability.

In the first example, shown in Figure 1, we consider $c^{p}=|x-y|^{p}$ for $p\in\{2,3,4,5\}$, with $\mu$ uniformly distributed on the blue points

[TABLE]

and $\nu$ on the red points

[TABLE]

Note that with this choice of $\operatorname{spt}\mu$ and $\operatorname{spt}\nu$, $c\geq 1$ everywhere; therefore, by Theorem 3.1 and Theorem 4.4, $\Gamma$-convergence and convergence of the outputs towards $\infty$-cm plans hold even with the choice $\varepsilon=1$. We observe that for $p=2$ every transport plan $\gamma$ is optimal. Indeed, by the orthogonality of the two supports, any plan is concentrated on a cyclically monotone set (see (4.1)) and, as recalled in Section 5.3 (see for instance [9, 15]), this is a sufficient optimality condition. Since we minimize the regularized problem, which involves the entropy, Sinkhorn’s algorithm selects the most diffuse plan, as seen in the upper-left picture of Figure 1. The other three pictures in Figure 1 show that convergence towards an $\infty$-cm plan is very fast: it already occurs for $p=5$.
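The optimality of every plan for $p=2$ can be made explicit. Assuming, as the word orthogonality suggests, that the two supports lie on orthogonal lines through the origin (the actual points did not survive extraction, so the coordinates below are hypothetical), one has $x\cdot y=0$ and hence $|x-y|^2=|x|^2+|y|^2$ on $\operatorname{spt}\mu\times\operatorname{spt}\nu$; the quadratic cost of any $\gamma\in\Pi(\mu,\nu)$ is then $\sum_i\mu_i|x_i|^2+\sum_j\nu_j|y_j|^2$, independent of $\gamma$. The check below illustrates this on all uniform permutation plans.

```python
import itertools
import numpy as np

# Hypothetical supports on two orthogonal lines through the origin (not the paper's points).
X = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])   # on the x-axis
Y = np.array([[0.0, 1.0], [0.0, 1.5], [0.0, 2.5], [0.0, 3.0]])   # on the y-axis
c2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)          # |x - y|^2 = |x|^2 + |y|^2 here

def plan_cost(sigma):
    """Quadratic cost of the uniform permutation plan x_i -> y_sigma(i)."""
    return c2[np.arange(4), list(sigma)].mean()

costs = {round(plan_cost(s), 12) for s in itertools.permutations(range(4))}
# a single value, equal to mean |x_i|^2 + mean |y_j|^2
print(costs)
```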

Regarding accuracy, Figure 2 shows that, for $p=5$ and $\varepsilon=1$, the distance $|\gamma\mathbf{1}_{4}-\mu|$ between the first marginal of the output $\gamma$ and $\mu$, and the distance $|\gamma^{\intercal}\mathbf{1}_{8}-\nu|$ between the second marginal of $\gamma$ and $\nu$, are of order $10^{-5}$ after only $350$ iterations.

We have also considered the same example (see Figure 3) with the cost function $c^{p}(x,y):=\left(\max\{|x_{1}-y_{1}|,|x_{2}-y_{2}|\}\right)^{p}$. In this case, convergence is still fast and the error is small after a few iterations (of order $10^{-5}$ after about $180$ iterations).

Remark 6.2.

When $c>1$, on the one hand, we do not need $\varepsilon$ to be small, and we can even let it grow with $p$ (by case 2 of Theorem 3.1, we can choose for instance $\varepsilon_{p}=(1+\lambda)^{p}$). On the other hand, we may encounter difficulties when computing the Gibbs kernel $K_{ij}=e^{-\frac{c^{p}_{ij}}{\varepsilon}}$: if $p$ is large, it can happen that $K_{ij}$ underflows to $0$ in floating-point arithmetic for some $i,j$, which can make it impossible to perform the divisions in the iterations of the primal version of Sinkhorn’s algorithm. Fortunately, this problem can be overcome using the log-domain version (see for instance Section 4.4 in [14]), as we did in the following example, represented in Figure 4.
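A minimal log-domain sketch, in the same hedged spirit (Python/NumPy, hypothetical helper names, not the authors' code): instead of the scalings $u,v$ one iterates dual potentials $f,g$ with $\gamma^{ij}=\mu_i\nu_j\exp\big((f_i+g_j-c^p_{ij})/\varepsilon\big)$, and each kernel product is replaced by a stabilized log-sum-exp, so the kernel $K$ is never formed. The updates are the standard ones for entropy relative to $\mu\otimes\nu$; the $2\times2$ matrix is a hypothetical stand-in for $c^p$ with $c\ge1$ and large $p$, chosen so that every Gibbs weight underflows.

```python
import numpy as np

def logsumexp(M, axis):
    """Stable log(sum(exp(M))) along an axis (scipy.special.logsumexp does the same)."""
    m = np.max(M, axis=axis, keepdims=True)
    return np.log(np.sum(np.exp(M - m), axis=axis)) + np.squeeze(m, axis=axis)

def sinkhorn_log(mu, nu, C, eps, tol=1e-5, max_iter=10_000):
    """Log-domain Sinkhorn for min <C, gamma> + eps * H(gamma | mu x nu).

    Plan: gamma_ij = mu_i nu_j exp((f_i + g_j - C_ij) / eps).
    """
    log_mu, log_nu = np.log(mu), np.log(nu)
    f, g = np.zeros_like(mu), np.zeros_like(nu)
    for _ in range(max_iter):
        # f_i = -eps log sum_j nu_j exp((g_j - C_ij) / eps): rows now sum to mu
        f = -eps * logsumexp(log_nu[None, :] + (g[None, :] - C) / eps, axis=1)
        # g_j = -eps log sum_i mu_i exp((f_i - C_ij) / eps): columns now sum to nu
        g = -eps * logsumexp(log_mu[:, None] + (f[:, None] - C) / eps, axis=0)
        gamma = np.exp(log_mu[:, None] + log_nu[None, :] + (f[:, None] + g[None, :] - C) / eps)
        err = np.abs(gamma.sum(axis=1) - mu).sum()
        if err < tol:
            break
    return gamma, f, g, err

C = np.array([[800.0, 2000.0], [2000.0, 800.0]])   # hypothetical stand-in for c_ij^p
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
K = np.exp(-C / 1.0)        # every entry underflows to 0.0: mu / (K @ v) would be undefined
gamma, f, g, err = sinkhorn_log(mu, nu, C, eps=1.0)
print(K, gamma, err)
```

Only the potentials and the final plan are exponentiated; entries of the plan may still underflow to $0$, but the iterations themselves stay finite.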

Figure 4 compares three different examples, for $p=2$ (left) and $p=15$ (right), with $\varepsilon=1$. The two pictures at the top of Figure 4 show the output when $\mu$ is uniformly distributed on $400$ points discretizing the unit square and $\nu$ is uniformly distributed on the points $(1,2)$ and $(2,1)$. This is a discretization of the case where $\mu$ is uniform on the square $[0,1]^{2}$, for which (see also Example 2.2 in [6]) every $\gamma\in\Pi(\mu,\nu)$ is optimal for the problem

[TABLE]

Indeed

[TABLE]

for every $\gamma\in\Pi(\mu,\nu)$. Since every plan is optimal, when $p$ is small, as shown in the picture on the left, the entropy plays a more important role and the algorithm selects the most diffuse plan. As $p$ increases, the entropy becomes more and more negligible and the output becomes sparser: already for $p=15$ (on the right), the output is a good approximation of the $\infty$-cyclically monotone plan, which in this case is unique (see Theorem 5.6 in [6]). A small variation, represented by the two middle pictures, is to take $\nu$ not uniformly distributed on the points $(1,2)$ and $(2,1)$; here we have taken $\nu=0.1\delta_{(1,2)}+0.9\delta_{(2,1)}$. Finally, at the bottom, we have implemented the case in which $\nu$ is also the discretization of an absolutely continuous measure. Here $\mu$ approximates the square $[-0.25,0.25]\times[-0.25,0.25]$ and $\nu$ the rectangle $[1.25,1.5]\times[-0.5,0.5]$, and both measures are supported on $100$ points. As before, for $p=2$ the entropy plays an important role and the algorithm selects the most diffuse plan, while already for $p=15$ the plan is considerably sparser.

We now turn to the asymptotic behavior of $v_{p}:=\min_{\Pi(\mu,\nu)}J_{p}$ and illustrate numerically the upper and lower bounds on the speed of convergence of $v_{p}$ towards $v_{\infty}:=\min_{\Pi(\mu,\nu)}J_{\infty}$ proved in Proposition 5.1 and Proposition 5.3. Note that, to apply Proposition 5.1 and Proposition 5.3, it is enough to assume a lower bound on $v_{\infty}$ rather than a pointwise lower bound on $c$.

Figure 5 illustrates the asymptotic behavior of $v_{p}$ and the speed of convergence for $\mu$ and $\nu$ as in the two pictures at the top of Figure 4. In light of the previous remark, we have rescaled the cost $c$ so that $v_{\infty}\simeq 1.08166$. For $p\in[10,206]$, the top image of Figure 5 shows $v_{p}$ as a function of $p$ (in blue), while the constant $v_{\infty}$ is represented by the orange line. At the bottom of Figure 5, we have represented $v_{p}-v_{\infty}$ in blue, the upper bound $Be^{-\beta p}$ in green and the lower bound $-\frac{A}{p}$ in orange, where $\beta=\log(v_{\infty})$ by Proposition 5.1 (indeed, in this case $c$ is Lipschitz, so $\alpha=1>\log(v_{\infty})$), and $A,B$ have been estimated by linear regression (least squares).
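The text does not spell out the regression used for $A$ and $B$. One plausible reading, sketched here purely as an assumption: with $\beta=\log(v_\infty)$ fixed, each bound is a known shape times a single unknown coefficient, so ordinary least squares has the closed form $K=\langle r,b\rangle/\langle b,b\rangle$, where $r_k=v_{p_k}-v_\infty$ and $b_k$ is $e^{-\beta p_k}$ or $-1/p_k$. The helper `fit_coefficient` is hypothetical, and no data from the figures is reproduced.

```python
import numpy as np

def fit_coefficient(p, r, basis):
    """Least-squares K minimizing sum_k (r_k - K * basis(p_k))**2, i.e. K = <r, b> / <b, b>."""
    b = basis(p)
    return float(np.dot(r, b) / np.dot(b, b))

v_inf = 1.08166            # value of v_infty reported for Figure 5
beta = np.log(v_inf)       # beta = log(v_infty), as in the text

def upper_basis(p):
    return np.exp(-beta * p)   # shape of the upper bound B e^{-beta p}

def lower_basis(p):
    return -1.0 / p            # shape of the lower bound -A / p

p_grid = np.arange(10, 207, dtype=float)   # p in [10, 206]
# With r[k] = v_{p_k} - v_infty computed numerically (not reproduced here):
#   B = fit_coefficient(p_grid, r, upper_basis)
#   A = fit_coefficient(p_grid, r, lower_basis)
```

A least-squares fit is not by itself guaranteed to lie above (or below) every data point, so the fitted curves should be read together with the plotted errors.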

Finally, Figure 6 represents an example in which it is possible (albeit very slowly) to compute $v_{\infty}$ exactly (see Remark 6.1). Here $\mu$ is concentrated on $8$ points, given by

[TABLE]

and $\nu$ is concentrated on $8$ equidistant points of the segment from $(0.625,1.25)$ to $(1.25,0)$, which lies on the line $y_{2}=-2y_{1}+2.5$. We have computed $v_{\infty}$ for the cost $c(x,y)=|x-y|$ by applying Remark 6.1, obtaining $v_{\infty}\simeq 1.38647347$; the points at minimal maximal distance are $x_{*}=(-0.25,-0.1)$ and $y_{*}=(0.98214286,0.53571429)$, connected by the purple segment in the picture. To study the speed of convergence, we rescaled the cost so as to further decrease $v_{\infty}$, to $v_{\infty}\simeq 1.052460609$. As shown in Figure 7, $v_{p}$ is computed for $p$ in the interval $[10,172]$, with $\varepsilon=500^{2}$. In this case, as shown in the top picture, $v_{p}$ is initially smaller than $v_{\infty}$, then increases past $v_{\infty}$, and finally decreases, converging to $v_{\infty}$.

Acknowledgments: G.C. acknowledges the support of the Lagrange Mathematics and Computing Research Center. The research of C.B. and L.DP is partially financed by the “Fondi di ricerca di ateneo, ex 60%” of the University of Firenze and is part of the project “Alcuni problemi di trasporto ottimo ed applicazioni” of the GNAMPA-INDAM. C.B. and G.C. also acknowledge the support of the French Agence Nationale de la Recherche through the project MAGA (ANR-16-CE40-0014).

Bibliography


[1] Mohit Bansil and Jun Kitagawa. $\mathcal{W}_{\infty}$-transport with discrete target as a combinatorial matching problem. Arch. Math. (Basel), 117(2):189–202, 2021.
[2] Jean-David Benamou. Optimal transportation, modelling and numerical simulation. Acta Numer., 30:249–325, 2021.
[3] Camilla Brizzi, Luigi De Pascale, and Anna Kausamo. $l^{\infty}$-optimal transport for a class of quasiconvex cost functions, 2021.
[4] Guillaume Carlier, Vincent Duval, Gabriel Peyré, and Bernhard Schmitzer. Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal., 49(2):1385–1418, 2017.
[5] Guillaume Carlier, Paul Pegon, and Luca Tamanini. Convergence rate of general entropic optimal transport costs, 2022.
[6] Thierry Champion, Luigi De Pascale, and Petri Juutinen. The $\infty$-Wasserstein distance: local solutions and existence of optimal transport maps. SIAM J. Math. Anal., 40(1):1–20, 2008.
[7] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26, 2013.
[8] Espen Bernton, Promit Ghosal, and Marcel Nutz. Entropic optimal transport: Geometry and large deviations. arXiv:2102.04397, 2021.
