Entropic regularization of continuous optimal transport problems

Christian Clason; Dirk A. Lorenz; Hinrich Mahler; Benedikt Wirth

arXiv:1906.01333·math.OC·October 28, 2020


TL;DR

This paper provides a comprehensive analysis of entropic regularization in continuous optimal transport, establishing duality, existence of solutions, and convergence results, especially addressing cases with marginals lacking finite entropy.

Contribution

It introduces a novel analysis framework using Orlicz spaces for entropic regularization, proving existence and duality results, and demonstrating Gamma-convergence for marginals without finite entropy.

Findings

1. Strong duality for the regularized problem in the space of continuous functions.

2. Existence of minimizers in an Orlicz space when the marginals have finite entropy.

3. Gamma-convergence of the regularized problems with smoothed marginals to the original problem.


Equations

$$\inf_{\substack{\pi\in{\mathcal{P}}(\Omega_{1}\times\Omega_{2})\\ (P_{1})_{\#}\pi=\mu,\ (P_{2})_{\#}\pi=\nu}}\int_{\Omega_{1}\times\Omega_{2}}c\,\mathrm{d}\pi+\gamma\int_{\Omega_{1}\times\Omega_{2}}\pi(\log\pi-1)\,\mathrm{d}(x_{1},x_{2}).$$

$$\sup_{\alpha\in{\mathcal{C}}(\Omega_{1}),\ \beta\in{\mathcal{C}}(\Omega_{2})}\int_{\Omega_{1}}\alpha(x_{1})\,\mathrm{d}\mu(x_{1})+\int_{\Omega_{2}}\beta(x_{2})\,\mathrm{d}\nu(x_{2})-\gamma\int_{\Omega_{1}\times\Omega_{2}}\exp\Big(\frac{-c(x_{1},x_{2})+\alpha(x_{1})+\beta(x_{2})}{\gamma}\Big)\,\mathrm{d}(x_{1},x_{2}).$$

$$E(f)=\int_{\Omega}|f(x)|\log\big(|f(x)|\big)\,\mathrm{d}x,$$

$$L\log L(\Omega):=\Big\{f:\Omega\to{\mathbb{R}}\ \text{measurable}:\int_{\Omega}|f(x)|\log^{+}\big(|f(x)|\big)\,\mathrm{d}x<\infty\Big\},$$

$$\Phi(t):=\int_{0}^{t}\varphi(s)\,\mathrm{d}s,$$

$$\Psi(s):=\max_{t\geq 0}\,\{st-\Phi(t)\}$$

$$\|f\|_{\Phi}=\inf\Big\{\gamma>0:\int_{\Omega}\Phi\Big(\frac{|f|}{\gamma}\Big)\,\mathrm{d}x\leq 1\Big\}.$$

$$\|f\|_{\Phi}=\inf\Big\{\gamma>0:\frac{1}{{\mathcal{L}}(\Omega)}\int_{\Omega}\Phi\Big(\frac{|f|}{\gamma}\Big)\,\mathrm{d}x\leq 1\Big\}.$$

$$\int_{\Omega}\Phi\Big(\frac{|f|}{\|f\|_{\Phi}}\Big)\,\mathrm{d}x\leq 1$$

$$\frac{1}{\gamma}\int_{\Omega}\Phi(|u|)\,\mathrm{d}x\geq\int_{\Omega}\Phi\Big(\frac{1}{\gamma}|u|+\Big(1-\frac{1}{\gamma}\Big)\cdot 0\Big)\,\mathrm{d}x=\int_{\Omega}\Phi\Big(\frac{|u|}{\gamma}\Big)\,\mathrm{d}x>1.$$

$$\Phi_{\exp}(s)=\begin{cases}s&\text{if }0\leq s\leq 1,\\ e^{s-1}&\text{if }s>1\end{cases}$$

$$L^{\infty}(\Omega)\hookrightarrow L_{\exp}(\Omega)\hookrightarrow L^{p}(\Omega)\hookrightarrow L\log L(\Omega)\hookrightarrow L^{1}(\Omega).$$

$$\int_{\Omega_{1}\times\Omega_{2}}|\mu(x_{1})\nu(x_{2})|\log^{+}|\mu(x_{1})\nu(x_{2})|\,\mathrm{d}(x_{1},x_{2})\leq\int_{\Omega_{1}}|\mu(x_{1})|\log^{+}|\mu(x_{1})|\,\mathrm{d}x_{1}\int_{\Omega_{2}}|\nu(x_{2})|\,\mathrm{d}x_{2}+\int_{\Omega_{1}}|\mu(x_{1})|\,\mathrm{d}x_{1}\int_{\Omega_{2}}|\nu(x_{2})|\log^{+}|\nu(x_{2})|\,\mathrm{d}x_{2}$$

$$(P_{1})_{\#}\pi(x_{1})=\int_{\Omega_{2}}\pi(x_{1},x_{2})\,\mathrm{d}x_{2},\qquad(P_{2})_{\#}\pi(x_{2})=\int_{\Omega_{1}}\pi(x_{1},x_{2})\,\mathrm{d}x_{1}.$$

$$\|(P_{i})_{\#}\pi\|_{\Phi_{\log}}\leq\max\big(1,{\mathcal{L}}(\Omega_{3-i})\big)\,\|\pi\|_{\Phi_{\log}}.$$

$$\int_{\Omega_{1}\times\Omega_{2}}\Phi\Big(\tfrac{|\pi|}{\gamma}\Big)\,\mathrm{d}(x_{1},x_{2})\geq{\mathcal{L}}(\Omega_{1})\int_{\Omega_{2}}\Phi\Big(\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\tfrac{|\pi|}{\gamma}\,\mathrm{d}x_{1}\Big)\,\mathrm{d}x_{2}\geq\int_{\Omega_{2}}\Phi\Big(\int_{\Omega_{1}}\tfrac{|\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\,\mathrm{d}x_{1}\Big)\,\mathrm{d}x_{2}$$

$$\begin{aligned}\|\pi\|_{\Phi_{\log}}&\geq\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\int_{\Omega_{1}}\tfrac{|\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\,\mathrm{d}x_{1}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&=\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\tfrac{(P_{2})_{\#}|\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&\geq\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\tfrac{|(P_{2})_{\#}\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}=\frac{\|(P_{2})_{\#}\pi\|_{\Phi_{\log}}}{\max(1,{\mathcal{L}}(\Omega_{1}))}.\end{aligned}$$

$$(\alpha\oplus\beta)(x_{1},x_{2}):=\alpha(x_{1})+\beta(x_{2}).$$

$$\begin{aligned}\|\alpha\oplus\beta\|_{\Phi_{\exp}}&\geq\min\Big\{\gamma>0:{\mathcal{L}}(\Omega_{1})\int_{\Omega_{2}}\Phi\Big(\tfrac{|\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\alpha(x_{1})\,\mathrm{d}x_{1}+\beta(x_{2})|}{\gamma}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&\geq\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\min(1,{\mathcal{L}}(\Omega_{1}))\tfrac{|\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\alpha(x_{1})\,\mathrm{d}x_{1}+\beta(x_{2})|}{\gamma}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&=\min(1,{\mathcal{L}}(\Omega_{1}))\,\Big\|\beta+\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\alpha\,\mathrm{d}x_{1}\Big\|_{\Phi_{\exp}}.\end{aligned}$$

$$\begin{aligned}&\sup_{\alpha\in{\mathcal{C}}(\Omega_{1}),\ \beta\in{\mathcal{C}}(\Omega_{2})}\int_{\Omega_{1}}\alpha\,\mathrm{d}\mu+\int_{\Omega_{2}}\beta\,\mathrm{d}\nu-\gamma\int_{\Omega_{1}\times\Omega_{2}}\exp\Big(\frac{-c(x_{1},x_{2})+\alpha(x_{1})+\beta(x_{2})}{\gamma}\Big)\,\mathrm{d}(x_{1},x_{2})\\&=\sup_{\alpha\in{\mathcal{C}}(\Omega_{1}),\ \beta\in{\mathcal{C}}(\Omega_{2})}\int_{\Omega_{1}}\alpha\,\mathrm{d}\mu+\int_{\Omega_{2}}\beta\,\mathrm{d}\nu+\int_{\Omega_{1}\times\Omega_{2}}\min_{\pi\geq 0}\Big[\big(c(x_{1},x_{2})-\alpha(x_{1})-\beta(x_{2})\big)\pi(x_{1},x_{2})+\gamma\pi(\log\pi-1)\Big]\,\mathrm{d}(x_{1},x_{2})\\&=\sup_{\alpha\in{\mathcal{C}}(\Omega_{1}),\ \beta\in{\mathcal{C}}(\Omega_{2})}\ \min_{\pi\in{\mathcal{M}}(\Omega_{1}\times\Omega_{2}),\ \pi\geq 0}\int_{\Omega_{1}\times\Omega_{2}}c\pi+\gamma\pi(\log\pi-1)\,\mathrm{d}(x_{1},x_{2})+\int_{\Omega_{1}}\alpha\,\mathrm{d}\big(\mu-(P_{1})_{\#}\pi\big)+\int_{\Omega_{2}}\beta\,\mathrm{d}\big(\nu-(P_{2})_{\#}\pi\big)\\&=\min_{\substack{\pi\in{\mathcal{P}}(\Omega_{1}\times\Omega_{2})\\ (P_{1})_{\#}\pi=\mu,\ (P_{2})_{\#}\pi=\nu}}\int_{\Omega_{1}\times\Omega_{2}}c\,\mathrm{d}\pi+\gamma\int_{\Omega_{1}\times\Omega_{2}}\pi(\log\pi-1)\,\mathrm{d}(x_{1},x_{2}),\end{aligned}$$

$$\tilde{\pi}=\bar{\pi}+t\big[{\mathbb{1}}_{\omega_{1}\times\omega_{2}}(x_{1},x_{2})+\kappa_{1}\kappa_{2}{\mathbb{1}}_{\tilde{\omega}_{1}\times\tilde{\omega}_{2}}(x_{1},x_{2})-\kappa_{1}{\mathbb{1}}_{\tilde{\omega}_{1}\times\omega_{2}}(x_{1},x_{2})-\kappa_{2}{\mathbb{1}}_{\omega_{1}\times\tilde{\omega}_{2}}(x_{1},x_{2})\big]$$

$$\int_{M}c\tilde{\pi}\,\mathrm{d}(x_{1},x_{2})+\gamma\int_{M}\tilde{\pi}\log\tilde{\pi}\,\mathrm{d}(x_{1},x_{2})\leq\int_{M}c\bar{\pi}\,\mathrm{d}(x_{1},x_{2})+\gamma\int_{M}\bar{\pi}\log\bar{\pi}\,\mathrm{d}(x_{1},x_{2}),$$

$$\int_{M}c\tilde{\pi}\,\mathrm{d}(x_{1},x_{2})=\int_{M}c\bar{\pi}\,\mathrm{d}(x_{1},x_{2})+tC_{0}$$


Full text

arXiv:1906.01333v3


Christian Clason, Faculty of Mathematics, University of Duisburg-Essen, 45117 Essen, Germany (ORCID 0000-0002-9948-8426)

Dirk A. Lorenz, Institute of Analysis and Algebra, TU Braunschweig, 38092 Braunschweig, Germany (ORCID 0000-0002-7419-769X)

Hinrich Mahler, Institute of Analysis and Algebra, TU Braunschweig, 38092 Braunschweig, Germany (ORCID 0000-0001-9108-549X)

Benedikt Wirth, Applied Mathematics Münster, University of Münster, Einsteinstraße 62, 48149 Münster, Germany (ORCID 0000-0003-0393-1938)

(2020-06-15)

Abstract

We analyze continuous optimal transport problems in the so-called Kantorovich form, where we seek a transport plan between two marginals that are probability measures on compact subsets of Euclidean space. We consider the case of regularization with the negative entropy with respect to the Lebesgue measure, which has attracted attention because it can be solved by the very simple Sinkhorn algorithm. We first analyze the regularized problem in the context of classical Fenchel duality and derive a strong duality result for a predual problem in the space of continuous functions. However, this problem may not admit a minimizer, which prevents obtaining primal-dual optimality conditions. We then show that the primal problem is naturally analyzed in the Orlicz space of functions with finite entropy in the sense that the entropically regularized problem admits a minimizer if and only if the marginals have finite entropy. We then derive a dual problem in the corresponding dual space, for which existence can be shown by purely variational arguments and primal-dual optimality conditions can be derived. For marginals that do not have finite entropy, we finally show Gamma-convergence of the regularized problem with smoothed marginals to the original Kantorovich problem.

1 Introduction

The Kantorovich formulation of optimal transport is the problem of finding a transport plan that describes how to move some measure onto another measure of the same mass such that a certain cost functional is minimal [22]. Specifically, let $\Omega_{1}$ and $\Omega_{2}$ be two compact subsets of ${\mathbb{R}}^{n_{1}}$ and ${\mathbb{R}}^{n_{2}}$, respectively. For given probability measures $\mu$ on $\Omega_{1}$ and $\nu$ on $\Omega_{2}$ and a continuous cost function $c:\Omega_{1}\times\Omega_{2}\to[0,\infty)$, the goal is to find a measure $\pi$ on $\Omega_{1}\times\Omega_{2}$ such that the cost $\int_{\Omega_{1}\times\Omega_{2}}c\,{\mathrm{d}}\pi$ is minimal among all $\pi$ that have $\mu$ and $\nu$ as marginals. This problem has been well studied, and we refer to the recent books [34, 32] for an overview. For example, it is known that the problem has a solution $\pi$ and that the support of $\pi$ is contained in the so-called $c$-superdifferential of a $c$-concave function on $\Omega_{1}$, see [2, Thm. 1.13]. (This is sometimes called the fundamental theorem of optimal transport.) In the case where $\Omega_{1}$ and $\Omega_{2}$ are both subsets of ${\mathbb{R}}^{n}$ and where $c(x_{1},x_{2})=|x_{1}-x_{2}|^{2}$ is the squared Euclidean distance, this implies that optimal plans $\pi$ are singular with respect to the Lebesgue measure. Hence, the optimal plan cannot be represented by a (Lebesgue-measurable) density function, and so standard approximation techniques from numerical analysis (e.g., by piecewise constant or piecewise linear functions) are not applicable. This motivates regularizing the continuous problem to obtain approximate solutions that are functions instead of measures, which in turn can be treated by classical discretization techniques in order to solve the regularized problem.

In this work we focus on entropic regularization by adding a multiple of the negative entropy of $\pi$ (with respect to the Lebesgue measure) to the objective function. This forces the optimal plan to be a measure that has a density with respect to the Lebesgue measure. Furthermore, in the discrete setting, this makes it possible to solve the problem numerically by the very simple Sinkhorn algorithm [23, 17, 4].
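To make the reference to the Sinkhorn algorithm concrete, here is a minimal sketch of the discrete iteration. It assumes the marginals are probability vectors on finite grids and that the entropy is taken with respect to the counting measure rather than the Lebesgue measure; the helper name `sinkhorn`, the grid, the histograms, and the value γ = 0.25 are illustrative assumptions, not taken from the paper. The plan is sought in the form $\pi_{ij}=u_{i}K_{ij}v_{j}$ with $K_{ij}=e^{-c_{ij}/\gamma}$, and the scalings $u$ and $v$ are updated alternately so that each update enforces one marginal constraint.

```python
import math


def sinkhorn(a, b, cost, gamma, iters=1000):
    """Discrete entropic OT via Sinkhorn scaling (illustrative sketch).

    a, b  : probability vectors (lists of positive floats summing to 1)
    cost  : cost matrix as a list of rows, cost[i][j] = c(x_i, y_j)
    gamma : regularization parameter > 0
    Returns the plan pi[i][j] = u[i] * K[i][j] * v[j] with K = exp(-cost / gamma).
    """
    K = [[math.exp(-c_ij / gamma) for c_ij in row] for row in cost]
    m, n = len(a), len(b)
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        # Rescale rows: makes the row sums match a (for the current v).
        u = [a[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(m)]
        # Rescale columns: makes the column sums match b (for the current u).
        v = [b[j] / sum(K[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(m)]


# Hypothetical example: 10 grid points in [0, 1], squared distance cost.
grid = [k / 9 for k in range(10)]
a = [0.1] * 10                          # uniform histogram
b = [w / 55 for w in range(1, 11)]      # increasing histogram, sums to 1
C = [[(s - t) ** 2 for t in grid] for s in grid]
pi = sinkhorn(a, b, C, gamma=0.25)

row_err = max(abs(sum(pi[i]) - a[i]) for i in range(10))
col_err = max(abs(sum(pi[i][j] for i in range(10)) - b[j]) for j in range(10))
print(row_err, col_err)   # both should be tiny
```

A moderate γ is used so that plain (non-logarithmic) scaling stays well within floating-point range; for small γ one would typically switch to log-domain updates of the potentials.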

Notation and problem statement.

To fully state the regularized optimal transport problem, we introduce some notation. By ${\mathcal{M}}(\Omega)$ and ${\mathcal{P}}(\Omega)$ we denote the sets of Radon measures and of probability measures on $\Omega\subset{\mathbb{R}}^{n}$, respectively. The Lebesgue measure will be denoted by ${\mathcal{L}}$ (the set on which it is defined being clear from the context), and integrals with respect to the Lebesgue measure are simply denoted by $\mathrm{d}x$ with the appropriate integration variable $x$. We write $L^{p}(\Omega,\mathrm{d}\mu)$ for the space of $p$-integrable functions with respect to the measure $\mu$ but omit the set $\Omega$ if it is clear from the context. If no measure is given, $L^{p}$ always refers to the space with respect to the Lebesgue measure. If a measure $\pi$ has a density with respect to the Lebesgue measure, we will also use $\pi$ for that density. For $\pi\in{\mathcal{M}}(\Omega_{1})$ and measurable $g:\Omega_{1}\to\Omega_{2}$, we denote by ${{g}_{\#}\pi}$ the pushforward of $\pi$ by $g$, i.e., the measure on $\Omega_{2}$ defined by ${{g}_{\#}\pi}(B)=\pi(g^{-1}(B))$ for all measurable sets $B\subset\Omega_{2}$. In particular, we will use the coordinate projections $P_{i}:\Omega_{1}\times\Omega_{2}\to\Omega_{i}$, $P_{i}(x_{1},x_{2})=x_{i}$, and the fact that ${{P_{i}}_{\#}\pi}$ is the $i$th marginal of $\pi\in{\mathcal{M}}(\Omega_{1}\times\Omega_{2})$. The entropically regularized Kantorovich problem of optimal mass transport between $\mu\in{\mathcal{P}}(\Omega_{1})$ and $\nu\in{\mathcal{P}}(\Omega_{2})$ is then given by

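$$\inf_{\substack{\pi\in{\mathcal{P}}(\Omega_{1}\times\Omega_{2})\\ (P_{1})_{\#}\pi=\mu,\ (P_{2})_{\#}\pi=\nu}}\int_{\Omega_{1}\times\Omega_{2}}c\,\mathrm{d}\pi+\gamma\int_{\Omega_{1}\times\Omega_{2}}\pi(\log\pi-1)\,\mathrm{d}(x_{1},x_{2}).\tag{P}$$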

(Note that we used the negative entropy of $\pi$ with respect to the Lebesgue measure for regularization. One could also consider regularization by adding $\gamma\int_{\Omega_{1}\times\Omega_{2}}\pi(\log\pi-1)\,{\mathrm{d}}\theta$ for some other measure $\theta$ , e.g., the product measure $\mu\otimes\nu$ [18], but we will not pursue this further.) A purely formal application of convex duality then yields the predual problem

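$$\sup_{\alpha\in{\mathcal{C}}(\Omega_{1}),\ \beta\in{\mathcal{C}}(\Omega_{2})}\int_{\Omega_{1}}\alpha(x_{1})\,\mathrm{d}\mu(x_{1})+\int_{\Omega_{2}}\beta(x_{2})\,\mathrm{d}\nu(x_{2})-\gamma\int_{\Omega_{1}\times\Omega_{2}}\exp\Big(\frac{-c(x_{1},x_{2})+\alpha(x_{1})+\beta(x_{2})}{\gamma}\Big)\,\mathrm{d}(x_{1},x_{2}).\tag{D}$$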

Having a primal and a dual problem, it is now possible to write down the system of Fenchel–Rockafellar extremality conditions and to derive and analyze algorithms for solving this system; in fact, this is one of the possible ways of deriving the Sinkhorn algorithm in the discrete case. However, the existence of solutions to (D), which is necessary to rigorously obtain extremality conditions, is not obvious in the continuous case. As it turns out, neither (P) nor (D) may admit a solution in the considered spaces. As we will show, a primal solution exists if and only if the marginals lie in the Banach space $L\log L$ of functions of finite entropy; correspondingly, a reformulation of the predual problem in the dual space $L_{\exp}$ allows showing existence of a maximizer by purely variational methods. For marginals that are not in $L\log L$, we show $\Gamma$-convergence of the regularized problems with suitably smoothed marginals to the original Kantorovich problem.

Related work.

The continuous optimal transport problem has been analyzed in the survey paper [24], which establishes the relation to the so-called dynamic Schrödinger problem. Another survey [20] presents an existence proof for a reparameterized optimality system based on the convergence analysis for a continuous variant of Sinkhorn’s algorithm (and attributes the proof and the algorithm to Fortet [21]). A detailed overview of the connections between optimal transport, the Schrödinger problem, and the Sinkhorn algorithm from a stochastic control viewpoint is given in the even more recent survey [14]. In [11], existence of a primal solution has been shown in the subset of measures that have a density of finite entropy with respect to the Lebesgue measure. Furthermore, [15] analyzes the problem (for unbalanced transport, i.e., for marginals with different mass) in $L^{1}$ and derives a dual formulation in $L^{\infty}$. However, the question of existence of a solution to the respective dual problem is left open. In [9], this gap was closed through a contraction argument using the Hilbert metric. More precisely, [9, Thm. 3.1] guarantees the existence of dual solutions in $L^{\infty}$ provided that the feasible set of the dual problem is not empty. Moreover, if a certain constraint qualification holds, the dual optimizers $x$ and $y$ can be shown to satisfy $x(s)+y(t)=\log u_{0}(s,t)\quad\text{a.e.}$, where $u_{0}$ denotes the optimal primal solution. Here $u_{0}$, $x$, and $y$ correspond to $\bar{\pi}$, $\alpha$, $\beta$ of (P) and (D) via $u_{0}=e^{\frac{c}{\gamma}}\bar{\pi}$, $x=\alpha/\gamma$, and $y=\beta/\gamma$. A similar result is also stated in the more recent work [13, Thm. 6], which shows the existence of dual optimizers if the marginals are absolutely continuous probability measures. (The relation of $q$ and $\nu$ in [13] to our notation is $q=e^{\frac{-c}{\gamma}}$ and $\nu=\bar{\pi}e^{\frac{c}{\gamma}}$.)
Another approach to prove the existence of unique solutions (even in the multi-marginal case) is presented in [12, Thm. 4.3]. The authors show that a certain map is a bijection, which yields existence of dual solutions $\alpha$ and $\beta$ in $L^{\infty}$ if the marginals are functions in $L^{\infty}$ as well. Moreover, in [6] a compactness argument is used to show the existence of a fixed point of the Sinkhorn iteration; in contrast to our work, the entropy penalization there is considered with respect to the product measure of the marginals.

Previous works [6, 9, 12, 13] tackle the problem of existence of dual solutions under various conditions in standard Lebesgue spaces. For marginals of finite entropy, [16, Cor. 3.2] already states that dual solutions exist and satisfy $\bar{\alpha}\in L^{1}(\Omega_{1},\mu)$ and $\bar{\beta}\in L^{1}(\Omega_{2},\nu)$ (in our notation; the notation there uses $Q=\bar{\pi}e^{\frac{-c}{\gamma}}$, $P_{1}=\mu$, $a=e^{\frac{-\alpha}{\gamma}}$, $R=e^{\frac{-c}{\gamma}}{\mathcal{L}}$, $P_{2}=\nu$, and $b=e^{\frac{-\beta}{\gamma}}$). Note that while the primal solution $Q$ is in $L\log L(\Omega,{\mathcal{L}})$, the analysis takes place in $L\log L(\Omega,e^{\frac{-c}{\gamma}}{\mathcal{L}})$. Moreover, as the authors of [9] note, [16] leaves out the details of a crucial step of the argument. This gap was closed only later in [8]. None of the mentioned works considered necessary conditions for existence. Finally, [25] analyzes regularization with the $L^{2}$ norm of $\pi$ and derives existence of solutions of the dual problem.

The notion of Orlicz spaces in the context of convex integral functionals has previously been used in [26], where existence of both primal and dual optimizers is covered in a more general setting. More precisely, the spaces used in [26], which are also known as Musielak–Orlicz spaces [28], are a generalization of the Orlicz spaces used here. The setting considered here can be recovered in two different ways: In [26, Sec. 7.3 (a)], the above-referenced results of [16] are recovered as a special case (where again our case corresponds to choosing $R=e^{\frac{-c}{\gamma}}{\mathcal{L}}$). Moreover, choosing $m(z)=e^{\frac{-c(z)}{\gamma}}$ in the second example of [26, Sec. 7.1] (titled "a variant of the Boltzmann entropy") gives a problem very similar to the one considered here. The difference lies in the fact that the cost function $c$ is part of the definition of the relevant Musielak–Orlicz spaces in this case, and hence the analysis takes place in different spaces. As the aim of [26] is to weaken the necessary assumptions as much as possible, the overall setting is more abstract, and the proofs rely heavily on the authors' previous work [27]. Here we aim for a self-contained and more elementary treatment of (P).

Regarding $\Gamma$-convergence, the limit for $\gamma\to 0$ and fixed marginals whose densities have finite entropy was considered recently in [11].

Organization.

Section 2 recalls statements about functions of finite entropy and the duality of the respective Orlicz space $L\log L$. In Section 3, we collect and prove (for the sake of completeness) results on the regularized optimal transport problem (P) in the context of duality of continuous functions and measures. In particular, Theorem 3.4 shows that primal solutions exist if and only if the marginals are in the space $L\log L$. Hence, we analyze the problem in Section 4 in the context of $L\log L$ and $L_{\exp}$. We show existence and uniqueness of solutions to the primal problem in $L\log L$, derive the dual problem, and show existence of solutions for the dual problem in $L_{\exp}$. We finally show a result on $\Gamma$-convergence for the combined regularization and smoothing of marginals that do not have finite entropy in Section 5.

2 Review of functions of finite entropy and the space $L\log L$

Entropic regularization deals with positive integrable functions of finite entropy. These functions are closely connected to the space $L\log L$ , a special case of (Birnbaum–)Orlicz spaces, and hence we collect some facts about this space which are mainly taken from [30, 5, 1]; see also [33]. We consider a compact domain $\Omega\subset{\mathbb{R}}^{n}$ and denote the neg-entropy of a measurable function $f:\Omega\to{\mathbb{R}}$ by

$$E(f)=\int_{\Omega}|f(x)|\log\big(|f(x)|\big)\,\mathrm{d}x,$$

where we set $0\log 0=0$ as usual. Note that since $s\log s\geq-1/e$ for every $s\geq 0$ , the neg-entropy always lies in the interval $[-{\mathcal{L}}(\Omega)/e,\infty]$ . We say that $f$ has finite entropy if $E(f)<\infty$ . Following [5], we define

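$$L\log L(\Omega):=\Big\{f:\Omega\to{\mathbb{R}}\ \text{measurable}:\int_{\Omega}|f(x)|\log^{+}\big(|f(x)|\big)\,\mathrm{d}x<\infty\Big\},$$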

where $\log^{+}(x)=\max(\log(x),0)$ .
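The lower bound $E(f)\geq-{\mathcal{L}}(\Omega)/e$ noted above follows from minimizing the integrand pointwise. A short derivation for $s>0$ (at $s=0$ the convention $0\log 0=0$ gives the larger value $0$):

$$\frac{\mathrm{d}}{\mathrm{d}s}(s\log s)=\log s+1=0\iff s=e^{-1},\qquad\frac{\mathrm{d}^{2}}{\mathrm{d}s^{2}}(s\log s)=\frac{1}{s}>0,$$

$$\min_{s\geq 0}\,s\log s=e^{-1}\log e^{-1}=-\frac{1}{e}\qquad\Longrightarrow\qquad E(f)=\int_{\Omega}|f(x)|\log|f(x)|\,\mathrm{d}x\geq-\int_{\Omega}\frac{1}{e}\,\mathrm{d}x=-\frac{{\mathcal{L}}(\Omega)}{e}.$$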

Proposition 2.1 ([29, Thm. 1.2]).

A nonnegative measurable function $f$ on a set with finite measure has finite entropy if and only if $f\in L\log L(\Omega)$ .

It turns out that $L\log L(\Omega)$ can be normed such that it becomes a Banach space and that its dual has a natural characterization. In the following, we recall the central constructions and main results based on so-called Young functions.

Definition 2.2 (Young functions).

Let $\varphi:[0,\infty)\to[0,\infty]$ be increasing and lower semi-continuous with $\varphi(0)=0$ . Suppose that $\varphi$ is neither identically zero nor identically infinite on $(0,\infty)$ . Then the function $\Phi$ , defined by

$$\Phi(t):=\int_{0}^{t}\varphi(s)\,\mathrm{d}s,$$

is said to be a Young function. Moreover, the function $\Psi$ defined by

$$\Psi(s):=\max_{t\geq 0}\,\{st-\Phi(t)\}$$

is called the complementary Young function of $\Phi$ .

Any Young function is continuous and convex on its domain, and the complementary Young function $\Psi$ is again a Young function. The notion of Young functions gives rise to a generalization of $L^{p}$ spaces through the definition of the so-called Luxemburg norm.
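As a numerical sanity check of the complementary Young function from Definition 2.2, the sketch below evaluates $\Psi(s)=\max_{t\geq 0}\{st-\Phi(t)\}$ by ternary search, which is valid because $t\mapsto st-\Phi(t)$ is concave when $\Phi$ is convex. It assumes that a maximizer lies in $[0,t_{\max}]$ for a user-chosen cut-off; the function name `complementary` and the cut-off are hypothetical. For $\Phi(t)=t^{p}/p$ with $p>1$, the result should match the classical formula $\Psi(s)=s^{q}/q$ with $1/p+1/q=1$.

```python
def complementary(Phi, s, t_max=1e3, iters=200):
    """Evaluate Psi(s) = max_{t >= 0} (s*t - Phi(t)) by ternary search.

    Assumes Phi is convex (so the objective is concave in t) and that a
    maximizer lies in [0, t_max]; t_max is a hypothetical, user-chosen cut-off.
    """
    lo, hi = 0.0, t_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # Concave objective: discard the outer third next to the smaller value.
        if s * m1 - Phi(m1) < s * m2 - Phi(m2):
            lo = m1
        else:
            hi = m2
    t = 0.5 * (lo + hi)
    return s * t - Phi(t)


p = 3.0
q = p / (p - 1)          # conjugate exponent: 1/p + 1/q = 1


def Phi_p(t):
    """Young function t^p / p."""
    return t ** p / p


for s in [0.5, 1.0, 2.0]:
    print(s, complementary(Phi_p, s), s ** q / q)
```

Ternary search is used instead of a grid search because it needs no step size and converges to floating-point accuracy for any concave objective.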

Definition 2.3 (Luxemburg norm and Orlicz spaces).

Let $\Phi$ be a Young function. The Luxemburg norm of a measurable function $f:\Omega\to{\mathbb{R}}$ is defined as

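$$\|f\|_{\Phi}=\inf\Big\{\gamma>0:\int_{\Omega}\Phi\Big(\frac{|f|}{\gamma}\Big)\,\mathrm{d}x\leq 1\Big\}.\tag{1}$$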

The space of all measurable functions with finite Luxemburg norm is called Orlicz space and denoted by $L^{\Phi}(\Omega)$ .
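A minimal sketch of how the Luxemburg norm of Definition 2.3 can be evaluated numerically, assuming $f$ is piecewise constant on cells of equal Lebesgue measure `cell` (a discretization assumption, not part of the paper). Since $\gamma\mapsto\int_{\Omega}\Phi(|f|/\gamma)\,\mathrm{d}x$ is nonincreasing, the infimum can be located by bisection. The helper names `luxemburg_norm` and `phi_log` are hypothetical. For $\Phi(t)=t^{2}$ the Luxemburg norm reduces to the $L^{2}$ norm, which gives an easy check.

```python
import math


def phi_log(t):
    """Young function Phi_log(t) = t * log^+(t)."""
    return t * math.log(t) if t > 1 else 0.0


def luxemburg_norm(values, cell, Phi, rel_tol=1e-12):
    """Luxemburg norm of a piecewise-constant function.

    values: function values on cells that each have Lebesgue measure `cell`.
    Returns (approximately) inf{g > 0 : cell * sum(Phi(|v| / g)) <= 1}.
    """
    if all(v == 0 for v in values):
        return 0.0

    def modular(g):
        return cell * sum(Phi(abs(v) / g) for v in values)

    lo, hi = 0.0, 1.0
    while modular(hi) > 1:      # enlarge hi until it is admissible
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1:   # mid is admissible, so the infimum is <= mid
            hi = mid
        else:                   # mid is not admissible, so the infimum is >= mid
            lo = mid
    return hi


# L^2 check: for Phi(t) = t^2 the Luxemburg norm is the L^2 norm.
print(luxemburg_norm([1.0, 2.0, 3.0], 0.5, lambda t: t ** 2))  # about sqrt(7)
print(luxemburg_norm([1.0], 1 / math.e, phi_log))              # about 1/e
```

For instance, `luxemburg_norm([1.0] * 4, 1.0, lambda t: t ** 2)` treats the characteristic function of a set of measure 4 and should return approximately $2={\mathcal{L}}(\Omega)^{1/2}$, in line with the scaling discussed in Remark 2.4.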

Remark 2.4.

General Orlicz norms do not scale in a simple way with the size of the set $\Omega$. Writing ${\mathbb{1}}_{A}$ for the characteristic function of the set $A\subset\Omega$, i.e., ${\mathbb{1}}_{A}(x)=1$ if $x\in A$ and ${\mathbb{1}}_{A}(x)=0$ otherwise, the $p$-norm (corresponding to the Young function $\Phi(t)=t^{p}$) of ${\mathbb{1}}_{\Omega}$ equals $\|{{\mathbb{1}}_{\Omega}}\|_{p}={\mathcal{L}}(\Omega)^{1/p}$. For a strictly increasing Young function $\Phi$, we obtain the more complicated result $\|{{\mathbb{1}}_{\Omega}}\|_{\Phi}=(\Phi^{-1}({\mathcal{L}}(\Omega)^{-1}))^{-1}$. As a consequence, some results in the following depend on the size of the domain. One could get rid of this dependence by adapting the definition of the norm to, e.g.,

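$$\|f\|_{\Phi}=\inf\Big\{\gamma>0:\frac{1}{{\mathcal{L}}(\Omega)}\int_{\Omega}\Phi\Big(\frac{|f|}{\gamma}\Big)\,\mathrm{d}x\leq 1\Big\}.$$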

However, since this definition would be nonstandard, we refrain from doing so.

Moreover, note that

$$\int_{\Omega}\Phi\Big(\frac{|f|}{\|f\|_{\Phi}}\Big)\,\mathrm{d}x\leq 1$$

is always true, but equality may fail to hold. For a counterexample, see, e.g., [33, Example 2.8].
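The formula for $\|{\mathbb{1}}_{\Omega}\|_{\Phi}$ in Remark 2.4 can be read off directly from the definition of the Luxemburg norm, assuming in addition that $\Phi$ is finite-valued, so that a strictly increasing $\Phi$ has an inverse on $[0,\infty)$:

$$\|{\mathbb{1}}_{\Omega}\|_{\Phi}=\inf\Big\{\gamma>0:{\mathcal{L}}(\Omega)\,\Phi\Big(\frac{1}{\gamma}\Big)\leq 1\Big\}=\inf\Big\{\gamma>0:\frac{1}{\gamma}\leq\Phi^{-1}\big({\mathcal{L}}(\Omega)^{-1}\big)\Big\}=\Big(\Phi^{-1}\big({\mathcal{L}}(\Omega)^{-1}\big)\Big)^{-1}.$$

For $\Phi(t)=t^{p}$ one has $\Phi^{-1}(s)=s^{1/p}$, which recovers $\|{\mathbb{1}}_{\Omega}\|_{p}=\big({\mathcal{L}}(\Omega)^{-1/p}\big)^{-1}={\mathcal{L}}(\Omega)^{1/p}$.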

Theorem 2.5 ([1, Thm. 8.10]).

$L^{\Phi}(\Omega)$ is a Banach space with respect to the Luxemburg norm.

We will also need the following estimate.

Lemma 2.6.

Let $L^{\Phi}(\Omega)$ denote the Orlicz space with convex Young function $\Phi$ and $u\in L^{\Phi}(\Omega)$ with $\|{u}\|_{\Phi}>1$ . Then $\int_{\Omega}\Phi(|u|)\,{\mathrm{d}}x\geq\|{u}\|_{\Phi}$ .

Proof 2.7.

For any $1\leq\gamma<\|{u}\|_{\Phi}$ , it holds that $\int_{\Omega}\Phi\big{(}\frac{|u|}{\gamma}\big{)}\,{\mathrm{d}}x>1$ . It then follows from the convexity of $\Phi$ and $\Phi(0)=0$ that

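$$\frac{1}{\gamma}\int_{\Omega}\Phi(|u|)\,\mathrm{d}x\geq\int_{\Omega}\Phi\Big(\frac{1}{\gamma}|u|+\Big(1-\frac{1}{\gamma}\Big)\cdot 0\Big)\,\mathrm{d}x=\int_{\Omega}\Phi\Big(\frac{|u|}{\gamma}\Big)\,\mathrm{d}x>1.$$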

Letting $\gamma\to\|{u}\|_{\Phi}$ , the claim follows.

Note that by Remark 2.4, Lemma 2.6 need not hold for $\|{u}\|_{\Phi}=1$.

Using $\Phi_{\log}(s)=s\log^{+}s$ as Young function now immediately yields $L\log L(\Omega)=L^{\Phi_{\log}}(\Omega)$ . The complementary Young function

$$\Phi_{\exp}(s)=\begin{cases}s&\text{if }0\leq s\leq 1,\\ e^{s-1}&\text{if }s>1\end{cases}$$

of $\Phi_{\log}$ now provides a natural way to define the Orlicz space $L^{\Phi_{\exp}}(\Omega)=:L_{\exp}(\Omega)$ . In fact, $L_{\exp}(\Omega)$ is the dual space of $L\log L(\Omega)$ .
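The closed form of $\Phi_{\exp}$ can be obtained by computing the complementary Young function of $\Phi_{\log}$ directly from Definition 2.2. For $0\leq t\leq 1$ we have $st-\Phi_{\log}(t)=st$, which is maximal at $t=1$ with value $s$. For $t>1$,

$$\frac{\mathrm{d}}{\mathrm{d}t}\big(st-t\log t\big)=s-\log t-1=0\iff t=e^{s-1},$$

which lies in $(1,\infty)$ if and only if $s>1$, with value $se^{s-1}-e^{s-1}(s-1)=e^{s-1}$. For $s\leq 1$ the derivative is negative on $(1,\infty)$, and for $s>1$ we have $e^{s-1}\geq s$ (since $e^{x}\geq 1+x$). Hence

$$\Phi_{\exp}(s)=\max_{t\geq 0}\{st-\Phi_{\log}(t)\}=\begin{cases}s&\text{if }0\leq s\leq 1,\\ e^{s-1}&\text{if }s>1.\end{cases}$$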

Proposition 2.8 ([5, Thm. IV.6.5]).

If $\Omega$ has finite Lebesgue measure, then $L\log L(\Omega)^{*}=L_{\exp}(\Omega)$ (up to equivalence of norms). Moreover, for all $1<p<\infty$ , the following embeddings hold

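$$L^{\infty}(\Omega)\hookrightarrow L_{\exp}(\Omega)\hookrightarrow L^{p}(\Omega)\hookrightarrow L\log L(\Omega)\hookrightarrow L^{1}(\Omega).$$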

The Luxemburg norms (1) on $L^{\Phi}(\Omega)$ are equivalent to the norms defined in [5, Def. IV.6.3] (in [5, Def. IV.6.3], the norms for $L\log L(\Omega)$ and $L_{\exp}(\Omega)$ are dual to each other). The constants in this norm equivalence will in the following generically be denoted by $c_{\Phi}$ . Note that [5, Thm. IV.6.5] is stated for domains with unit Lebesgue measure, but the case of general finite measure follows by a simple rescaling.

We also have the following properties, which follow from Theorem 8.21 (b) and Theorem 8.19 in [1], respectively, by observing that $\Phi_{\log}$ is so-called $\Delta$-regular (cf. [1, Def. 8.7]) but $\Phi_{\exp}$ is not.

Lemma 2.9.

(i) The space $L\log L(\Omega)$ is separable.

(ii) The spaces $L\log L(\Omega)$ and $L_{\exp}(\Omega)$ are not reflexive.

The following example shows that the desired optimality conditions cannot be derived by simply setting the Gâteaux derivative to zero.

Example 2.10.

$E$ is not Gâteaux-differentiable on $L\log L((0,1))$. Indeed, consider $f(x)=\exp(-1/\sqrt{x})$. Then it holds that $f\in L\log L((0,1))$ (since $f$ is bounded) and hence that $E(f)<\infty$, but note that the formal Gâteaux derivative $E^{\prime}(f)=\log(f)+1$ is not in $L_{\exp}((0,1))$. To see this, note that $\log(f)+1=1-\frac{1}{\sqrt{x}}$ is not in $L^{2}((0,1))$ and thus, by Proposition 2.8, is not in $L_{\exp}((0,1))$.
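Both claims in Example 2.10 can be verified by direct computation. Since $0<f(x)<e^{-1}$ on $(0,1)$, we have $\log^{+}f=0$, so $f\in L\log L((0,1))$ trivially, while

$$\int_{0}^{1}\Big(1-\frac{1}{\sqrt{x}}\Big)^{2}\mathrm{d}x=\lim_{\varepsilon\to 0}\int_{\varepsilon}^{1}\Big(1-\frac{2}{\sqrt{x}}+\frac{1}{x}\Big)\mathrm{d}x=\lim_{\varepsilon\to 0}\Big(-3-\varepsilon+4\sqrt{\varepsilon}-\log\varepsilon\Big)=\infty,$$

because the first two terms of the integrand are integrable on $(0,1)$ whereas $\int_{\varepsilon}^{1}x^{-1}\,\mathrm{d}x=-\log\varepsilon$ diverges.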

We next derive a few facts that will be useful for the analysis of the primal and dual regularized optimal transport problems. For the first lemma, we use the elementary fact that for all $a,b>0$ we have $\log^{+}(ab)\leq\log^{+}(a)+\log^{+}(b)$ .

Lemma 2.11.

If $\mu\in L\log L(\Omega_{1})$ , $\nu\in L\log L(\Omega_{2})$ , and $\pi=\mu\otimes\nu$ (i.e., $\pi(x_{1},x_{2}):=\mu(x_{1})\nu(x_{2})$ ), then $\pi\in L\log L(\Omega_{1}\times\Omega_{2})$ .

Proof 2.12.

We simply estimate

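$$\int_{\Omega_{1}\times\Omega_{2}}|\mu(x_{1})\nu(x_{2})|\log^{+}|\mu(x_{1})\nu(x_{2})|\,\mathrm{d}(x_{1},x_{2})\leq\int_{\Omega_{1}}|\mu(x_{1})|\log^{+}|\mu(x_{1})|\,\mathrm{d}x_{1}\int_{\Omega_{2}}|\nu(x_{2})|\,\mathrm{d}x_{2}+\int_{\Omega_{1}}|\mu(x_{1})|\,\mathrm{d}x_{1}\int_{\Omega_{2}}|\nu(x_{2})|\log^{+}|\nu(x_{2})|\,\mathrm{d}x_{2}$$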

and use that all terms on the right-hand side are finite since $L\log L(\Omega)\subset L^{1}(\Omega)$ .
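The estimate in the proof of Lemma 2.11 rests only on the pointwise inequality $\log^{+}(ab)\leq\log^{+}(a)+\log^{+}(b)$, so it also holds for discretized densities. The sketch below checks it numerically for two hypothetical piecewise-constant densities on $(0,1)$, namely midpoint values of $2x_{1}$ and $3x_{2}^{2}$; the grids, densities, and helper names are illustrative assumptions.

```python
import math


def log_plus(x):
    """log^+(x) = max(log x, 0) for x > 0, and 0 at x = 0."""
    return max(math.log(x), 0.0) if x > 0 else 0.0


def phi_log(x):
    """Phi_log(s) = s * log^+(s); the convention 0 * log 0 = 0 is built in."""
    return x * log_plus(x)


# Hypothetical piecewise-constant densities on uniform grids of (0, 1).
n1, n2 = 50, 40
h1, h2 = 1.0 / n1, 1.0 / n2
mu = [2.0 * (i + 0.5) * h1 for i in range(n1)]          # density 2 x_1
nu = [3.0 * ((j + 0.5) * h2) ** 2 for j in range(n2)]   # density 3 x_2^2

# Left- and right-hand sides of the estimate in the proof of Lemma 2.11.
lhs = sum(phi_log(m * v) for m in mu for v in nu) * h1 * h2
rhs = (sum(phi_log(m) for m in mu) * h1 * sum(abs(v) for v in nu) * h2
       + sum(abs(m) for m in mu) * h1 * sum(phi_log(v) for v in nu) * h2)
print(lhs, rhs)   # lhs should not exceed rhs
```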

Next, we consider a function $\pi\in L\log L(\Omega_{1}\times\Omega_{2})$ and its pushforwards under the coordinate projections

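$$(P_{1})_{\#}\pi(x_{1})=\int_{\Omega_{2}}\pi(x_{1},x_{2})\,\mathrm{d}x_{2},\qquad(P_{2})_{\#}\pi(x_{2})=\int_{\Omega_{1}}\pi(x_{1},x_{2})\,\mathrm{d}x_{1}.$$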

The following result states that these marginals are also in $L\log L$ .

Lemma 2.13.

If $\pi\in L\log L(\Omega_{1}\times\Omega_{2})$ , then ${{(P_{i})}_{\#}\pi}\in L\log L(\Omega_{i})$ for $i\in\{1,2\}$ with

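$$\|(P_{i})_{\#}\pi\|_{\Phi_{\log}}\leq\max\big(1,{\mathcal{L}}(\Omega_{3-i})\big)\,\|\pi\|_{\Phi_{\log}}.$$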

Proof 2.14.

Using the convexity of $\Phi(s)=s\log^{+}(s)$ and Jensen’s inequality, we obtain

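$$\int_{\Omega_{1}\times\Omega_{2}}\Phi\Big(\tfrac{|\pi|}{\gamma}\Big)\,\mathrm{d}(x_{1},x_{2})\geq{\mathcal{L}}(\Omega_{1})\int_{\Omega_{2}}\Phi\Big(\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\tfrac{|\pi|}{\gamma}\,\mathrm{d}x_{1}\Big)\,\mathrm{d}x_{2}\geq\int_{\Omega_{2}}\Phi\Big(\int_{\Omega_{1}}\tfrac{|\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\,\mathrm{d}x_{1}\Big)\,\mathrm{d}x_{2},$$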

where, with $\ell={\mathcal{L}}(\Omega_{1})$, we used $\ell\Phi(s/\ell)\geq\Phi(s)$ for $\ell\leq 1$ (a consequence of the convexity of $\Phi$ and $\Phi(0)=0$) and $\ell\Phi(s/\ell)\geq\Phi(s/\ell)$ otherwise. Thus we obtain

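$$\begin{aligned}\|\pi\|_{\Phi_{\log}}&\geq\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\int_{\Omega_{1}}\tfrac{|\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\,\mathrm{d}x_{1}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&=\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\tfrac{(P_{2})_{\#}|\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&\geq\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\tfrac{|(P_{2})_{\#}\pi|}{\gamma\max(1,{\mathcal{L}}(\Omega_{1}))}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}=\frac{\|(P_{2})_{\#}\pi\|_{\Phi_{\log}}}{\max(1,{\mathcal{L}}(\Omega_{1}))}.\end{aligned}$$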

The claim for ${{(P_{1})}_{\#}\pi}$ follows similarly.
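Lemma 2.13 can likewise be checked numerically, since the Jensen argument in its proof applies verbatim to cell averages. The following self-contained sketch evaluates Luxemburg norms by bisection for a hypothetical piecewise-constant density $\pi(x_{1},x_{2})=1+3x_{1}x_{2}^{2}$ on $\Omega_{1}\times\Omega_{2}=(0,2)\times(0,1)$, so that ${\mathcal{L}}(\Omega_{1})=2$, and compares $\|(P_{2})_{\#}\pi\|_{\Phi_{\log}}$ with the bound $\max(1,{\mathcal{L}}(\Omega_{1}))\,\|\pi\|_{\Phi_{\log}}$. The density, grids, and helper names are illustrative assumptions.

```python
import math


def phi_log(s):
    """Young function Phi_log(s) = s * log^+(s)."""
    return s * math.log(s) if s > 1 else 0.0


def luxemburg_norm(values, cell, Phi=phi_log, rel_tol=1e-12):
    """Bisection for inf{g > 0 : cell * sum(Phi(|v| / g)) <= 1}."""
    if all(v == 0 for v in values):
        return 0.0

    def modular(g):
        return cell * sum(Phi(abs(v) / g) for v in values)

    lo, hi = 0.0, 1.0
    while modular(hi) > 1:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical example: Omega_1 = (0, 2), Omega_2 = (0, 1), so L(Omega_1) = 2.
L1, n1, n2 = 2.0, 40, 30
h1, h2 = L1 / n1, 1.0 / n2
pi = [[1.0 + 3.0 * ((i + 0.5) * h1) * ((j + 0.5) * h2) ** 2 for j in range(n2)]
      for i in range(n1)]
flat = [p for row in pi for p in row]
# Second marginal (P_2)_# pi: integrate over x_1 with the midpoint rule.
marginal2 = [sum(pi[i][j] for i in range(n1)) * h1 for j in range(n2)]

norm_pi = luxemburg_norm(flat, h1 * h2)
norm_marg = luxemburg_norm(marginal2, h2)
bound = max(1.0, L1) * norm_pi
print(norm_marg, bound)   # norm_marg should not exceed bound
```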

As a corollary, we obtain a characterization of $L_{\exp}(\Omega)$ on tensor product spaces.

Corollary 2.15.

It holds that $\alpha\in L_{\exp}(\Omega_{1})$ and $\beta\in L_{\exp}(\Omega_{2})$ if and only if $\alpha\oplus\beta\in L_{\exp}(\Omega_{1}\times\Omega_{2})$ , where

$$(\alpha\oplus\beta)(x_{1},x_{2}):=\alpha(x_{1})+\beta(x_{2}).$$

Proof 2.16.

The mapping $(\alpha,\beta)\mapsto\alpha\oplus\beta$ is the adjoint of $\pi\mapsto({{(P_{1})}_{\#}\pi},{{(P_{2})}_{\#}\pi})$, which is bounded on $L\log L(\Omega_{1}\times\Omega_{2})$ by Lemma 2.13, and hence one implication follows from the fact that $L\log L(\Omega)^{*}=L_{\exp}(\Omega)$.

For the other implication, we use the Luxemburg norm and Jensen’s inequality with $\Phi\equiv\Phi_{\exp}$ to observe that

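$$\begin{aligned}\|\alpha\oplus\beta\|_{\Phi_{\exp}}&\geq\min\Big\{\gamma>0:{\mathcal{L}}(\Omega_{1})\int_{\Omega_{2}}\Phi\Big(\tfrac{|\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\alpha(x_{1})\,\mathrm{d}x_{1}+\beta(x_{2})|}{\gamma}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&\geq\min\Big\{\gamma>0:\int_{\Omega_{2}}\Phi\Big(\min(1,{\mathcal{L}}(\Omega_{1}))\tfrac{|\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\alpha(x_{1})\,\mathrm{d}x_{1}+\beta(x_{2})|}{\gamma}\Big)\,\mathrm{d}x_{2}\leq 1\Big\}\\&=\min(1,{\mathcal{L}}(\Omega_{1}))\,\Big\|\beta+\frac{1}{{\mathcal{L}}(\Omega_{1})}\int_{\Omega_{1}}\alpha\,\mathrm{d}x_{1}\Big\|_{\Phi_{\exp}}.\end{aligned}$$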

This shows that $\beta$ plus a constant is in $L_{\exp}(\Omega_{2})$ and hence, since constants lie in $L^{\infty}(\Omega_{2})\hookrightarrow L_{\exp}(\Omega_{2})$, that $\beta$ itself is in $L_{\exp}(\Omega_{2})$. Arguing similarly for $\alpha$, we obtain the claim.

3 Fenchel duality in ${\mathcal{M}}$ and ${\mathcal{C}}$

In this section, we study the primal and dual problems for entropically regularized mass transport, i.e.,

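$$\inf_{\substack{\pi\in{\mathcal{P}}(\Omega_{1}\times\Omega_{2})\\ (P_{1})_{\#}\pi=\mu,\ (P_{2})_{\#}\pi=\nu}}\int_{\Omega_{1}\times\Omega_{2}}c\,\mathrm{d}\pi+\gamma\int_{\Omega_{1}\times\Omega_{2}}\pi(\log\pi-1)\,\mathrm{d}(x_{1},x_{2})$$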

and

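$$\sup_{\alpha\in{\mathcal{C}}(\Omega_{1}),\ \beta\in{\mathcal{C}}(\Omega_{2})}\int_{\Omega_{1}}\alpha(x_{1})\,\mathrm{d}\mu(x_{1})+\int_{\Omega_{2}}\beta(x_{2})\,\mathrm{d}\nu(x_{2})-\gamma\int_{\Omega_{1}\times\Omega_{2}}\exp\Big(\frac{-c(x_{1},x_{2})+\alpha(x_{1})+\beta(x_{2})}{\gamma}\Big)\,\mathrm{d}(x_{1},x_{2}),$$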

using Fenchel duality in the canonical spaces ${\mathcal{M}}(\Omega_{1}\times\Omega_{2})$ and ${\mathcal{C}}(\Omega_{1})\times{\mathcal{C}}(\Omega_{2})$ . Most of the results in this section are classical [16, 9], but we include the results with proofs for the sake of completeness.

We use the general framework as outlined in, e.g., [19, Sec. III.4] or [3, Chap. 9]. Throughout the following, we assume that $\mu\in{\mathcal{P}}(\Omega_{1})$, $\nu\in{\mathcal{P}}(\Omega_{2})$, $c\in{\mathcal{C}}(\Omega_{1}\times\Omega_{2})$, $\gamma>0$, and that $\Omega_{1}$ and $\Omega_{2}$ are compact.

We begin with a strong duality result for (P) and (D). A similar result in $L^{1}(\Omega)$ instead of ${\mathcal{M}}(\Omega)$ is [15, Thm. 3.2], but we state the theorem and its proof because we use a slightly different setting.

Proposition 3.1 (strong duality).

The predual problem to (P) is (D), and strong duality holds. Furthermore, if the supremum in (D) is finite, (P) admits a minimizer.

Proof 3.2.

First, by the Riesz–Markov representation theorem, ${\mathcal{M}}(\Omega)$ is the dual space of ${\mathcal{C}}(\Omega)$ for compact $\Omega$ . Furthermore, Slater’s condition is fulfilled with $\alpha,\beta=0$ so that strong duality holds and – assuming a finite supremum – the primal problem (P) possesses a minimizer. In addition, the integrand of the last integral in (D) is normal so that it can be conjugated pointwise [31]. Carrying out the conjugation, we obtain

[TABLE]

which is (P).
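As a worked check of the pointwise conjugation (a short computation of our own, not reproduced from [31]), write $s:=\alpha(x_{1})+\beta(x_{2})-c(x_{1},x_{2})$ for the argument of the last integrand in (D). The convex conjugate of $s\mapsto\gamma e^{s/\gamma}$ is

$$\sup_{s\in\mathbb{R}}\bigl(s\pi-\gamma e^{s/\gamma}\bigr)=\begin{cases}\gamma\pi(\log\pi-1), & \pi>0,\\ 0, & \pi=0,\\ +\infty, & \pi<0.\end{cases}$$

For $\pi>0$, the supremum is attained at $s=\gamma\log\pi$, where $s\pi-\gamma e^{s/\gamma}=\gamma\pi\log\pi-\gamma\pi$. For $\pi=0$, it is approached as $s\to-\infty$. For $\pi<0$, the objective grows without bound as $s\to-\infty$. This recovers both the entropy term (with the convention $0\log 0=0$) and the nonnegativity of $\pi$.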

Remark 3.3.

Note that Proposition 3.1 does not claim that the supremum is attained, i.e., that the predual problem (D) admits a solution. The proposition should also be compared to [15, Thm. 3.2], which similarly characterizes solutions under the condition that the dual problem attains a maximizer.

In addition, solutions to (D) cannot be unique, since we can add a constant to $\alpha$ and subtract the same constant from $\beta$ without changing the functional value (recall that $\mu$ and $\nu$ have the same total mass). On the other hand, up to such a constant, the functional in (D) is strictly concave, and therefore any solution is unique up to this constant.
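The invariance can be checked numerically on a discretization. The sketch below rests on several assumptions that are ours, not the paper's: a hypothetical midpoint-rule discretization of (D) on $[0,1]$ with cell size `h`, a quadratic cost, and densities normalized to unit mass (the invariance relies on $\int\mathrm{d}\mu=\int\mathrm{d}\nu$). The helper name `dual_objective` is illustrative.

```python
import math

def dual_objective(alpha, beta, mu, nu, cost, gamma, h):
    """Midpoint-rule approximation of the objective in (D) on a uniform grid with cell size h."""
    linear = h * sum(a * m for a, m in zip(alpha, mu)) + h * sum(b * v for b, v in zip(beta, nu))
    penalty = h * h * sum(
        math.exp((alpha[i] + beta[j] - cost[i][j]) / gamma)
        for i in range(len(alpha))
        for j in range(len(beta))
    )
    return linear - gamma * penalty

n = 20
h = 1.0 / n
x = [(i + 0.5) * h for i in range(n)]            # cell midpoints in [0, 1]
mu = [1.0] * n                                  # uniform density with unit mass
nu_raw = [1.0 + xi for xi in x]
nu = [v / (h * sum(nu_raw)) for v in nu_raw]    # normalized to unit mass
cost = [[(xi - yj) ** 2 for yj in x] for xi in x]
gamma = 0.1

alpha = [0.1 * math.sin(3.0 * xi) for xi in x]  # arbitrary continuous potentials
beta = [0.05 * xi ** 2 for xi in x]
K = 0.7
d_orig = dual_objective(alpha, beta, mu, nu, cost, gamma, h)
# Opposite shifts of alpha and beta: the value is unchanged up to rounding.
d_shift = dual_objective([a + K for a in alpha], [b - K for b in beta], mu, nu, cost, gamma, h)
# Shifting alpha alone changes both the linear term and the exponential term.
d_one_sided = dual_objective([a + K for a in alpha], beta, mu, nu, cost, gamma, h)
print(abs(d_orig - d_shift))
```

The one-sided shift is included as a contrast: it changes the value, which is why the invariance needs the opposite shift of $\beta$.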

We can use this duality argument in combination with the results of Section 2 to address the question of existence of a solution to (P). (Naturally, existence under the stated condition can also be shown using Tonelli’s direct method; here we give a proof based on the already shown convex duality for the sake of conciseness.)

Theorem 3.4.

Problem (P) admits a minimizer $\bar{\pi}$ if and only if $\mu\in L\log L(\Omega_{1})$ and $\nu\in L\log L(\Omega_{2})$ . In this case, the minimizer is unique and lies in $L\log L(\Omega_{1}\times\Omega_{2})$ .

Proof 3.5.

By Proposition 2.1, the energy is finite if and only if $\bar{\pi}\in L\log L(\Omega_{1}\times\Omega_{2})$. By Lemma 2.13, in turn, this is the case only if $\mu={{(P_{1})}_{\#}\bar{\pi}}\in L\log L(\Omega_{1})$, and similarly for $\nu$. This shows that the conditions are necessary for a finite energy. For sufficiency, we first note that for $\mu\in L\log L(\Omega_{1})$ and $\nu\in L\log L(\Omega_{2})$, the tensor product $\pi=\mu\otimes\nu$ is a feasible candidate with finite energy by Lemma 2.11. Thus, the infimum in (P) is finite, and weak duality (which always holds due to the properties of supremum and infimum) shows that the supremum in (D) is finite as well. Existence of a solution for (P) now follows from Proposition 3.1.

Uniqueness and regularity of the minimizer are then a direct consequence of the strict convexity of the entropy and Proposition 2.1.
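To see concretely why $\mu\otimes\nu$ has finite energy, one can expand the entropy of the product. This short computation is our own and is consistent with, but not copied from, Lemma 2.11. It assumes $\int_{\Omega_{1}}\mu\,\mathrm{d}x_{1}=\int_{\Omega_{2}}\nu\,\mathrm{d}x_{2}=1$ and uses $\log(\mu(x_{1})\nu(x_{2}))=\log\mu(x_{1})+\log\nu(x_{2})$ wherever both factors are positive:

$$\int_{\Omega_{1}\times\Omega_{2}}\mu(x_{1})\nu(x_{2})\log\bigl(\mu(x_{1})\nu(x_{2})\bigr)\,\mathrm{d}(x_{1},x_{2})=\int_{\Omega_{1}}\mu\log\mu\,\mathrm{d}x_{1}+\int_{\Omega_{2}}\nu\log\nu\,\mathrm{d}x_{2}.$$

Fubini's theorem applies here because $\mu\log\mu$ and $\nu\log\nu$ are integrable on the bounded sets $\Omega_{1}$ and $\Omega_{2}$. Both terms on the right are finite for $\mu\in L\log L(\Omega_{1})$ and $\nu\in L\log L(\Omega_{2})$, and the transport cost is bounded by $\max_{\Omega_{1}\times\Omega_{2}}|c|$.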

In case a minimizer exists, we can characterize its support. Here and throughout the rest of the paper, we use the usual shorthand $\{f>\lambda\}$ for the set $\{x\in\Omega:f(x)>\lambda\}$ . We also recall from Remark 2.4 that ${\mathbb{1}}_{A}$ refers to the characteristic function of the set $A$ . The following result can also be found in [9, Thm. 2.7], but the proof there needs a constraint qualification for the primal problem which we do not need in this formulation. We present a full proof for the sake of completeness.

Proposition 3.6.

A minimizer $\bar{\pi}\in L\log L(\Omega_{1}\times\Omega_{2})$ of (P) satisfies $\operatorname{supp}\bar{\pi}=\operatorname{supp}\mu\times\operatorname{supp}\nu$ .

Proof 3.7.

The fact that $\operatorname{supp}\bar{\pi}\subset\operatorname{supp}\mu\times\operatorname{supp}\nu$ follows from the marginal constraints and the nonnegativity of $\bar{\pi}$. It remains to show that $\operatorname{supp}\bar{\pi}\supset\operatorname{supp}\mu\times\operatorname{supp}\nu$. For a contradiction, assume there is some $\hat{x}\in(\operatorname{supp}\mu\times\operatorname{supp}\nu)\setminus\operatorname{supp}\bar{\pi}$. Then there exists a radius $r>0$ such that $\bar{\pi}=0$ on each ball $B_{s}(\hat{x})$ with $s<r$, but $\mu(P_{1}(B_{s}(\hat{x})))>0$ and $\nu(P_{2}(B_{s}(\hat{x})))>0$. In particular, there exist $\omega_{1}\subset P_{1}(B_{r/2}(\hat{x}))$ and $\omega_{2}\subset P_{2}(B_{r/2}(\hat{x}))$ such that $\mu(\omega_{1})>0$, $\nu(\omega_{2})>0$, and ${\mathcal{L}}(\omega_{i})>0$ for $i=1,2$, but $\bar{\pi}(\omega_{1}\times\omega_{2})=0$. We may choose $\omega_{1},\omega_{2}$ small enough and $\varepsilon>0$ small enough such that there are $\tilde{\omega}_{1}\subset\Omega_{1}\setminus\omega_{1}$ and $\tilde{\omega}_{2}\subset\Omega_{2}\setminus\omega_{2}$ with nonzero Lebesgue measure and with $\bar{\pi}>\varepsilon$ on $(\tilde{\omega}_{1}\times\omega_{2})\cup(\omega_{1}\times\tilde{\omega}_{2})$.

Let now $\kappa_{i}:=\frac{{\mathcal{L}}(\omega_{i})}{{\mathcal{L}}(\tilde{\omega}_{i})}$ for $i=1,2$ , $\kappa:={\mathcal{L}}(\omega_{1})\cdot{\mathcal{L}}(\omega_{2})$ , and

[TABLE]

for $0<t<\varepsilon/\max\{\kappa_{1},\kappa_{2}\}$. Then $\tilde{\pi}$ is feasible. We will now argue that for small enough $t$ we have

[TABLE]

where $M:=(\omega_{1}\cup\tilde{\omega}_{1})\times(\omega_{2}\cup\tilde{\omega}_{2})$ . Note that $\bar{\pi}=\tilde{\pi}$ on $\Omega_{1}\times\Omega_{2}\setminus M$ .

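First, consider $\int_{M}c\tilde{\pi}\,{\mathrm{d}}(x_{1},x_{2})$. Since $c$ is continuous on the compact set $\Omega_{1}\times\Omega_{2}$ and hence bounded, and since $\tilde{\pi}-\bar{\pi}$ vanishes outside $M$ and depends linearly on $t$, the difference $\int_{M}c\tilde{\pi}\,{\mathrm{d}}(x_{1},x_{2})-\int_{M}c\bar{\pi}\,{\mathrm{d}}(x_{1},x_{2})$ is bounded by a multiple of $t$, and hence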

[TABLE]

for some constant $C_{0}$ . Now, consider the entropy of $\tilde{\pi}$ . Since $\bar{\pi}=0$ on $\omega_{1}\times\omega_{2}$ , we have

[TABLE]

Using the inequality $f(y)\geq f(x)+f^{\prime}(x)(y-x)$ for convex and differentiable $f$ , we can estimate

[TABLE]

and similarly for $\int_{\tilde{\omega}_{1}\times\omega_{2}}\tilde{\pi}\log\tilde{\pi}\,{\mathrm{d}}(x_{1},x_{2})$ . Again using the above inequality we have

[TABLE]

We obtain

[TABLE]

The right-hand side is of the form $g(t)=\kappa t\log t+h(t)$ with $h$ differentiable at $t=0$. We can therefore estimate

[TABLE]

for some $C_{1}>0$ big enough and small $t$ .

Combining the estimates for cost and entropy yields

[TABLE]

for $t$ small enough. Since $\kappa t\log t$ dominates every term linear in $t$ as $t\to 0$, the last term is negative for $t$ small enough, which shows that $\bar{\pi}$ is not optimal, contradicting the assumption.
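The final step uses only the elementary fact that $t\log t$ dominates every linear term as $t\to 0$. Assume the combined estimate has the form $\gamma\kappa t\log t$ plus a term bounded by $Ct$ for some constant $C\geq 0$, with $C_{0}$ and $C_{1}$ absorbed into $C$. Under that assumption, the admissible range of $t$ can be made explicit:

$$\gamma\kappa t\log t+Ct=t\,(\gamma\kappa\log t+C)<0\qquad\text{for }0<t<e^{-C/(\gamma\kappa)},$$

since $\gamma\kappa\log t+C<0$ exactly when $t<e^{-C/(\gamma\kappa)}$.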

Theorem 3.4 shows that the natural setting for the entropically regularized problem (P) is in fact $L\log L(\Omega)$ rather than ${\mathcal{M}}(\Omega)$. In the next section, we will prove existence of solutions for a suitably modified dual problem of (P) and justify a pointwise almost everywhere optimality system that can be used as a basis for deriving the Sinkhorn algorithm.

4 Duality in $L\log L$ and $L_{\exp}$

In this section, we consider (P) in the space $L\log L(\Omega_{1}\times\Omega_{2})$ . To derive a dual problem in $L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$ , we shall perform the variable substitution

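$$\Phi(s):=\begin{cases}|s|, & |s|\leq 1,\\ e^{|s|-1}, & |s|>1,\end{cases}\qquad\Psi(s):=\begin{cases}\log s, & 0<s\leq 1,\\ s-1, & s>1,\\ -\infty, & s\leq 0;\end{cases}$$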

see Fig. 1.

Note that $\Phi$ is convex and $\Psi$ concave and that the function $\Phi$ coincides with the Young function $\Phi_{\exp}$ from (2), which is associated with $L_{\exp}$ .

We now substitute $e^{\alpha/\gamma}=\Phi(u_{1})$ and $e^{\beta/\gamma}=\Phi(u_{2})$ , i.e.,

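$$u_{1}=\Phi^{-1}\bigl(e^{\alpha/\gamma}\bigr)=\Psi^{-1}(\alpha/\gamma),\qquad u_{2}=\Phi^{-1}\bigl(e^{\beta/\gamma}\bigr)=\Psi^{-1}(\beta/\gamma),$$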

which conversely implies that $\alpha=\gamma\log(\Phi(u_{1}))=\gamma\Psi(u_{1})$ and $\beta=\gamma\log(\Phi(u_{2}))=\gamma\Psi(u_{2})$ . Using this substitution, we obtain that

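$$\int_{\Omega_{1}}\alpha\,\mathrm{d}\mu+\int_{\Omega_{2}}\beta\,\mathrm{d}\nu-\gamma\int_{\Omega_{1}\times\Omega_{2}}e^{\frac{\alpha(x_{1})+\beta(x_{2})-c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}(x_{1},x_{2})=\gamma\int_{\Omega_{1}}\Psi(u_{1})\,\mathrm{d}\mu+\gamma\int_{\Omega_{2}}\Psi(u_{2})\,\mathrm{d}\nu-\gamma\int_{\Omega_{1}\times\Omega_{2}}\Phi(u_{1}(x_{1}))\Phi(u_{2}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}(x_{1},x_{2}).$$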

Instead of the predual problem (D), we thus consider the reformulated problem

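$$\sup_{u_{1}\in L_{\exp}(\Omega_{1}),\ u_{2}\in L_{\exp}(\Omega_{2})}\gamma\int_{\Omega_{1}}\Psi(u_{1})\,\mathrm{d}\mu+\gamma\int_{\Omega_{2}}\Psi(u_{2})\,\mathrm{d}\nu-\gamma\int_{\Omega_{1}\times\Omega_{2}}\Phi(u_{1}(x_{1}))\Phi(u_{2}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}(x_{1},x_{2}).\qquad(3)$$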

This substitution renders the problem nonconvex but, as we will see, allows us to prove existence of solutions.
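The substitution can be exercised numerically. The sketch below assumes a specific piecewise form of $\Phi$ and $\Psi$:

- $\Phi(s)=|s|$ for $|s|\leq 1$ and $e^{|s|-1}$ otherwise. For $s\geq 0$, this is the form forced by $\Phi^{\prime}(s)=\max\{\Phi(s),1\}$ (used in the proof of Theorem 4.14) together with $\Phi(0)=0$.
- $\Psi=\log\Phi$ on $(0,\infty)$ and $\Psi=-\infty$ on $(-\infty,0]$.

These forms are our reconstruction, and the function names are hypothetical. The loop checks that $e^{\alpha/\gamma}=\Phi(u)$ and $\alpha=\gamma\Psi(u)$ for $u=\Psi^{-1}(\alpha/\gamma)$.

```python
import math

# Assumed forms (hypothetical, reconstructed from Phi'(s) = max{Phi(s), 1} and Phi(0) = 0):
# Phi(s) = |s| for |s| <= 1 and exp(|s| - 1) otherwise; Psi = log(Phi) on (0, inf), -inf elsewhere.
def phi(s: float) -> float:
    a = abs(s)
    return a if a <= 1.0 else math.exp(a - 1.0)

def psi(s: float) -> float:
    if s <= 0.0:
        return -math.inf
    return math.log(s) if s <= 1.0 else s - 1.0

def psi_inv(t: float) -> float:
    """Inverse of Psi restricted to (0, inf); it maps R onto (0, inf)."""
    return math.exp(t) if t <= 0.0 else t + 1.0

def to_u(potential: float, gamma: float) -> float:
    """Substitution u = Psi^{-1}(potential / gamma), chosen so that exp(potential / gamma) = Phi(u)."""
    return psi_inv(potential / gamma)

gamma = 0.5
for alpha in (-3.0, -0.2, 0.0, 0.4, 2.5):
    u = to_u(alpha, gamma)
    # Forward and backward substitution agree up to rounding.
    assert math.isclose(math.exp(alpha / gamma), phi(u), rel_tol=1e-12)
    assert math.isclose(gamma * psi(u), alpha, rel_tol=1e-12, abs_tol=1e-12)
```

Since $u=\Psi^{-1}(\alpha/\gamma)$ is always positive, every continuous potential is mapped to an admissible $u$. The converse fails where $u\leq 0$, which is exactly where $\Psi$ takes the value $-\infty$.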

In the following, we assume that $\mu\in L\log L(\Omega_{1})$ and $\nu\in L\log L(\Omega_{2})$, as required for existence for the primal problem, and that $c\in{\mathcal{C}}(\Omega_{1}\times\Omega_{2})$. We also recall that the Luxemburg norms $\|{\cdot}\|_{\Phi_{\exp}}$ and $\|{\cdot}\|_{\Phi_{\log}}$ are equivalent norms on $L_{\exp}(\Omega)$ and $L\log L(\Omega)$, respectively. Our aim is to apply Tonelli's direct method to (3) by showing that the functional

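$$B(u_{1},u_{2}):=-\gamma\int_{\Omega_{1}}\Psi(u_{1})\,\mathrm{d}\mu-\gamma\int_{\Omega_{2}}\Psi(u_{2})\,\mathrm{d}\nu+\gamma\int_{\Omega_{1}\times\Omega_{2}}\Phi(u_{1}(x_{1}))\Phi(u_{2}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}(x_{1},x_{2})\qquad(4)$$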

is radially unbounded and lower semi-continuous in the right topology. We first need the following lemma.

Lemma 4.1.

If $\|{v{\mathbb{1}}_{\{v>0\}}}\|_{\Phi_{\exp}}>\max(1,{\mathcal{L}}(\Omega))$ , then

[TABLE]

Proof 4.2.

Set $\gamma_{\varepsilon}=\|{v{\mathbb{1}}_{\{v>0\}}}\|_{\Phi_{\exp}}-\varepsilon$ for some $\varepsilon>0$ such that still $\gamma_{\varepsilon}>\max(1,{\mathcal{L}}(\Omega))$. Then it holds that $\int_{\Omega}\Phi\big(\tfrac{v{\mathbb{1}}_{\{v>0\}}}{\gamma_{\varepsilon}}\big)\,{\mathrm{d}}x>\|{v{\mathbb{1}}_{\{v>0\}}}\|_{\Phi_{\exp}}>\max(1,{\mathcal{L}}(\Omega))$.

By Jensen’s inequality we have

[TABLE]

Taking logarithms, we deduce that $\gamma_{\varepsilon}<\log\left(\frac{1}{{\mathcal{L}}(\Omega)}\int_{\Omega}\Phi((v+1){\mathbb{1}}_{\{v>0\}})\,{\mathrm{d}}x\right)/\log\frac{e}{\min(1,{\mathcal{L}}(\Omega))}$ , and letting $\varepsilon\to 0$ yields the claim.

We next capture the invariance inherited from (D) as described in Remark 3.3.

Lemma 4.3.

Let $u_{i}\in L_{\exp}(\Omega_{i})$, $i=1,2$, with $B(u_{1},u_{2})<\infty$. If for an arbitrary $K\in{\mathbb{R}}$ we set $\tilde{u}_{1}=\Psi^{-1}(\Psi(u_{1})-K)$ and $\tilde{u}_{2}=\Psi^{-1}(\Psi(u_{2})+K)$, then $B(\tilde{u}_{1},\tilde{u}_{2})=B(u_{1},u_{2})$. In particular, by choosing $K$ appropriately, we can always achieve $\int_{\Omega_{1}}\Phi(\tilde{u}_{1})\,{\mathrm{d}}x_{1}=1$.

Proof 4.4.

Note that $u_{1}>0$ $\mu$-a.e. and $u_{2}>0$ $\nu$-a.e. as $B(u_{1},u_{2})<\infty$. By construction, the same holds for $\tilde{u}_{1}$ and $\tilde{u}_{2}$. The first statement is now a direct consequence of the invariance of the cost functional in (D) under the mapping $(\alpha,\beta)\mapsto(\alpha-\gamma K,\beta+\gamma K)$, since $\gamma\Psi(\tilde{u}_{1})=\gamma\Psi(u_{1})-\gamma K$ and $\gamma\Psi(\tilde{u}_{2})=\gamma\Psi(u_{2})+\gamma K$. For the second statement, first note that $\int_{\Omega_{1}}\Phi(\tilde{u}_{1})\,{\mathrm{d}}x_{1}$ is continuous in $K$. Moreover,

[TABLE]

so that the assertion follows by the intermediate value theorem.
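Because $\Psi=\log\Phi$ on $(0,\infty)$, the normalizing constant can in fact be written down explicitly. Wherever $u_{1}>0$, we have $\Phi(\tilde{u}_{1})=e^{\Psi(u_{1})-K}=e^{-K}\Phi(u_{1})$. Hence $K=\log\int_{\{u_{1}>0\}}\Phi(u_{1})\,\mathrm{d}x_{1}$ gives unit integral, assuming $\tilde{u}_{1}=0$ wherever $u_{1}\leq 0$.

The sketch below checks this numerically under several assumptions:

- the same hypothetical piecewise $\Phi$ and $\Psi$ as reconstructed from $\Phi^{\prime}(s)=\max\{\Phi(s),1\}$;
- a midpoint-rule grid on $[0,1]$;
- illustrative helper names that are not taken from the paper.

```python
import math

# Hypothetical Phi, Psi, Psi^{-1}: Phi(s) = |s| on [-1, 1], exp(|s| - 1) otherwise;
# Psi = log(Phi) on (0, inf) and -inf on (-inf, 0].
def phi(s):
    a = abs(s)
    return a if a <= 1.0 else math.exp(a - 1.0)

def psi(s):
    if s <= 0.0:
        return -math.inf
    return math.log(s) if s <= 1.0 else s - 1.0

def psi_inv(t):
    # math.exp(-math.inf) == 0.0, so points with Psi = -inf are mapped to 0.
    return math.exp(t) if t <= 0.0 else t + 1.0

def integral_phi(u, h):
    """Midpoint-rule approximation of the integral of Phi(u) on a grid with cell size h."""
    return h * sum(phi(v) for v in u)

def normalizing_shift(u, h):
    """K = log of int_{u > 0} Phi(u) dx, so that Psi^{-1}(Psi(u) - K) has unit Phi-integral."""
    positive_part = h * sum(phi(v) for v in u if v > 0.0)
    if positive_part == 0.0:
        raise ValueError("u must be positive on a set of positive measure")
    return math.log(positive_part)

n = 50
h = 1.0 / n
u1 = [2.0 * (i + 0.5) * h - 0.31 for i in range(n)]  # sample function, negative near 0
K = normalizing_shift(u1, h)
u1_tilde = [psi_inv(psi(v) - K) for v in u1]
print(K, integral_phi(u1_tilde, h))  # the second value is 1 up to rounding
```

The closed form agrees with the intermediate value argument in the proof. Under these assumptions, $K\mapsto\int\Phi(\tilde{u}_{1})$ is simply $e^{-K}$ times a positive constant.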

Remark 4.5.

While $\|\tilde{u}_{1}\|_{\Phi_{\exp}}=1$ implies $\int_{\Omega_{1}}\Phi(\tilde{u}_{1})\,{\mathrm{d}}x_{1}\leq 1$ and $\int_{\Omega_{1}}\Phi(\tilde{u}_{1})\,{\mathrm{d}}x_{1}=1$ implies $\|{\tilde{u}_{1}}\|_{\Phi_{\exp}}\leq 1$ , in general we cannot achieve both equalities simultaneously due to Remark 2.4.

Modulo this invariance we now obtain coercivity.

Lemma 4.6.

Let $u_{1}^{n}$, $n=1,2,\ldots$, be a sequence in $L_{\exp}(\Omega_{1})$ such that $\int_{\Omega_{1}}\Phi(u^{n}_{1})\,{\mathrm{d}}x_{1}=1$ for all $n$, and let $u_{2}^{n}$ be a sequence in $L_{\exp}(\Omega_{2})$. Then $\|{u_{2}^{n}}\|_{\Phi_{\exp}}\to\infty$ for $n\to\infty$ implies $B(u_{1}^{n},u_{2}^{n})\to\infty$ as $n\to\infty$.

Proof 4.7.

Without loss of generality, we may assume the $u_{2}^{n}$ to be nonnegative, since replacing $u_{2}^{n}$ with its absolute value does not increase $B(u_{1}^{n},u_{2}^{n})$ and does not change $\|{u_{2}^{n}}\|_{\Phi_{\exp}}$. Due to $\int_{\Omega_{1}}\Phi(u_{1}^{n})\,{\mathrm{d}}x_{1}=1$, we have $\|{u_{1}^{n}}\|_{\Phi_{\exp}}\leq 1$ and thus

[TABLE]

where $c_{\Phi}$ denotes the generic equivalence constant for the duality from Proposition 2.8. Analogously we obtain

[TABLE]

Hence for $C=\exp(-\max_{\Omega_{1}\times\Omega_{2}}c/\gamma)$ , we have

[TABLE]

Since $\|{u_{2}^{n}}\|_{\Phi_{\exp}}\to\infty$ and $u_{2}^{n}$ is nonnegative, we also have $\|{\max(u_{2}^{n}-1,0)}\|_{\Phi_{\exp}}\to\infty$ as $n\to\infty$ and therefore

[TABLE]

by Lemma 4.1. Now Lemma 2.6 implies that $\int_{\Omega_{2}}\Phi(u^{n}_{2})\,{\mathrm{d}}x\to\infty$ for $n\to\infty$ and therefore that

[TABLE]

which proves the claim.

Lemma 4.8.

$B$ is sequentially weakly-$*$ lower semi-continuous on $L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$.

Proof 4.9.

Let $(u_{1}^{n},u_{2}^{n})\rightharpoonup^{*}(u_{1},u_{2})$ in $L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$ . Then we have in particular $(u_{1}^{n},u_{2}^{n})\rightharpoonup(u_{1},u_{2})$ in $L^{p}(\Omega_{1})\times L^{p}(\Omega_{2})$ for any $1\leq p<\infty$ . Since $-\Psi$ is a lower semi-continuous and convex integrand, it thus follows, e.g., by [3, Thm. 13.1.1] that

[TABLE]

and hence that these functionals are weak- $*$ sequentially lower semicontinuous on $L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$ .

It remains to show weak-$*$ lower semi-continuity of $\int_{\Omega_{1}\times\Omega_{2}}\Phi(u_{1}(x_{1}))\Phi(u_{2}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,{\mathrm{d}}(x_{1},x_{2})$. For fixed $N\in\mathbb{N}$, decompose $\Omega_{1}$ and $\Omega_{2}$ into a finite number of measurable subsets $\Omega_{i}^{k}$ with $\operatorname{diam}(\Omega_{i}^{k})\leq\frac{1}{N}$. We further assume that the decompositions $(\Omega_{1}^{k})_{k}$ and $(\Omega_{2}^{k})_{k}$ for $N+1$ are obtained from the decompositions for $N$ by refinement. Defining $c_{kl}=\min_{(x_{1},x_{2})\in\Omega_{1}^{k}\times\Omega_{2}^{l}}e^{-c(x_{1},x_{2})/\gamma}$, we then have

[TABLE]

As above, it follows from the lower semi-continuity and convexity of $\Phi$ that $u\mapsto\int_{\Omega_{1}^{k}}\Phi(u)\,{\mathrm{d}}x_{1}$ and $v\mapsto\int_{\Omega_{2}^{l}}\Phi(v)\,{\mathrm{d}}x_{2}$ are sequentially weakly-$*$ lower semi-continuous on $L_{\exp}(\Omega_{1}^{k})$ and $L_{\exp}(\Omega_{2}^{l})$, respectively. Hence

[TABLE]

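by the monotone convergence theorem, since the refinement of the decompositions and the uniform continuity of $c$ on the compact set $\Omega_{1}\times\Omega_{2}$ imply that $\sum_{k,l}c_{kl}{\mathbb{1}}_{\Omega_{1}^{k}\times\Omega_{2}^{l}}\nearrow e^{-c/\gamma}$ pointwise as $N\to\infty$.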

Theorem 4.10 (dual existence).

Problem (3) possesses a maximizer $(\bar{u}_{1},\bar{u}_{2})\in L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$ .

Proof 4.11.

We show that $B$ possesses a minimizer. The energy $B$ is finite at, e.g., $u_{1}\equiv 1\equiv u_{2}$. We thus may consider a minimizing sequence $(u_{1}^{n},u_{2}^{n})$ in $L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$, where by Lemma 4.3 we may assume $\int_{\Omega_{1}}\Phi_{\exp}(u^{n}_{1})\,{\mathrm{d}}x_{1}=1$ without loss of generality; in particular, $\|{u_{1}^{n}}\|_{\Phi_{\exp}}\leq 1$. Lemma 4.6 now implies boundedness of $\|{u_{2}^{n}}\|_{\Phi_{\exp}}$, so that by the Banach–Alaoglu theorem we may extract a weakly-$*$ convergent subsequence from $(u_{1}^{n},u_{2}^{n})$ (recalling that the preduals $L\log L(\Omega_{1})$ and $L\log L(\Omega_{2})$ are separable by Lemma 2.9). The claim now follows from the lower semi-continuity of $B$ along that subsequence by Lemma 4.8.

From dual solutions $\bar{u}_{1}$ and $\bar{u}_{2}$ , we obtain by backsubstitution $\bar{\alpha}:=\gamma\Psi(\bar{u}_{1})$ and $\bar{\beta}:=\gamma\Psi(\bar{u}_{2})$ as a candidate for a solution of the original predual problem (D). However, these are in general not admissible since $\bar{u}_{1}\in L_{\exp}(\Omega_{1})$ and $\bar{u}_{2}\in L_{\exp}(\Omega_{2})$ does not imply the needed regularity of $\bar{\alpha}\in{\mathcal{C}}(\Omega_{1})$ and $\bar{\beta}\in{\mathcal{C}}(\Omega_{2})$ : The positive parts of $\bar{\alpha}$ and $\bar{\beta}$ (which equal the positive parts of $\bar{u}_{1}+1$ and $\bar{u}_{2}+1$ , respectively) are in $L_{\exp}$ , but the negative parts need not even be functions as they could be $-\infty$ everywhere.

Nevertheless, from (3) one sees that $\bar{u}_{1}>0$ $\mu$-almost everywhere and $\bar{u}_{2}>0$ $\nu$-almost everywhere (otherwise the objective in (3) would be $-\infty$), and hence $\bar{\alpha}$ and $\bar{\beta}$ are at least finite $\mu$- and $\nu$-almost everywhere, respectively. We will derive more information on $\bar{\alpha}$ and $\bar{\beta}$ from the necessary optimality conditions.

First, we have again a strong duality result relating (3) to (P).

Proposition 4.12 (strong duality).

Let $\mu\in L\log L(\Omega_{1})$ , $\nu\in L\log L(\Omega_{2})$ , and $c\in{\mathcal{C}}(\Omega_{1}\times\Omega_{2})$ . Then, both (P) and (3) admit a solution, and their optimal values coincide.

Proof 4.13.

Existence for both problems follows from Theorems 3.4 and 4.10. To show that their optimal values coincide, by Proposition 3.1 it suffices to show that the value of (D) equals that of (3). First, let $\alpha\in{\mathcal{C}}(\Omega_{1})$ and $\beta\in{\mathcal{C}}(\Omega_{2})$ be arbitrary and set $u_{1}:=\Psi^{-1}(\alpha/\gamma)$ and $u_{2}:=\Psi^{-1}(\beta/\gamma)$. By substitution, we see that

[TABLE]

and taking the supremum over all $\alpha,\beta$ yields that the value of (D) is at most that of (3).

It thus remains to show that the value of (3) can be approximated arbitrarily well by values of (D). Let $\bar{u}_{1},\bar{u}_{2}$ be optimal. By the monotone convergence theorem, $B(\bar{u}_{1},\bar{u}_{2})=\lim_{n\to\infty}B(\max\{\bar{u}_{1},\frac{1}{n}\},\max\{\bar{u}_{2},\frac{1}{n}\})$ and also

[TABLE]

Hence $B(\bar{u}_{1},\bar{u}_{2})$ can be arbitrarily well approximated by $B(u_{1},u_{2})$ with $\Psi(u_{1})\in L^{\infty}(\Omega_{1})$ and $\Psi(u_{2})\in L^{\infty}(\Omega_{2})$. Now let $\alpha_{n}\in{\mathcal{C}}(\Omega_{1})$ and $\beta_{n}\in{\mathcal{C}}(\Omega_{2})$ with $\alpha_{n}\to\gamma\Psi(u_{1})$ in $L^{2}(\Omega_{1})$ and $\beta_{n}\to\gamma\Psi(u_{2})$ in $L^{2}(\Omega_{2})$. Here we may assume $\alpha_{n},\beta_{n}$ to be uniformly bounded, so that (upon restricting to a subsequence) we additionally have $\alpha_{n}\rightharpoonup^{*}\gamma\Psi(u_{1})$ in $L^{\infty}(\Omega_{1})$ and $\beta_{n}\rightharpoonup^{*}\gamma\Psi(u_{2})$ in $L^{\infty}(\Omega_{2})$. Now

[TABLE]

due to the weak- $*$ convergence. Finally, as $\alpha_{n},\beta_{n}$ converge in $L^{2}$ , $e^{\frac{\alpha_{n}(x_{1})+\beta_{n}(x_{2})}{\gamma}}$ converges a.e. (after passing to a subsequence). Using uniform boundedness of $\alpha_{n},\beta_{n}$ , the dominated convergence theorem yields

[TABLE]

Having established primal and dual existence, we can now show how the solution of the dual problem can be used to solve the primal problem.

Theorem 4.14 (optimality conditions).

Let $\mu\in L\log L(\Omega_{1})$ , $\nu\in L\log L(\Omega_{2})$ , and $c\in{\mathcal{C}}(\Omega_{1}\times\Omega_{2})$ . Then solutions $(\bar{u}_{1},\bar{u}_{2})\in L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$ of (3) satisfy

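$$\Phi(\bar{u}_{1}(x_{1}))\int_{\Omega_{2}}\Phi(\bar{u}_{2}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}x_{2}=\mu(x_{1}),\qquad\text{(5a)}$$
$$\Phi(\bar{u}_{2}(x_{2}))\int_{\Omega_{1}}\Phi(\bar{u}_{1}(x_{1}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}x_{1}=\nu(x_{2})\qquad\text{(5b)}$$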

for $\mu$ -almost every $x_{1}\in\Omega_{1}$ and $\nu$ -almost every $x_{2}\in\Omega_{2}$ . Furthermore, $\bar{\pi}$ defined by

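$$\bar{\pi}(x_{1},x_{2}):=\Phi(\bar{u}_{1}(x_{1}))\Phi(\bar{u}_{2}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\qquad\text{(6)}$$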

is the solution of (P).

Proof 4.15.

Let $\bar{u}_{1},\bar{u}_{2}$ be solutions of the dual problem. We start by deriving the necessary conditions (5). First, note that $\{\bar{u}_{1}>0\}\supset\{\mu>0\}$ and $\{\bar{u}_{2}>0\}\supset\{\nu>0\}$ (up to a Lebesgue-negligible set), since otherwise $\int_{\Omega_{1}}\Psi(\bar{u}_{1})\mu\,{\mathrm{d}}x_{1}+\int_{\Omega_{2}}\Psi(\bar{u}_{2})\nu\,{\mathrm{d}}x_{2}=-\infty$. Now let $\varepsilon>0$ be arbitrary and consider any $\varphi\in L_{\exp}(\Omega_{1})\cap L^{\infty}(\Omega_{1})$ with $\varphi=0$ on $\{\bar{u}_{1}<\varepsilon\}$. We next argue that the dual functional $B$ given in (4) is directionally differentiable in $(\bar{u}_{1},\bar{u}_{2})$ with respect to its first argument in direction $\varphi$. Since both $s\mapsto\Phi(s)$ and $s\mapsto\Psi(s)$ are differentiable at every $s>0$, so are the integrands pointwise almost everywhere on $\{\bar{u}_{1}\geq\varepsilon\}\times\Omega_{2}$. It therefore suffices to show that the pointwise directional derivatives are integrable in order to differentiate under the integral. For the first term in $B$, we have almost everywhere on $\{\bar{u}_{1}\geq\varepsilon\}\times\Omega_{2}$ that

[TABLE]

which is integrable on $\{\bar{u}_{1}\geq\varepsilon\}\times\Omega_{2}$ since $\bar{u}_{1}$ and $\bar{u}_{2}$ are feasible for (3). An integrable lower bound is obtained similarly using $\varphi\in L^{\infty}(\Omega_{1})$ .

For the second term in $B$, the chain rule and the differentiability of $\Phi$ yield almost everywhere on $\{\bar{u}_{1}\geq\varepsilon\}\times\Omega_{2}$ that

[TABLE]

where the right-hand side is integrable with respect to $\mathrm{d}\mu$ .

From the dominated convergence theorem, it thus follows that the partial directional derivative of $B$ in the first direction is given by

[TABLE]

where we have again used the integrability of the integrand to apply Fubini's theorem in order to iterate the double integrals, as well as $\Phi^{\prime}(s)=\max\{\Phi(s),1\}$ and $\Psi^{\prime}(s)=\Phi(s)^{-1}\Phi^{\prime}(s)$ for $s>0$.

By the specific choice of $\varphi$ , we have $\bar{u}_{1}\pm t\varphi\geq 0$ for all $t>0$ sufficiently small. The optimality of $(\bar{u}_{1},\bar{u}_{2})$ thus implies that

[TABLE]

and since $\varphi$ was arbitrary on $\{\bar{u}_{1}\geq\varepsilon\}$ and $\max\{\Phi(\bar{u}_{1}),1\}>0$ , we must therefore have that

[TABLE]

Furthermore, since $\varepsilon>0$ was arbitrary and $\mu(x_{1})=0$ whenever $\bar{u}_{1}(x_{1})=0$ , this equation even holds for $\mu$ -almost all $x_{1}\in\Omega_{1}$ , which yields (5a). Equation (5b) is derived analogously.

Now we show that $\bar{\pi}$ defined by (6) is a solution of the primal problem. First note that by construction, $\bar{\pi}$ is feasible (i.e., is non-negative and has the correct marginals). Since strong duality holds by Proposition 4.12, it thus suffices to show that the primal objective functional evaluated in $\bar{\pi}$ is equal to the dual optimal objective value (3). To that end, we insert (6) into the objective functional in (P) and obtain (using again the convention that $0\log 0=0$ )

[TABLE]

Since $\bar{u}_{i}\geq 0$ and hence $\Phi(\bar{u}_{i})\geq 0$, we have that $\Psi(\bar{u}_{i})\Phi(\bar{u}_{i})=\log(\Phi(\bar{u}_{i}))\Phi(\bar{u}_{i})\geq-\frac{1}{e}$. Furthermore, ${\mathcal{L}}(\Omega_{i})<\infty$ by compactness, so we can shift the integrand to allow applying Tonelli's theorem in the second and third integral. Inserting (5), the right-hand side now coincides with the objective of (3) evaluated at $(\bar{u}_{1},\bar{u}_{2})$, i.e., with $-B(\bar{u}_{1},\bar{u}_{2})$. Hence strong duality holds for $(\bar{u}_{1},\bar{u}_{2})$ and $\bar{\pi}$, and thus the latter is a solution to (P).
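The computation in this proof can be mirrored in a discrete setting. The recipe is:

1. Take arbitrary positive scalings $a=\Phi(u_{1})$ and $b=\Phi(u_{2})$.
2. Build $\pi$ from the discrete analogue of (6).
3. Define $\mu$ and $\nu$ as the marginals of $\pi$, so that (5a) and (5b) hold by construction.

Using $\gamma\Psi(u)=\gamma\log\Phi(u)$, the primal objective of (P) and the objective of (3) then agree up to rounding. This is a sketch under a hypothetical midpoint-rule discretization, and the helper names are ours. The identity itself does not use that $\mu$ and $\nu$ have unit mass.

```python
import math

def plan_from_scalings(a, b, cost, gamma):
    """Discrete analogue of (6): pi_ij = a_i * b_j * exp(-c_ij / gamma) with a = Phi(u1), b = Phi(u2)."""
    return [[ai * bj * math.exp(-cij / gamma) for bj, cij in zip(b, row)] for ai, row in zip(a, cost)]

def primal_value(pi, cost, gamma, h1, h2):
    """Midpoint-rule approximation of the objective in (P), with the convention 0 log 0 = 0."""
    total = 0.0
    for row_pi, row_c in zip(pi, cost):
        for p, c in zip(row_pi, row_c):
            ent = p * (math.log(p) - 1.0) if p > 0.0 else 0.0
            total += (c * p + gamma * ent) * h1 * h2
    return total

def dual_value(a, b, mu, nu, pi, gamma, h1, h2):
    """Objective of (3), using gamma * Psi(u) = gamma * log(Phi(u)) = gamma * log(a), resp. log(b)."""
    linear = gamma * h1 * sum(math.log(ai) * m for ai, m in zip(a, mu))
    linear += gamma * h2 * sum(math.log(bj) * v for bj, v in zip(b, nu))
    return linear - gamma * h1 * h2 * sum(sum(row) for row in pi)

n1, n2 = 15, 12
h1, h2 = 1.0 / n1, 1.0 / n2
x1 = [(i + 0.5) * h1 for i in range(n1)]
x2 = [(j + 0.5) * h2 for j in range(n2)]
cost = [[abs(s - t) for t in x2] for s in x1]
gamma = 0.2
a = [1.0 + 0.5 * math.cos(4.0 * s) for s in x1]   # arbitrary positive scalings
b = [0.3 + t for t in x2]
pi = plan_from_scalings(a, b, cost, gamma)
mu = [h2 * sum(row) for row in pi]                               # first marginal (density)
nu = [h1 * sum(pi[i][j] for i in range(n1)) for j in range(n2)]  # second marginal (density)
p_val = primal_value(pi, cost, gamma, h1, h2)
d_val = dual_value(a, b, mu, nu, pi, gamma, h1, h2)
print(p_val, d_val)
```

The plan is strictly positive wherever $a$ and $b$ are, in line with Proposition 3.6.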

Remark 4.16.

The optimality system (5) can be used to derive the Sinkhorn algorithm. First, note that one only needs to find $\bar{u}_{1}$ and $\bar{u}_{2}$ that solve (5a) and (5b); an optimal plan $\bar{\pi}$ is then obtained from (6). The Sinkhorn method now solves the nonlinear system (5) by alternately solving the equations: Given $u_{2}^{n}$, compute $u_{1}^{n+1}$ by solving (5a), i.e., setting

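$$u_{1}^{n+1}(x_{1})=\Phi^{-1}\biggl(\frac{\mu(x_{1})}{\int_{\Omega_{2}}\Phi(u_{2}^{n}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}x_{2}}\biggr),$$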

and then solve (5b) for $u_{2}^{n+1}$, using $u_{1}^{n+1}$, to obtain

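$$u_{2}^{n+1}(x_{2})=\Phi^{-1}\biggl(\frac{\nu(x_{2})}{\int_{\Omega_{1}}\Phi(u_{1}^{n+1}(x_{1}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,\mathrm{d}x_{1}}\biggr).$$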

Formulating this iteration directly in $\Phi(u_{1})$ and $\Phi(u_{2})$ , we obtain the original Sinkhorn algorithm, cf. [20, Sec. 5.3.1].
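A minimal pure-Python sketch of this iteration follows, written in the variables $a=\Phi(u_{1})$ and $b=\Phi(u_{2})$. It rests on the following assumptions:

- a hypothetical midpoint-rule discretization of $\Omega_{1}=\Omega_{2}=[0,1]$;
- a quadratic cost and a fixed number of sweeps;
- the function name `sinkhorn` and all grid parameters are illustrative choices, not taken from the paper or from [20].

```python
import math

def sinkhorn(mu, nu, cost, gamma, h1, h2, iters=300):
    """Alternately solve discretized (5a) and (5b) in the variables a = Phi(u1), b = Phi(u2).

    mu, nu: density values at cell midpoints; h1, h2: cell sizes (hypothetical midpoint rule).
    Returns the scalings and the plan pi_ij = a_i * exp(-c_ij / gamma) * b_j, cf. (6).
    """
    kernel = [[math.exp(-c / gamma) for c in row] for row in cost]
    n1, n2 = len(mu), len(nu)
    a, b = [1.0] * n1, [1.0] * n2
    for _ in range(iters):
        # (5a): a(x1) * int b(x2) exp(-c/gamma) dx2 = mu(x1)
        a = [mu[i] / (h2 * sum(kernel[i][j] * b[j] for j in range(n2))) for i in range(n1)]
        # (5b): b(x2) * int a(x1) exp(-c/gamma) dx1 = nu(x2)
        b = [nu[j] / (h1 * sum(kernel[i][j] * a[i] for i in range(n1))) for j in range(n2)]
    pi = [[a[i] * kernel[i][j] * b[j] for j in range(n2)] for i in range(n1)]
    return a, b, pi

n = 20
h = 1.0 / n
x = [(i + 0.5) * h for i in range(n)]
mu = [1.0] * n                                   # uniform density on [0, 1]
nu_raw = [1.0 + 2.0 * xi for xi in x]
nu = [v / (h * sum(nu_raw)) for v in nu_raw]     # density with unit mass
cost = [[(s - t) ** 2 for t in x] for s in x]
a, b, pi = sinkhorn(mu, nu, cost, gamma=0.25, h1=h, h2=h)
row_marginal = [h * sum(row) for row in pi]
col_marginal = [h * sum(pi[i][j] for i in range(n)) for j in range(n)]
print(max(abs(r - m) for r, m in zip(row_marginal, mu)))
```

Working with $a$ and $b$ directly avoids evaluating $\Phi^{-1}$. Because each sweep ends with the update for (5b), the second marginal is matched exactly, so only the error in the first marginal needs monitoring.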

Remark 4.17.

The optimality system (5) also corresponds to the so-called Schrödinger system [14, Eq. (4.12)–(4.13) or (4.14)], i.e., the system of equations which characterizes the solution to the so-called Schrödinger bridge problem (essentially, the most likely transition path of a hot gas between the initial and final gas distributions $\mu$ and $\nu$). Existence of solutions to that system was typically shown based on iterative approximation schemes (analogous to but predating the Sinkhorn algorithm; see the discussion in [14]). There are also alternative proofs exploiting the variational nature of the problem; however, these are not as straightforward as identifying (5) as the optimality conditions of an optimization problem which has a solution. In [7], for example, a minimizing sequence for the dual problem (D) is used to construct a sequence of measures of the type (6) that is then shown to converge to a solution of the Schrödinger bridge problem.

Finally, the optimality conditions (5a) and (5b) allow us to conclude which problem is solved by $(\bar{\alpha},\bar{\beta})$ .

Corollary 4.18.

Let $\mu\in L\log L(\Omega_{1})$ , $\nu\in L\log L(\Omega_{2})$ , and $c\in{\mathcal{C}}(\Omega_{1}\times\Omega_{2})$ . Let $(\bar{u}_{1},\bar{u}_{2})\in L_{\exp}(\Omega_{1})\times L_{\exp}(\Omega_{2})$ be a solution of (3). Then $\bar{\alpha}:=\gamma\Psi(\bar{u}_{1})\in L^{1}(\Omega_{1},\mu)$ and $\bar{\beta}:=\gamma\Psi(\bar{u}_{2})\in L^{1}(\Omega_{2},\nu)$ are solutions of

[TABLE]

and the values of (D ${}_{L^{1}}$ ) and (3) coincide.

Proof 4.19.

First, note that the mapping $x_{1}\mapsto\int_{\Omega_{2}}\Phi(\bar{u}_{2}(x_{2}))e^{-\frac{c(x_{1},x_{2})}{\gamma}}\,{\mathrm{d}}x_{2}$ is continuous and thus attains a minimum $\underline{c}>0$ and a maximum $\overline{c}>0$ on the (assumed to be) compact set $\Omega_{1}$ . From the optimality condition (5a), we thus obtain that

[TABLE]

This implies that $\log\mu-K\leq\bar{\alpha}/\gamma\leq\log\mu+K$ for some $K>0$ . We thus have

[TABLE]

Since $\mu\in L\log L(\Omega_{1})$ , we deduce that the right-hand side is finite and hence that $\bar{\alpha}$ is integrable with respect to $\mu$ , i.e., $\bar{\alpha}\in L^{1}(\Omega_{1},\mu)$ . The result for $\bar{\beta}$ follows analogously. Finally, it follows from a density argument that (D ${}_{L^{1}}$ ) cannot exceed (3). Indeed, assume there are $\alpha\in L^{1}(\Omega_{1},\mu)$ and $\beta\in L^{1}(\Omega_{2},\nu)$ with an objective functional value $C$ strictly larger than (3). By invoking the monotone convergence theorem as in the proof of Proposition 4.12, we may assume without loss of generality that $\alpha$ and $\beta$ are bounded. Defining now $u_{1}=\Psi^{-1}(\frac{\alpha}{\gamma})\in L^{\infty}(\Omega_{1})\subset L_{\exp}(\Omega_{1})$ and $u_{2}=\Psi^{-1}(\frac{\beta}{\gamma})\in L^{\infty}(\Omega_{2})\subset L_{\exp}(\Omega_{2})$ shows that (3) is no smaller than $C$ , the desired contradiction.

Remark 4.20.

As for (D) and as formalized in Lemma 4.3, solutions to (3) are not unique.

5 $\Gamma$-limit

We now turn to $\Gamma$ -convergence of the regularized problem. Recall from, e.g., [10], that a sequence $\{F_{n}\}$ of functionals $F_{n}:X\to\overline{\mathbb{R}}$ on a metric space $X$ is said to $\Gamma$ -converge to a functional $F:X\to\overline{\mathbb{R}}$ , written $F=\operatorname*{\Gamma\text{-}\lim}_{n\to\infty}F_{n}$ , if

(i) for every sequence $\{x_{n}\}\subset X$ with $x_{n}\to x$,

$$F(x)\leq\liminf_{n\to\infty}F_{n}(x_{n});$$

(ii) for every $x\in X$, there is a sequence $\{x_{n}\}\subset X$ with $x_{n}\to x$ and

$$\limsup_{n\to\infty}F_{n}(x_{n})\leq F(x).$$

It is a straightforward consequence of this definition that if $F_{n}$ $\Gamma$-converges to $F$ and $x_{n}$ is a minimizer of $F_{n}$ for every $n\in{\mathbb{N}}$, then every cluster point of the sequence $\{x_{n}\}$ is a minimizer of $F$. Furthermore, $\Gamma$-convergence is stable under perturbations by continuous functionals.

Here we aim to approximate optimal transport plans $\bar{\pi}$ of the unregularized problem for marginals $\mu$ and $\nu$ which are not required to be in $L\log L(\Omega)$, i.e., we allow arbitrary measures as marginals. In this case, we cannot use these marginals for the regularized problems as well, since by Theorem 3.4 the regularized problems would then admit no solutions. We therefore consider smoothed marginals $\mu_{\gamma}$ and $\nu_{\gamma}$ in $L\log L(\Omega)$ converging to $\mu$ and $\nu$, respectively, and show that the regularized problem with these marginals $\Gamma$-converges to the unregularized problem with the original marginals. The conceptually different case of $\Gamma$-convergence for fixed, non-mollified marginals (which then, however, need to be of finite entropy) has been treated in [11, Thm. 2.7]. Our setting with smoothed marginals allows simpler constructions in the $\limsup$-inequality, since a given transport plan is merely approximated via mollification. A further difference to [11, Thm. 2.7] is that we work on a compact set $\Omega$ instead of $\mathbb{R}^{n}$ and need to couple the smoothing parameter to the regularization parameter to obtain $\Gamma$-convergence.

Let $B$ be a smooth, compactly supported, nonnegative kernel with unit integral, and for $\delta>0$ and $n\in{\mathbb{N}}$ set

[TABLE]

Since we will smooth the marginals and the transport plans by convolutions, we will need to slightly extend the domains $\Omega_{1}$ and $\Omega_{2}$ to avoid boundary effects. Hence, let $\tilde{\Omega}_{1}$ and $\tilde{\Omega}_{2}$ be compact supersets of $\Omega_{1}$ and $\Omega_{2}$ , respectively, such that

[TABLE]

and which are large enough to contain the supports of $\mu_{\delta}:=\mu\ast B_{\delta}^{n_{1}}$ and $\nu_{\delta}:=\nu\ast B_{\delta}^{n_{2}}$ for $\delta\leq 1$ . (Here and in the following, we assume that the width of the convolution kernels will be small enough.) For a function or measure $f$ on $\Omega_{1}$ , we denote by $\tilde{f}$ the extension of $f$ to $\tilde{\Omega}_{1}$ by zero (and analogously for functions and measures on $\Omega_{2}$ and $\Omega_{1}\times\Omega_{2}$ ). Let $\hat{c}$ be a continuous extension of $c$ onto $\tilde{\Omega}_{1}\times\tilde{\Omega}_{2}$ and set

[TABLE]

Using smoothed marginals $\mu_{\delta},\nu_{\delta}$ and coupling $\gamma$ and $\delta$ in an appropriate way, we can then show $\Gamma$ -convergence of $E^{\mu_{\delta},\nu_{\delta}}_{\gamma}$ to $E_{0}^{\mu,\nu}$ as $\gamma,\delta\to 0$ .

Theorem 5.1.

Let $\mu\in\mathcal{P}(\Omega_{1})$, $\nu\in\mathcal{P}(\Omega_{2})$, and $\gamma,\delta>0$ be such that

[TABLE]

which is denoted in the following by $(\gamma,\delta)\to 0$ . Define $\mu_{\delta}=B_{\delta}^{n_{1}}\ast\tilde{\mu}$ and $\nu_{\delta}=B_{\delta}^{n_{2}}\ast\tilde{\nu}$ . Then it holds that

[TABLE]

with respect to weak- $*$ convergence in ${\mathcal{M}}(\tilde{\Omega}_{1}\times\tilde{\Omega}_{2})$ .

On the other hand, if $\gamma,\delta\to 0$ are chosen such that $\gamma\|{\mu_{\delta}}\|_{\Phi_{\log}}\to\infty$ or $\gamma\|{\nu_{\delta}}\|_{\Phi_{\log}}\to\infty$ , then $E^{\mu_{\delta},\nu_{\delta}}_{\gamma}$ does not have a finite $\Gamma$ -limit. More precisely, even for a family of feasible $\pi_{\delta}$ (i.e., with marginals $\mu_{\delta}$ and $\nu_{\delta}$ ) it holds that

[TABLE]

Proof 5.2.

For the first statement, we verify the two conditions in the definition of $\Gamma$ -convergence.

(i): Let $\pi_{\delta}\rightharpoonup^{*}\tilde{\pi}$. Then $\lim_{\delta\to 0}F_{0}[\pi_{\delta}]=F_{0}[\tilde{\pi}]$, since $\hat{c}$ is continuous and bounded. Since $t(\log t-1)\geq-1$, we also have that

[TABLE]

and thus that

[TABLE]

Finally, the marginal conditions are preserved under weak-$*$ convergence of $\pi_{\delta}$, $\mu_{\delta}$, and $\nu_{\delta}$ (note that $\mu_{\delta}\rightharpoonup^{*}\tilde{\mu}$ and $\nu_{\delta}\rightharpoonup^{*}\tilde{\nu}$).

(ii): It suffices to consider a recovery sequence for $\pi\in$ , because the marginal conditions for $\mu$ and $\nu$ can never be satisfied for $\pi\in$ . If $E_{0}^{\mu,\nu}[\tilde{\pi}]=\infty$, then the $\limsup$ condition holds trivially. Let therefore $E_{0}^{\mu,\nu}[\tilde{\pi}]$ be finite. We set $\pi_{\delta}:=G_{\delta}\ast\tilde{\pi}$. Then $\pi_{\delta}\rightharpoonup^{*}\tilde{\pi}$ as well as ${{(P_{1})}_{\#}\pi_{\delta}}=\mu_{\delta}$ and ${{(P_{2})}_{\#}\pi_{\delta}}=\nu_{\delta}$. Since, by Young's convolution inequality, $\pi_{\delta}\leq\|{G_{\delta}}\|_{L^{\infty}}\|{\tilde{\pi}}\|_{L^{1}}\leq\frac{C}{\delta^{N}}$ for some constant $C>0$ and $N:=n_{1}+n_{2}$, and since

[TABLE]

we conclude that

[TABLE]

The right-hand side vanishes for $(\gamma,\delta)\to 0$ by the assumption on the (coupled) convergences of $\gamma$ and $\delta$ . Hence,

[TABLE]

For the second statement, recall from Lemma 2.13 that

[TABLE]

so that $\gamma\|{\pi_{\delta}}\|_{\Phi_{\log}}\to\infty$ . By Lemma 2.6, this immediately yields $\gamma\int_{\tilde{\Omega}_{1}\times\tilde{\Omega}_{2}}\pi_{\delta}\log^{+}\pi_{\delta}\,{\mathrm{d}}(x_{1},x_{2})\to\infty$ , which implies

[TABLE]

and thus $F_{\gamma}[\pi_{\delta}]\to\infty$ so that the assertion follows.

The conditions on $\gamma$ and $\delta$ are in particular satisfied for $\delta=C\gamma$ with a constant $C>0$.
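To illustrate why the coupling matters, consider a Dirac marginal $\mu=\delta_{x_{0}}$, which has no finite entropy. Assume that $B_{\delta}^{n_{1}}$ has the usual scaling $B_{\delta}^{n_{1}}(x)=\delta^{-n_{1}}\varrho(x/\delta)$ for a fixed bounded, compactly supported probability density $\varrho$ on $\mathbb{R}^{n_{1}}$. This is an assumption on the omitted definition, and $\varrho$ is hypothetical notation. Substituting $y=(x-x_{0})/\delta$ then gives

$$\int\mu_{\delta}\log\mu_{\delta}\,\mathrm{d}x=\int\varrho(y)\log\bigl(\delta^{-n_{1}}\varrho(y)\bigr)\,\mathrm{d}y=n_{1}\log\frac{1}{\delta}+\int\varrho\log\varrho\,\mathrm{d}y.$$

For $\delta=C\gamma$, the regularization weight therefore gives $\gamma\int\mu_{\delta}\log\mu_{\delta}\,\mathrm{d}x=n_{1}\gamma\log\frac{1}{C\gamma}+O(\gamma)\to 0$ as $\gamma\to 0$. In contrast, a much faster smoothing such as $\delta=e^{-1/\gamma^{2}}$ gives $n_{1}\gamma\log\frac{1}{\delta}=n_{1}/\gamma\to\infty$.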

6 Conclusion

In contrast to the original Kantorovich formulation of optimal transport problems, their entropic regularization is well-posed only for marginals with finite entropy. Restricting the regularized problem to such functions and applying Fenchel duality in the space $L\log L(\Omega)$ allows deriving primal-dual optimality conditions that can be interpreted pointwise almost everywhere and used to derive a continuous version of the popular Sinkhorn algorithm. For marginals that do not have finite entropy, a combined regularization and smoothing approach leads to a family of well-posed approximations that $\Gamma$ -converge to the original Kantorovich formulation if the regularization and smoothing parameters are coupled in an appropriate way.

This work can be extended in several directions. For example, we have considered the usual setting where the entropic penalty is taken with respect to the Lebesgue measure. More general penalties have been considered in a different framework in [26]; other choices (such as the product measure of the marginals) are possible in the approach considered here as well and may lead to well-posedness and duality for a larger class of marginals. Naturally, a challenging but worthwhile issue would be a convergence analysis of the Sinkhorn algorithm in the Orlicz spaces $L\log L(\Omega)$ and $L_{\exp}(\Omega)$ considered here.

Acknowledgments

Dirk Lorenz, Hinrich Mahler, and Benedikt Wirth acknowledge support by the German Research Foundation (DFG) within the priority program “Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization” (SPP 1962) under grant numbers LO 1436/9-1 and WI 4654/1-1. Benedikt Wirth was further supported by the Alfried Krupp Prize for Young University Teachers awarded by the Alfried Krupp von Bohlen und Halbach-Stiftung and by the DFG via Germany’s Excellence Strategy through the Cluster of Excellence “Mathematics Münster: Dynamics – Geometry – Structure” (EXC 2044) at the University of Münster.

The authors would also like to thank the anonymous reviewers for a number of useful comments and suggestions regarding the presentation.

Bibliography


[1] R. A. Adams, Sobolev Spaces, volume 140 of Pure and Applied Mathematics, Academic Press, Inc., Boston, MA, second edition, 2003.
[2] L. Ambrosio and N. Gigli, A user's guide to optimal transport, in Modelling and Optimisation of Flows on Networks, Springer, 2013, 1–155, doi:10.1007/978-3-642-32160-3_1.
[3] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in Sobolev and BV Spaces, volume 6 of MPS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), 2006, doi:10.1137/1.9781611973488.
[4] J. D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM Journal on Scientific Computing 37 (2015), A1111–A1138, doi:10.1137/141000439.
[5] C. Bennett and R. Sharpley, Interpolation of Operators, volume 129 of Pure and Applied Mathematics, Academic Press, Inc., Boston, MA, 1988, doi:10.1016/s0079-8169(13)62909-8.
[6] R. J. Berman, The Sinkhorn algorithm, parabolic optimal transport and geometric Monge–Ampère equations, 2017.
[7] A. Beurling, An automorphism of product measures, Annals of Mathematics 72 (1960), 189–200, doi:10.2307/1970151.
[8] J. M. Borwein and A. S. Lewis, Decomposition of multivariate functions, Canadian Journal of Mathematics 44 (1992), 463–482, doi:10.4153/cjm-1992-030-9.
