Quadratically regularized optimal transport

Dirk A. Lorenz; Paul Manns; Christian Meyer

arXiv:1903.01112·math.OC·September 10, 2019


TL;DR

This paper explores quadratic regularization in optimal transport, deriving dual formulations, proving strong duality, and developing algorithms that produce sparse transport plans, contrasting with entropic regularization.

Contribution

It introduces quadratic regularization for optimal transport, providing dual problem derivations, solution existence proofs, and two algorithms with numerical analysis.

Findings

1. Algorithms perform well even with small regularization parameters

2. Quadratic regularization yields sparse optimal transport plans

3. Dual problem formulation and strong duality established

Abstract

We investigate the problem of optimal transport in the so-called Kantorovich form, i.e. given two Radon measures on two compact sets, we seek an optimal transport plan which is another Radon measure on the product of the sets that has these two measures as marginals and minimizes a certain cost function. We consider quadratic regularization of the problem, which forces the optimal transport plan to be a square integrable function rather than a Radon measure. We derive the dual problem and show strong duality and existence of primal and dual solutions to the regularized problem. Then we derive two algorithms to solve the dual problem of the regularized problem: A Gauss-Seidel method and a semismooth quasi-Newton method and investigate both methods numerically. Our experiments show that the methods perform well even for small regularization parameters. Quadratic regularization is of…

Tables

Table 1: Mapping discretization quantities to solver quantities.

Coefficient	Solver Quantity	Conversion
$π_{i j}$	${\bar{π}}_{i j}$	${\bar{π}}_{i j} = γ π_{i j}$
$c_{i j}$	${\bar{c}}_{i j}$	${\bar{c}}_{i j} = c_{i j}$
$μ_{i}^{-}$	${\bar{μ}}_{i}^{-}$	${\bar{μ}}_{i}^{-} = N μ_{i}^{-}$
$μ_{j}^{+}$	${\bar{μ}}_{j}^{+}$	${\bar{μ}}_{j}^{+} = N μ_{j}^{+}$
$α_{i}$	${\bar{α}}_{i}$	${\bar{α}}_{i} = α_{i}$
$β_{j}$	${\bar{β}}_{j}$	${\bar{β}}_{j} = β_{j}$
$J$	$\bar{J}$	$\bar{J} = J N^{2} γ$
$τ$	$\bar{τ}$	$\bar{τ} = τ N$

Equations

$$\min_{\pi}\ \langle c,\pi\rangle_{L^{2}}+\frac{\gamma}{2}\|\pi\|_{L^{2}}^{2}\quad\text{subject to}\quad\int_{\Omega_{2}}\pi(x_{1},x_{2})\,\mathrm{d}\lambda_{2}=\mu_{1}(x_{1}),\quad\int_{\Omega_{1}}\pi(x_{1},x_{2})\,\mathrm{d}\lambda_{1}=\mu_{2}(x_{2}),\quad\pi(x_{1},x_{2})\geq 0$$

$$\int_{\Omega_{1}}\mu_{1}^{2}(x_{1})\,\mathrm{d}\lambda_{1}=\int_{\Omega_{1}}\Big(\int_{\Omega_{2}}\pi^{*}(x_{1},x_{2})\,\mathrm{d}\lambda_{2}\Big)^{2}\mathrm{d}\lambda_{1}\leq|\Omega_{2}|\iint_{\Omega_{1}\times\Omega_{2}}\pi^{*}(x_{1},x_{2})^{2}\,\mathrm{d}\lambda_{1}\mathrm{d}\lambda_{2}<\infty$$

$$\int_{\Omega_{1}}\mu_{1}(x_{1})\,\mathrm{d}\lambda_{1}=\int_{\Omega_{2}}\mu_{2}(x_{2})\,\mathrm{d}\lambda_{2}$$

$$\mu:=\gamma\,\mu_{1}\otimes\mu_{2}.$$

$$P_{1}:L^{2}(\Omega)\ni\pi\mapsto\int_{\Omega_{2}}\pi\,\mathrm{d}\lambda_{2}\in L^{2}(\Omega_{1}),\qquad P_{2}:L^{2}(\Omega)\ni\pi\mapsto\int_{\Omega_{1}}\pi\,\mathrm{d}\lambda_{1}\in L^{2}(\Omega_{2}),$$

$$E_{\gamma}:L^{2}(\Omega)\to\mathbb{R},\qquad E_{\gamma}(\pi):=\int_{\Omega}c\,\pi\,\mathrm{d}\lambda+\frac{\gamma}{2}\|\pi\|_{L^{2}(\Omega)}^{2}.$$

$$\mathcal{L}:L^{2}(\Omega)\times L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})\times L^{2}(\Omega)\to\mathbb{R},$$

$$\mathcal{L}(\pi,\alpha_{1},\alpha_{2},\rho):=E_{\gamma}(\pi)-\langle\rho,\pi\rangle_{L^{2}(\Omega)}+\langle\alpha_{1},P_{1}\pi-\mu_{1}\rangle_{L^{2}(\Omega_{1})}+\langle\alpha_{2},P_{2}\pi-\mu_{2}\rangle_{L^{2}(\Omega_{2})}.$$

$$\inf_{\pi\in L^{2}(\Omega)}\ \sup_{\substack{\alpha_{1}\in L^{2}(\Omega_{1}),\ \alpha_{2}\in L^{2}(\Omega_{2})\\ \rho\in L^{2}(\Omega),\ \rho\geq 0}}\mathcal{L}(\pi,\alpha_{1},\alpha_{2},\rho),$$

$$\sup_{\substack{\alpha_{1}\in L^{2}(\Omega_{1}),\ \alpha_{2}\in L^{2}(\Omega_{2})\\ \rho\in L^{2}(\Omega),\ \rho\geq 0}}\ \inf_{\pi\in L^{2}(\Omega)}\mathcal{L}(\pi,\alpha_{1},\alpha_{2},\rho).$$

$$\pi=\frac{1}{\gamma}(\rho+\alpha_{1}\oplus\alpha_{2}-c),$$

$$(v_{1}\oplus v_{2})(x_{1},x_{2}):=v_{1}(x_{1})+v_{2}(x_{2})$$

$$\sup_{\alpha_{1}\in L^{2}(\Omega_{1}),\ \alpha_{2}\in L^{2}(\Omega_{2})}\ \sup_{\rho\geq 0}\Big(-\frac{1}{2\gamma}\int_{\Omega}(\rho+\alpha_{1}\oplus\alpha_{2}-c)^{2}\,\mathrm{d}\lambda+\int_{\Omega_{1}}\mu_{1}\alpha_{1}\,\mathrm{d}\lambda_{1}+\int_{\Omega_{2}}\mu_{2}\alpha_{2}\,\mathrm{d}\lambda_{2}\Big)$$

$$\rho=-(\alpha_{1}\oplus\alpha_{2}-c)_{-}.$$

$$\min\ \Phi(\alpha_{1},\alpha_{2}):=\frac{1}{2}\|(\alpha_{1}\oplus\alpha_{2}-c)_{+}\|_{L^{2}(\Omega)}^{2}-\gamma\langle\alpha_{1},\mu_{1}\rangle-\gamma\langle\alpha_{2},\mu_{2}\rangle\quad\text{s.t.}\quad\alpha_{i}\in L^{2}(\Omega_{i}),\ i=1,2.$$

$$\min\ \Phi(\alpha_{1},\alpha_{2})\quad\text{s.t.}\quad\alpha_{i}\in L^{1}(\Omega_{i}),\ i=1,2,\quad(\alpha_{1}\oplus\alpha_{2}-c)_{+}\in L^{2}(\Omega).$$

$$G:L^{2}(\Omega)\ni w\mapsto\int_{\Omega}\frac{1}{2}w_{+}^{2}-w\,\mu\,\mathrm{d}\lambda\in\mathbb{R}.$$

$$\Phi(\alpha_{1},\alpha_{2})=G(\alpha_{1}\oplus\alpha_{2}-c)-\int_{\Omega}c\,\mu\,\mathrm{d}\lambda\quad\forall\,\alpha_{i}\in L^{2}(\Omega_{i}),\ i=1,2.$$

$$\alpha_{1}\oplus\alpha_{2}:=\alpha_{1}\otimes\lambda_{2}+\lambda_{1}\otimes\alpha_{2},\quad\alpha_{i}\in\mathfrak{M}(\Omega_{i}),\ i=1,2.$$

$$G(w^{n})\leq C<\infty\quad\forall\,n\in\mathbb{N}.$$

$$\frac{1}{2}\|w_{+}^{n}\|_{L^{2}(\Omega)}^{2}=G(w^{n})+\int_{\Omega}w_{+}^{n}\mu\,\mathrm{d}\lambda-\int_{\Omega}w_{-}^{n}\mu\,\mathrm{d}\lambda\leq C+\|\mu\|_{L^{2}(\Omega)}\|w_{+}^{n}\|_{L^{2}(\Omega)},$$

$$C\geq G(w^{n})\geq-\int_{\Omega}\frac{\mu^{2}}{2}\,\mathrm{d}\lambda+\delta\|w_{-}^{n}\|_{L^{1}(\Omega)},$$

$$G(w^{*})\leq\liminf_{n\to\infty}G(w^{n}).$$

$$\int_{\Omega}(w_{+}^{*})^{2}\,\mathrm{d}\lambda\leq\int_{\Omega}(\theta_{+})^{2}\,\mathrm{d}\lambda\leq\liminf_{n\to\infty}\int_{\Omega}(w_{+}^{n})^{2}\,\mathrm{d}\lambda,$$

$$\|\alpha_{1}\|_{\mathfrak{M}}\leq\frac{1}{|\Omega_{2}|}\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}\quad\text{and}\quad\|\alpha_{2}\|_{\mathfrak{M}}\leq\frac{2}{|\Omega_{1}|}\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}$$

$$\begin{aligned}\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}&\geq\sup_{\substack{\|\phi_{1}\|_{\infty}\leq 1\\ \|\phi_{2}\|_{\infty}\leq 1}}\iint_{\Omega_{1}\times\Omega_{2}}\phi_{1}(x_{1})\phi_{2}(x_{2})\,\mathrm{d}(\alpha_{1}\oplus\alpha_{2})(x_{1},x_{2})\\&=\sup_{\substack{\|\phi_{1}\|_{\infty}\leq 1\\ \|\phi_{2}\|_{\infty}\leq 1}}\Big[\iint_{\Omega_{1}\times\Omega_{2}}\phi_{1}(x_{1})\phi_{2}(x_{2})\,\mathrm{d}\alpha_{1}(x_{1})\mathrm{d}\lambda_{2}+\iint_{\Omega_{1}\times\Omega_{2}}\phi_{1}(x_{1})\phi_{2}(x_{2})\,\mathrm{d}\lambda_{1}\mathrm{d}\alpha_{2}(x_{2})\Big].\end{aligned}$$

$$\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}\geq|\Omega_{2}|\,\|\alpha_{1}\|_{\mathfrak{M}}.$$


Full text

Quadratically regularized optimal transport

Dirk A. Lorenz (TU Braunschweig, Institute of Analysis and Algebra, Germany)

Paul Manns (TU Braunschweig, Institute of Mathematical Optimization, Germany)

Christian Meyer (TU Dortmund, Fakultät für Mathematik, Germany)

Abstract

We investigate the problem of optimal transport in the so-called Kantorovich form, i.e. given two Radon measures on two compact sets, we seek an optimal transport plan which is another Radon measure on the product of the sets that has these two measures as marginals and minimizes a certain cost function.

We consider quadratic regularization of the problem, which forces the optimal transport plan to be a square integrable function rather than a Radon measure. We derive the dual problem and show strong duality and existence of primal and dual solutions to the regularized problem. Then we derive two algorithms to solve the dual problem of the regularized problem: A Gauss-Seidel method and a semismooth quasi-Newton method and investigate both methods numerically. Our experiments show that the methods perform well even for small regularization parameters. Quadratic regularization is of interest since the resulting optimal transport plans are sparse, i.e. they have a small support (which is not the case for the often used entropic regularization where the optimal transport plan always has full measure).

Keywords:

optimal transport, regularization, semismooth Newton method, Gauss-Seidel method, duality

MSC:

49Q20, 65D99, 90C25

1 Introduction

In this paper we investigate a regularized version of the optimal transport problem. Optimal transport dates back to the work of Monge in 1781, but the problem formulation we use here is the one of Kantorovich kantorovich1942translocation . Let us fix some notation and formulate the problem: Let $\Omega_{1}\subset\mathbb{R}^{d_{1}}$, $\Omega_{2}\subset\mathbb{R}^{d_{2}}$ be two compact domains, denote $\Omega=\Omega_{1}\times\Omega_{2}$, and assume we are given two positive regular Radon measures $\mu_{1}$ and $\mu_{2}$ on $\Omega_{1}$ and $\Omega_{2}$, respectively. Further we assume that a cost function $c:\Omega_{1}\times\Omega_{2}\to\mathbb{R}$ is given that models the cost of transporting a unit of mass from $x_{1}\in\Omega_{1}$ to $x_{2}\in\Omega_{2}$. The optimal transport problem asks to find a transport plan $\pi$, which is a Radon measure on $\Omega$, such that it has minimal overall transport cost $\int_{\Omega}c(x_{1},x_{2})\,\mathrm{d}\pi(x_{1},x_{2})$ among all measures $\pi$ which have $\mu_{1}$ and $\mu_{2}$ as first and second marginals, respectively, i.e. for all Borel sets $A\subset\Omega_{1}$ it holds that $\pi(A\times\Omega_{2})=\mu_{1}(A)$ and for all Borel sets $B\subset\Omega_{2}$ it holds that $\pi(\Omega_{1}\times B)=\mu_{2}(B)$. This problem has been studied extensively and we refer to the books rachev1998massI ; rachev1998massII ; villani2003topics ; villani2008optimal ; santambrogio2015optimal . One particular result is that an optimal plan $\pi^{*}$ exists and that the support of optimal plans is contained in the so-called $c$-superdifferential of a $c$-concave function (ambrosio2013user, Theorem 1.13). For many cost functions $c$, this means that optimal transport plans are supported on small sets and that they are in fact singular with respect to the Lebesgue measure on $\Omega$.
This makes the numerical treatment of optimal transport problems difficult, and one can employ regularization to obtain approximately optimal plans $\pi$ that are functions on $\Omega$. The regularization method that has received the most attention recently is regularization with the negative entropy of $\pi$, and we refer to papadakis2014proximal ; cuturi2016smoothed ; carlier2017convergence . Entropic regularization has become popular in machine learning applications because it allows for the very simple Sinkhorn algorithm (in the discrete case), see cuturi2013sinkhorn ; genevay2018learning and also peyre2019computational for a recent and thorough review of the computational aspects of optimal transport.

Regularizations different from entropic regularization have been much less studied. We are only aware of works in the discrete case, e.g. blondel2018smoothOT ; essid2018quadratically . In this work we investigate the case where we regularize the problem in $L^{2}(\Omega)$. The paper is organized as follows: In Section 2 we state the problem and analyze existence and duality. It will turn out that existence of solutions of the dual problem is quite tricky to show, but we will show that dual solutions exist in the respective $L^{2}$ spaces and that a straightforward optimality system characterizes primal-dual optimality. In Section 3 we derive two different algorithms for the discrete version of the quadratically regularized optimal transport problem, and in Section 4 we comment on a simple discretization scheme and report numerical examples.

Notation.

We abbreviate $x_{+}=\max(x,0)$ (and apply this also to functions and to measures, where $+$ means the positive part from the Hahn-Jordan decomposition). By $C(\Omega)$ we denote the space of continuous functions on $\Omega$ (we always work on compact sets) equipped with the supremum norm $\|\cdot\|_{\infty}$, and by $\mathfrak{M}(\Omega)$ we denote the space of Radon measures on a compact domain, equipped with the norm $\|\mu\|_{\mathfrak{M}}=\sup\{\int f\,\mathrm{d}\mu\mid f\in C(\Omega),\ |f|\leq 1\}$. The Lebesgue measure is denoted by $\lambda$ (and we also use $\lambda_{1}$ and $\lambda_{2}$ to specify the Lebesgue measure on the sets $\Omega_{1}$ and $\Omega_{2}$, respectively). For convenience, we write $|\Omega|$ for the Lebesgue measure of the set $\Omega$. Furthermore, for a Radon measure $w\in\mathfrak{M}(\Omega)$, we denote the absolutely continuous and singular parts arising from the Lebesgue decomposition with respect to the Lebesgue measure by $w_{ac}$ and $w_{s}$, i.e. they satisfy $w_{ac}\ll\lambda$ and $w_{s}\perp\lambda$. Duality pairings are denoted by $\langle\cdot,\cdot\rangle$. If both arguments of the duality pairing are positive and the duality pairing is not well defined, e.g. for $\psi\in\mathfrak{M}(\Omega)$ and $x\in L^{2}(\Omega)$, we set $\langle\psi,x\rangle\coloneqq+\infty$.

2 Quadratic regularization in the continuous case

For the quadratically regularized optimal transport problem we seek a transport plan $\pi\in L^{2}(\Omega_{1}\times\Omega_{2})$ which for a given cost function $c\in L^{2}(\Omega_{1}\times\Omega_{2})$ , a regularization parameter $\gamma>0$ , and given functions $\mu_{i}\in L^{2}(\Omega_{i})$ , $i=1,2$ solves

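$$\min_{\pi}\ \langle c,\pi\rangle_{L^{2}}+\frac{\gamma}{2}\|\pi\|_{L^{2}}^{2}\quad\text{subject to}\quad\begin{aligned}\int_{\Omega_{2}}\pi(x_{1},x_{2})\,\mathrm{d}\lambda_{2}&=\mu_{1}(x_{1}),\\ \int_{\Omega_{1}}\pi(x_{1},x_{2})\,\mathrm{d}\lambda_{1}&=\mu_{2}(x_{2}),\\ \pi(x_{1},x_{2})&\geq 0,\end{aligned}\tag{1}$$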

where the constraints are understood pointwise almost everywhere.

2.1 Solutions of the primal problem

It is straightforward to show that optimal transport plans exist:

Lemma 1

Problem (1) has an optimal solution if and only if $\mu_{1}\in L^{2}(\Omega_{1})$ , $\mu_{2}\in L^{2}(\Omega_{2})$ , $\mu_{1},\mu_{2}\geq 0$ almost everywhere, and $\int_{\Omega_{1}}\mu_{1}(x_{1})\,\mathrm{d}\lambda_{1}=\int_{\Omega_{2}}\mu_{2}(x_{2})\,\mathrm{d}\lambda_{2}$ .

Proof

Assume that there is an optimal solution $\pi^{*}\in L^{2}(\Omega_{1}\times\Omega_{2})$ . By Jensen’s inequality we get

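$$\int_{\Omega_{1}}\mu_{1}^{2}(x_{1})\,\mathrm{d}\lambda_{1}=\int_{\Omega_{1}}\Big(\int_{\Omega_{2}}\pi^{*}(x_{1},x_{2})\,\mathrm{d}\lambda_{2}\Big)^{2}\mathrm{d}\lambda_{1}\leq|\Omega_{2}|\iint_{\Omega_{1}\times\Omega_{2}}\pi^{*}(x_{1},x_{2})^{2}\,\mathrm{d}\lambda_{1}\mathrm{d}\lambda_{2}<\infty,$$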

which shows $\mu_{1}\in L^{2}(\Omega_{1})$ . The argument for $\mu_{2}$ is similar. Non-negativity of $\mu_{1}$ and $\mu_{2}$ follows from non-negativity of $\pi^{*}$ . Finally, by Fubini’s theorem

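$$\int_{\Omega_{1}}\mu_{1}(x_{1})\,\mathrm{d}\lambda_{1}=\int_{\Omega_{1}}\int_{\Omega_{2}}\pi^{*}(x_{1},x_{2})\,\mathrm{d}\lambda_{2}\,\mathrm{d}\lambda_{1}=\int_{\Omega_{2}}\int_{\Omega_{1}}\pi^{*}(x_{1},x_{2})\,\mathrm{d}\lambda_{1}\,\mathrm{d}\lambda_{2}=\int_{\Omega_{2}}\mu_{2}(x_{2})\,\mathrm{d}\lambda_{2}.$$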

Conversely, if $\mu_{1}\in L^{2}(\Omega_{1})$ and $\mu_{2}\in L^{2}(\Omega_{2})$ and $\mu_{1},\mu_{2}\geq 0$ we set $C:=\int_{\Omega_{1}}\mu_{1}(x_{1})\,\mathrm{d}\lambda_{1}=\int_{\Omega_{2}}\mu_{2}(x_{2})\,\mathrm{d}\lambda_{2}$ . Then $\pi(x_{1},x_{2})=\tfrac{1}{C}\mu_{1}(x_{1})\mu_{2}(x_{2})$ is feasible for (1) and since the objective is continuous, coercive, and strongly convex a (unique) minimizer exists.∎
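The feasibility argument in the proof can be illustrated numerically. The following is a minimal sketch, assuming a uniform grid on $[0,1]$ for both domains and midpoint quadrature; the names `product_plan` and `primal_objective` as well as the sample densities are hypothetical choices for illustration and do not come from the paper.

```python
import numpy as np

def product_plan(mu1, mu2, h1, h2):
    """Discrete analogue of pi = mu1(x1) * mu2(x2) / C, with C the common total mass.

    mu1, mu2 hold density values on uniform grids with cell sizes h1, h2
    (an assumed discretization, used only for illustration).
    """
    C = mu1.sum() * h1
    if not np.isclose(C, mu2.sum() * h2):
        raise ValueError("marginals must have equal total mass")
    return np.outer(mu1, mu2) / C

def primal_objective(pi, c, gamma, h1, h2):
    """<c, pi>_{L^2} + gamma/2 * ||pi||_{L^2}^2 evaluated by quadrature."""
    return (c * pi).sum() * h1 * h2 + 0.5 * gamma * (pi ** 2).sum() * h1 * h2

n1, n2 = 40, 50
h1, h2 = 1.0 / n1, 1.0 / n2
x1 = (np.arange(n1) + 0.5) * h1
x2 = (np.arange(n2) + 0.5) * h2

# Nonnegative sample densities; mu2 is rescaled so both have the same discrete mass.
mu1 = 1.0 + 0.5 * np.sin(2 * np.pi * x1)
mu2 = 1.0 + 0.5 * np.cos(2 * np.pi * x2)
mu2 *= (mu1.sum() * h1) / (mu2.sum() * h2)

c = (x1[:, None] - x2[None, :]) ** 2      # quadratic cost, chosen for illustration
pi = product_plan(mu1, mu2, h1, h2)

row_marginal = pi.sum(axis=1) * h2        # integrates out x2, should equal mu1
col_marginal = pi.sum(axis=0) * h1        # integrates out x1, should equal mu2
objective = primal_objective(pi, c, gamma=0.1, h1=h1, h2=h2)
```

The product plan is feasible but in general not optimal; it only certifies that the feasible set is nonempty, which is what the existence argument needs.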

2.2 Dual problem and existence of dual solutions

In the following section, we apply the classical Lagrange duality to the linear-quadratic program (1). To this end, let us define the Lagrangian associated with (1). In order to shorten the notation, we set

$$\mu:=\gamma\,\mu_{1}\otimes\mu_{2}.$$

Furthermore, we define

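$$P_{1}:L^{2}(\Omega)\ni\pi\mapsto\int_{\Omega_{2}}\pi\,\mathrm{d}\lambda_{2}\in L^{2}(\Omega_{1}),\qquad P_{2}:L^{2}(\Omega)\ni\pi\mapsto\int_{\Omega_{1}}\pi\,\mathrm{d}\lambda_{1}\in L^{2}(\Omega_{2}),\tag{2}$$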

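and denote the primal objective by

$$E_{\gamma}:L^{2}(\Omega)\to\mathbb{R},\qquad E_{\gamma}(\pi):=\int_{\Omega}c\,\pi\,\mathrm{d}\lambda+\frac{\gamma}{2}\|\pi\|_{L^{2}(\Omega)}^{2}.$$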

Then, the Lagrangian associated with (1) is given by

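$$\begin{aligned}&\mathcal{L}:L^{2}(\Omega)\times L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})\times L^{2}(\Omega)\to\mathbb{R},\\&\mathcal{L}(\pi,\alpha_{1},\alpha_{2},\rho):=E_{\gamma}(\pi)-\langle\rho,\pi\rangle_{L^{2}(\Omega)}+\langle\alpha_{1},P_{1}\pi-\mu_{1}\rangle_{L^{2}(\Omega_{1})}+\langle\alpha_{2},P_{2}\pi-\mu_{2}\rangle_{L^{2}(\Omega_{2})}.\end{aligned}$$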

Then, by standard arguments, the primal problem in (1) is equivalent to

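$$\inf_{\pi\in L^{2}(\Omega)}\ \sup_{\substack{\alpha_{1}\in L^{2}(\Omega_{1}),\ \alpha_{2}\in L^{2}(\Omega_{2})\\ \rho\in L^{2}(\Omega),\ \rho\geq 0}}\mathcal{L}(\pi,\alpha_{1},\alpha_{2},\rho),$$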

while its (Lagrangian) dual is given by

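$$\sup_{\substack{\alpha_{1}\in L^{2}(\Omega_{1}),\ \alpha_{2}\in L^{2}(\Omega_{2})\\ \rho\in L^{2}(\Omega),\ \rho\geq 0}}\ \inf_{\pi\in L^{2}(\Omega)}\mathcal{L}(\pi,\alpha_{1},\alpha_{2},\rho).\tag{DP}$$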

The main part of the upcoming analysis is devoted to the existence of solutions to (DP). Once this is established, the necessary and sufficient optimality condition associated with (1) in form of the variational inequality will allow us to derive an optimality system that is also amenable for numerical computations.

To show existence for (DP), we first reformulate the dual problem. Since $\mathcal{L}$ is quadratic w.r.t. $\pi$ , the inner $\inf$ -problem is solved by

$$\pi=\frac{1}{\gamma}(\rho+\alpha_{1}\oplus\alpha_{2}-c),\tag{4}$$

where the mapping $\oplus:L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})\to L^{2}(\Omega)$ is defined via

$$(v_{1}\oplus v_{2})(x_{1},x_{2}):=v_{1}(x_{1})+v_{2}(x_{2})\tag{5}$$

for almost all $(x_{1},x_{2})\in\Omega$ and all $v_{i}\in L^{2}(\Omega_{i})$ , $i=1,2$ .

Remark 1

The map $\oplus$ is related to the adjoints of the projections $P_{1}$ and $P_{2}$ from (2) by $\alpha_{1}\oplus\alpha_{2}=P_{1}^{*}\alpha_{1}+P_{2}^{*}\alpha_{2}$ .
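The adjoint relation of Remark 1 has a direct discrete counterpart, which the following sketch checks on random data. It assumes uniform grids with cell sizes `h1`, `h2` and the corresponding weighted inner products; the function names `P1`, `P2`, and `oplus` mirror the notation above but are otherwise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 6, 4
h1, h2 = 1.0 / n1, 1.0 / n2   # assumed uniform cell sizes on Omega_1, Omega_2

def P1(pi):
    """Discrete version of pi -> int_{Omega_2} pi d(lambda_2)."""
    return pi.sum(axis=1) * h2

def P2(pi):
    """Discrete version of pi -> int_{Omega_1} pi d(lambda_1)."""
    return pi.sum(axis=0) * h1

def oplus(a1, a2):
    """(a1 (+) a2)(x1, x2) = a1(x1) + a2(x2) via broadcasting."""
    return a1[:, None] + a2[None, :]

pi = rng.standard_normal((n1, n2))
a1 = rng.standard_normal(n1)
a2 = rng.standard_normal(n2)

# <P1 pi, a1>_{L^2(Omega_1)} + <P2 pi, a2>_{L^2(Omega_2)}
lhs = (P1(pi) * a1).sum() * h1 + (P2(pi) * a2).sum() * h2
# <pi, a1 (+) a2>_{L^2(Omega)}
rhs = (pi * oplus(a1, a2)).sum() * h1 * h2
```

Both sides agree up to rounding, which is the discrete form of $\alpha_{1}\oplus\alpha_{2}=P_{1}^{*}\alpha_{1}+P_{2}^{*}\alpha_{2}$.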

Inserting (4) into (DP) yields

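$$\sup_{\alpha_{1}\in L^{2}(\Omega_{1}),\ \alpha_{2}\in L^{2}(\Omega_{2})}\ \sup_{\rho\geq 0}\Big(-\frac{1}{2\gamma}\int_{\Omega}(\rho+\alpha_{1}\oplus\alpha_{2}-c)^{2}\,\mathrm{d}\lambda+\int_{\Omega_{1}}\mu_{1}\alpha_{1}\,\mathrm{d}\lambda_{1}+\int_{\Omega_{2}}\mu_{2}\alpha_{2}\,\mathrm{d}\lambda_{2}\Big).\tag{6}$$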

Again, the inner optimization problem is quadratic w.r.t. $\rho$ so that its solution is given by

$$\rho=-(\alpha_{1}\oplus\alpha_{2}-c)_{-}.$$

Inserted in (6), this results in the following dual problem

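$$\min\ \Phi(\alpha_{1},\alpha_{2}):=\frac{1}{2}\|(\alpha_{1}\oplus\alpha_{2}-c)_{+}\|_{L^{2}(\Omega)}^{2}-\gamma\langle\alpha_{1},\mu_{1}\rangle-\gamma\langle\alpha_{2},\mu_{2}\rangle\quad\text{s.t.}\quad\alpha_{i}\in L^{2}(\Omega_{i}),\ i=1,2.\tag{D}$$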
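The elimination of the multiplier $\rho$ is a pointwise computation and can be sanity-checked by brute force. The sketch below assumes the reading that, for a fixed value $w$ of $\alpha_{1}\oplus\alpha_{2}-c$, the maximizer of $-\tfrac{1}{2\gamma}(\rho+w)^{2}$ over $\rho\geq 0$ is $\rho=\max(-w,0)$, which leaves the term $-\tfrac{1}{2\gamma}w_{+}^{2}$; the grid sizes and the value of `gamma` are arbitrary illustrative choices.

```python
import numpy as np

gamma = 0.3
w = np.linspace(-2.0, 2.0, 9)             # sample values of alpha1 (+) alpha2 - c
rho_grid = np.linspace(0.0, 4.0, 4001)    # brute-force candidates for rho >= 0

# Inner objective -(rho + w)^2 / (2 gamma) for every pair (w, rho).
values = -(rho_grid[None, :] + w[:, None]) ** 2 / (2.0 * gamma)
rho_brute = rho_grid[values.argmax(axis=1)]
best_brute = values.max(axis=1)

# Closed-form maximizer and the resulting reduced objective.
rho_closed = np.maximum(-w, 0.0)
reduced = -np.maximum(w, 0.0) ** 2 / (2.0 * gamma)
```

Multiplying the reduced objective by $-\gamma$ and turning the supremum into a minimum gives the positive-part term in $\Phi$.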

To prove existence of solutions for this problem, we need to require the following

Assumption 1

The domains $\Omega_{1}$ and $\Omega_{2}$ are compact. Moreover, the cost function $c$ is in $L^{2}(\Omega)$ and fulfills $c\geq\underline{c}>-\infty$. Furthermore, the marginals $\mu_{1}$ and $\mu_{2}$ satisfy $\mu_{i}\in L^{2}(\Omega_{i})$ and $\mu_{i}\geq\delta>0$, $i=1,2$. In addition we assume that $\int_{\Omega_{1}}\mu_{1}\,\mathrm{d}\lambda_{1}=\int_{\Omega_{2}}\mu_{2}\,\mathrm{d}\lambda_{2}=1$.

Remark 2

The last assumption on the normalization of the marginals only eases the subsequent analysis and can be relaxed to $\int_{\Omega_{1}}\mu_{1}\,\mathrm{d}\lambda_{1}=\int_{\Omega_{2}}\mu_{2}\,\mathrm{d}\lambda_{2}$, which is needed anyway to ensure the existence of a solution to the primal problem, see Lemma 1.

Remark 3

Note that there is an obvious source of non-uniqueness for the dual problem (D): We can add a constant to $\alpha_{1}$ and subtract it from $\alpha_{2}$ without changing the dual objective, i.e. for any constant $C$ it holds that $\Phi(\alpha_{1}+C,\alpha_{2}-C)=\Phi(\alpha_{1},\alpha_{2})$. This non-uniqueness will not cause trouble in the proofs and, when convenient, we remove it, e.g. by demanding that $\int_{\Omega_{2}}\alpha_{2}\,\mathrm{d}\lambda_{2}=0$.
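The shift invariance is easy to observe in a discretized dual objective. The sketch below assumes uniform grids and marginals with equal discrete mass (the invariance relies on this); `dual_objective` is a hypothetical helper that evaluates $\Phi$ by quadrature, and all data are random placeholders.

```python
import numpy as np

def dual_objective(a1, a2, c, mu1, mu2, gamma, h1, h2):
    """Quadrature version of
    Phi = 1/2 ||(a1 (+) a2 - c)_+||^2 - gamma <a1, mu1> - gamma <a2, mu2>."""
    w_plus = np.maximum(a1[:, None] + a2[None, :] - c, 0.0)
    return (0.5 * (w_plus ** 2).sum() * h1 * h2
            - gamma * (a1 * mu1).sum() * h1
            - gamma * (a2 * mu2).sum() * h2)

rng = np.random.default_rng(1)
n1, n2 = 8, 5
h1, h2 = 1.0 / n1, 1.0 / n2

# Positive marginals, both normalized to discrete mass one.
mu1 = rng.uniform(0.5, 1.5, n1)
mu1 /= mu1.sum() * h1
mu2 = rng.uniform(0.5, 1.5, n2)
mu2 /= mu2.sum() * h2

c = rng.uniform(0.0, 1.0, (n1, n2))
a1 = rng.standard_normal(n1)
a2 = rng.standard_normal(n2)
gamma, shift = 0.1, 3.7

phi = dual_objective(a1, a2, c, mu1, mu2, gamma, h1, h2)
phi_shifted = dual_objective(a1 + shift, a2 - shift, c, mu1, mu2, gamma, h1, h2)

# Remove the non-uniqueness by enforcing a zero mean on a2.
a2_normalized = a2 - a2.mean()
a1_normalized = a1 + a2.mean()
phi_normalized = dual_objective(a1_normalized, a2_normalized, c, mu1, mu2, gamma, h1, h2)
```

The normalized pair has the same objective value, so fixing $\int_{\Omega_{2}}\alpha_{2}\,\mathrm{d}\lambda_{2}=0$ costs nothing.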

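Observe that the objective $\Phi$ in (D) is also well defined for functions $\alpha_{i}\in L^{1}(\Omega_{i})$ with $(\alpha_{1}\oplus\alpha_{2}-c)_{+}\in L^{2}(\Omega)$. This gives rise to the following auxiliary dual problem:

$$\min\ \Phi(\alpha_{1},\alpha_{2})\quad\text{s.t.}\quad\alpha_{i}\in L^{1}(\Omega_{i}),\ i=1,2,\quad(\alpha_{1}\oplus\alpha_{2}-c)_{+}\in L^{2}(\Omega).\tag{D'}$$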

Our strategy to prove existence of solutions to (D) is now as follows:

1. First, we show that (D’) admits a solution $(\alpha_{1}^{*},\alpha_{2}^{*})\in L^{1}(\Omega_{1})\times L^{1}(\Omega_{2})$, see Proposition 1.

2. Then, we prove that $\alpha_{1}^{*}$ and $\alpha_{2}^{*}$ possess higher regularity, namely that they are functions in $L^{2}(\Omega_{i})$, $i=1,2$, cf. Theorem 2.1.

3. Thus, $(\alpha_{1}^{*},\alpha_{2}^{*})$ is feasible for (D) and, since the feasible set of (D’) contains the one of (D), while the objective of (D’) restricted to $L^{2}$-functions coincides with the objective in (D), this finally gives that $(\alpha_{1}^{*},\alpha_{2}^{*})$ is indeed optimal for (D).

The reason to consider (D’) is essentially that the objective $\Phi$ is not coercive in $L^{2}(\Omega)$ , but only in $L^{1}(\Omega)$ (at least w.r.t. the negative part of $\alpha_{i}$ ). Therefore, we have to deal with weakly∗ converging sequences in the space of Radon measures within the proof of existence of solutions. For this purpose, we need to extend the objective to a suitable set. To that end, let us define

$$G:L^{2}(\Omega)\ni w\mapsto\int_{\Omega}\frac{1}{2}w_{+}^{2}-w\,\mu\,\mathrm{d}\lambda\in\mathbb{R}.$$

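Note that, thanks to $\int_{\Omega_{1}}\mu_{1}\,\mathrm{d}\lambda_{1}=\int_{\Omega_{2}}\mu_{2}\,\mathrm{d}\lambda_{2}=1$, it holds

$$\Phi(\alpha_{1},\alpha_{2})=G(\alpha_{1}\oplus\alpha_{2}-c)-\int_{\Omega}c\,\mu\,\mathrm{d}\lambda\quad\forall\,\alpha_{i}\in L^{2}(\Omega_{i}),\ i=1,2.\tag{9}$$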

Of course, $G$ is also well defined as a functional on the feasible set of (D’) and we will denote this functional by the same symbol to ease notation. In order to extend $G$ to the space of Radon measures, consider for a given measure $w\in\mathfrak{M}(\Omega)$ the Hahn-Jordan decomposition $w=w_{+}-w_{-}$ and assume that $w_{+}\in L^{2}(\Omega)$. Then, we set $G(w)=\int_{\Omega}\tfrac{1}{2}\,w_{+}^{2}\,\mathrm{d}\lambda-\int_{\Omega}\mu\,\mathrm{d}w$. With a slight abuse of notation, we denote this mapping by $G$, too. Furthermore, for $w_{+}\in L^{2}(\Omega)$, $-\int_{\Omega}w_{+}\mu\,\mathrm{d}\lambda$ is finite for $\mu\in L^{2}(\Omega)$ as in Assumption 1. Regarding the negative part, we define $\int_{\Omega}\mu\,\mathrm{d}w_{-}\coloneqq\infty$ wherever this expression is not properly defined, as $w_{-}$ and $\mu$ are both positive. Combining this, we obtain that $-\int_{\Omega}\mu\,\mathrm{d}w\in\mathbb{R}\cup\{\infty\}$.

Note in this context that, if the singular part of $w$ (w.r.t. the Lebesgue measure) vanishes, then also $w_{+}\in L^{1}(\Omega)$ and $w_{+}(x)=\max\{0,w(x)\}$ $\lambda$ -a.e. in $\Omega$ so that both functionals coincide on $L^{2}(\Omega)$ , which justifies this notation. Furthermore, we also generalize the map $\oplus$ to the measure space by setting

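$$\alpha_{1}\oplus\alpha_{2}:=\alpha_{1}\otimes\lambda_{2}+\lambda_{1}\otimes\alpha_{2},\quad\alpha_{i}\in\mathfrak{M}(\Omega_{i}),\ i=1,2.$$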

Again, it is easily seen that, for $\alpha_{i}\in L^{2}(\Omega_{i})$ , $i=1,2$ , this definition boils down to the one in (5). Also Remark 1 applies in that we can express $\alpha_{1}\oplus\alpha_{2}$ in terms of the adjoints of $P_{1}$ and $P_{2}$ from (2) when defined appropriately.

The next lemma is rather obvious and covers the coercivity of $G$ in $L^{1}(\Omega)$ as indicated above.

Lemma 2

Let Assumption 1 hold and suppose that a sequence $\{w^{n}\}\subset L^{2}(\Omega)$ fulfills

$$G(w^{n})\leq C<\infty\quad\forall\,n\in\mathbb{N}.$$

Then, the sequences $\{w^{n}_{+}\}$ and $\{w^{n}_{-}\}$ are bounded in $L^{2}(\Omega)$ and $L^{1}(\Omega)$ , respectively.

Proof

We rewrite $G$ as $G(w)=\int_{\Omega}\tfrac{1}{2}\,w_{+}^{2}-w_{+}\mu\,\mathrm{d}\lambda+\int_{\Omega}w_{-}\mu\,\mathrm{d}\lambda$ . The positivity of $\mu$ then implies

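$$\frac{1}{2}\|w_{+}^{n}\|_{L^{2}(\Omega)}^{2}=G(w^{n})+\int_{\Omega}w_{+}^{n}\mu\,\mathrm{d}\lambda-\int_{\Omega}w_{-}^{n}\mu\,\mathrm{d}\lambda\leq C+\|\mu\|_{L^{2}(\Omega)}\|w_{+}^{n}\|_{L^{2}(\Omega)},$$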

which gives the first assertion. To see the second one, we use $\mu\geq\delta$ to estimate

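$$C\geq G(w^{n})=\int_{\Omega}\frac{1}{2}(w_{+}^{n})^{2}-w_{+}^{n}\mu\,\mathrm{d}\lambda+\int_{\Omega}w_{-}^{n}\mu\,\mathrm{d}\lambda\geq-\int_{\Omega}\frac{\mu^{2}}{2}\,\mathrm{d}\lambda+\delta\|w_{-}^{n}\|_{L^{1}(\Omega)},$$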

which finishes the proof.∎

The next lemma provides a lower semicontinuity result for $G$ w.r.t. weak∗ convergence in $\mathfrak{M}(\Omega)$ . Note that, here, we need the extension of $G$ as introduced above.

Lemma 3

Let Assumption 1 be fulfilled and a sequence $\{w^{n}\}\subset L^{2}(\Omega)$ be given such that $w^{n}\rightharpoonup^{*}w^{*}$ in $\mathfrak{M}(\Omega)$ and $G(w^{n})\leq C<\infty$ for all $n\in\mathbb{N}$. Then there holds $w^{*}_{+}\in L^{2}(\Omega)$ and

$$G(w^{*})\leq\liminf_{n\to\infty}G(w^{n}).\tag{10}$$

Proof

By virtue of Lemma 2, $\{w^{n}_{+}\}$ is bounded in $L^{2}(\Omega)$ and thus, there is a subsequence of $\{w^{n}_{+}\}$ , to ease notation denoted by the same symbol, that converges weakly in $L^{2}(\Omega)$ to some $\theta_{+}\in L^{2}(\Omega)$ . Since the set $\{v\in L^{2}(\Omega):v\geq 0\text{ a.e.\ in }\Omega\}$ is clearly weakly closed, we have $\theta_{+}\geq 0$ a.e. in $\Omega$ . With a little abuse of notation, we denote the Radon measure induced by $C(\Omega)\ni\varphi\mapsto\int_{\Omega}\theta_{+}\,\varphi\,\mathrm{d}\lambda\in\mathbb{R}$ by $\theta_{+}$ , too. If we define $\theta_{-}:=\theta_{+}-w^{*}\in\mathfrak{M}(\Omega)$ , then $w^{n}_{-}=w^{n}_{+}-w^{n}\rightharpoonup^{*}\theta_{-}$ in $\mathfrak{M}(\Omega)$ with $\theta_{-}\geq 0$ . Thus we have $w^{*}=\theta_{+}-\theta_{-}$ with two positive Radon measures $\theta_{+}$ , $\theta_{-}$ . The maximality property of the Hahn-Jordan decomposition then implies $w^{*}_{+}\leq\theta_{+}$ . Since $\theta_{+}$ is absolutely continuous w.r.t. $\lambda$ , the same thus holds for $w^{*}_{+}$ , i.e. $w^{*}_{+}\in L^{1}(\Omega)$ . Applying again $w^{*}_{+}\leq\theta_{+}$ , which clearly also holds for the densities pointwise $\lambda$ -almost everywhere, we moreover deduce from the weak convergence of $w^{n}_{+}$ in $L^{2}(\Omega)$ that

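$$\int_{\Omega}(w_{+}^{*})^{2}\,\mathrm{d}\lambda\leq\int_{\Omega}(\theta_{+})^{2}\,\mathrm{d}\lambda\leq\liminf_{n\to\infty}\int_{\Omega}(w_{+}^{n})^{2}\,\mathrm{d}\lambda,\tag{11}$$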

which implies $w^{*}_{+}\in L^{2}(\Omega)$ as claimed. Since the above reasoning applies to every subsequence $w^{n}_{+}$ that is weakly converging in $L^{2}(\Omega)$ , (11) holds for the whole sequence $\{w^{n}_{+}\}$ , which together with the weak∗ convergence of $w^{n}$ and the definition of $G$ , gives (10).

∎

Before we are in the position to prove existence for (D’), we need two additional results on the $\oplus$ -operator in the space of Radon measures.

Lemma 4

If $\alpha_{i}\in\mathfrak{M}(\Omega_{i})$ , $i=1,2$ and $\int_{\Omega_{2}}d\alpha_{2}=0$ , then it holds that

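$$\|\alpha_{1}\|_{\mathfrak{M}}\leq\frac{1}{|\Omega_{2}|}\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}\quad\text{and}\quad\|\alpha_{2}\|_{\mathfrak{M}}\leq\frac{2}{|\Omega_{1}|}\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}.$$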

Proof

We estimate

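$$\begin{aligned}\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}&\geq\sup_{\substack{\|\phi_{1}\|_{\infty}\leq 1\\ \|\phi_{2}\|_{\infty}\leq 1}}\iint_{\Omega_{1}\times\Omega_{2}}\phi_{1}(x_{1})\phi_{2}(x_{2})\,\mathrm{d}(\alpha_{1}\oplus\alpha_{2})(x_{1},x_{2})\\&=\sup_{\substack{\|\phi_{1}\|_{\infty}\leq 1\\ \|\phi_{2}\|_{\infty}\leq 1}}\Big[\iint_{\Omega_{1}\times\Omega_{2}}\phi_{1}(x_{1})\phi_{2}(x_{2})\,\mathrm{d}\alpha_{1}(x_{1})\mathrm{d}\lambda_{2}+\iint_{\Omega_{1}\times\Omega_{2}}\phi_{1}(x_{1})\phi_{2}(x_{2})\,\mathrm{d}\lambda_{1}\mathrm{d}\alpha_{2}(x_{2})\Big].\end{aligned}\tag{12}$$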

Taking $\phi_{2}\equiv 1$ and using $\int_{\Omega_{2}}\,\mathrm{d}\alpha_{2}(x_{2})=0$ gives

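$$\|\alpha_{1}\oplus\alpha_{2}\|_{\mathfrak{M}}\geq\sup_{\|\phi_{1}\|_{\infty}\leq 1}|\Omega_{2}|\int_{\Omega_{1}}\phi_{1}(x_{1})\,\mathrm{d}\alpha_{1}(x_{1})=|\Omega_{2}|\,\|\alpha_{1}\|_{\mathfrak{M}}.$$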

Now we start again at (12) and estimate from below by taking $\phi_{1}\equiv 1$ to get

[TABLE]

which implies

[TABLE]

which completes the proof.∎
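For measures with densities, the Radon norm is the $L^{1}$ norm, so the two estimates of Lemma 4 can be checked on grid functions. The sketch below is only an illustration under that assumption: the domain sizes `size1`, `size2`, the uniform grids, and the random data are arbitrary and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2 = 30, 20
size1, size2 = 2.0, 0.5           # |Omega_1| and |Omega_2|, arbitrary illustrative values
h1, h2 = size1 / n1, size2 / n2   # uniform cell sizes

a1 = rng.standard_normal(n1)
a2 = rng.standard_normal(n2)
a2 -= a2.mean()                   # enforce the normalization int_{Omega_2} d(alpha_2) = 0

# L^1 norms of the densities, i.e. Radon norms of the induced measures.
norm_a1 = np.abs(a1).sum() * h1
norm_a2 = np.abs(a2).sum() * h2
norm_sum = np.abs(a1[:, None] + a2[None, :]).sum() * h1 * h2   # ||a1 (+) a2||

bound_a1 = norm_sum / size2         # right-hand side of the first estimate
bound_a2 = 2.0 * norm_sum / size1   # right-hand side of the second estimate
```

Without the zero-mean normalization of `a2` the first bound can fail, for instance for constant `a1 = 1` and `a2 = -1`, where $\alpha_{1}\oplus\alpha_{2}$ vanishes.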

The next lemma will be used to show that the negative part of the minimizer of (D) does not have a singular part.

Lemma 5

Let $c\in L^{1}(\Omega)$ and $\alpha_{i}\in\mathfrak{M}(\Omega_{i})$ for $i\in\{1,2\}$ with Lebesgue decompositions $\alpha_{i}=f_{i}+\eta_{i}$ satisfying $f_{i}\ll\lambda$ and $\eta_{i}\perp\lambda$ for $i\in\{1,2\}$.

1. It holds that

[TABLE]

2. If $(\alpha_{i})_{+}$ is absolutely continuous for $i=1,2$, then for $\tilde{\alpha}_{i}=\alpha_{i}-(\eta_{i})_{-}$, $i=1,2$, it holds that

[TABLE]

Proof

We first prove point 1. The measures $f_{i}$, $\eta_{i}$ exist by Lebesgue’s decomposition theorem, see Theorem 1.155 in fonseca2007calculusofvariations . We combine these decompositions with $\alpha_{1}\oplus\alpha_{2}=\alpha_{1}\otimes\lambda+\lambda\otimes\alpha_{2}$ to arrive at Lebesgue’s decomposition of $\alpha_{1}\oplus\alpha_{2}$ with respect to $\lambda\otimes\lambda$, namely

[TABLE]

(which holds true because $c\in L^{1}(\Omega)\hookrightarrow\mathfrak{M}(\Omega)$ ). Now, we consider the Hahn-Jordan decomposition of $\eta_{1}$ ,

[TABLE]

and obtain from (14) that

[TABLE]

Furthermore,

[TABLE]

where the singularity with respect to $f_{1}\oplus f_{2}-c$ is due to (15) and (16) and the singularity with respect to $(\eta_{1})_{+}\oplus\eta_{2}$ is due to (17). Thus,

[TABLE]

as $(\eta_{1})_{-}\otimes\lambda$ is a positive measure. Consequently,

[TABLE]

Repeating this argument with the Hahn-Jordan decomposition of $\eta_{2}$ yields the claim.

The second part of the lemma is a direct consequence of the first: Since $(\alpha_{1}\oplus\alpha_{2}-c)_{+}=(\tilde{\alpha}_{1}\oplus\tilde{\alpha}_{2}-c)_{+}$, the first summand in the functional $\Phi$ is equal for $\alpha_{i}$ and $\tilde{\alpha}_{i}$. However, the second summand in $\Phi$ cannot decrease since $\tilde{\alpha}_{i}\leq\alpha_{i}$, $\mu_{i}\geq 0$ and $\gamma\langle(\eta_{i})_{-},\mu_{i}\rangle=\infty$ if the duality pairing does not exist. ∎

Now we are ready to prove the existence result for (D’):

Proposition 1

Under Assumption 1 the minimization problem (D’) admits a solution $(\alpha_{1}^{*},\alpha_{2}^{*})\in L^{1}(\Omega_{1})\times L^{1}(\Omega_{2})$ .

Proof

We proceed via the classical direct method of the calculus of variations. For this purpose, let $\{(\alpha_{1}^{n},\alpha_{2}^{n})\}\subset L^{1}(\Omega_{1})\times L^{1}(\Omega_{2})$ with $(\alpha_{1}^{n}\oplus\alpha_{2}^{n}-c)_{+}\in L^{2}(\Omega)$ be a minimizing sequence for (D’), where we shift $\alpha_{1}$ and $\alpha_{2}$ by adding and subtracting constants such that we obtain $\int_{\Omega_{2}}\alpha_{2}\,\mathrm{d}\lambda_{2}=0$ . Note that, due to its additive structure, this does not change the objective $\Phi$ in (D’), cf. Remark 3.

Next, let us define $w^{n}\coloneqq\alpha_{1}^{n}\oplus\alpha_{2}^{n}-c$ . Then, thanks to (9) and Lemma 2, the sequence $\{w^{n}\}$ is bounded in $L^{1}(\Omega)$ . Hence, there is a weakly∗ converging subsequence, which we denote by the same symbol w.l.o.g., i.e. $w^{n}\rightharpoonup^{*}\tilde{w}$ in $\mathfrak{M}(\Omega)$ . Now, Lemma 3 applies giving that

[TABLE]

Since $\{w^{n}\}$ is bounded in $\mathfrak{M}(\Omega)$ , the same holds for $\{\alpha_{1}^{n}\oplus\alpha_{2}^{n}\}$ and, as $\alpha_{2}^{n}$ is normalized, Lemma 4 gives that $\{\alpha_{i}^{n}\}$ is bounded in $\mathfrak{M}(\Omega_{i})$ , $i=1,2$ . Therefore, we can select a further (sub-)subsequence, still denoted by the same symbol to ease notation, such that

[TABLE]

Since the mapping $\mathfrak{M}(\Omega_{1})\times\mathfrak{M}(\Omega_{2})\ni(\alpha_{1},\alpha_{2})\mapsto\alpha_{1}\oplus\alpha_{2}\in\mathfrak{M}(\Omega)$ is the adjoint of the projection mapping $C(\Omega)\ni\varphi\mapsto\big{(}\int_{\Omega_{2}}\varphi\,\mathrm{d}\lambda_{2},\int_{\Omega_{1}}\varphi\,\mathrm{d}\lambda_{1}\big{)}\in C(\Omega_{1})\times C(\Omega_{2})$ , see Remark 1, it is weakly∗ continuous so that

[TABLE]

Next, we investigate the singular parts of $\tilde{\alpha}_{1}$ and $\tilde{\alpha}_{2}$ . We start with the positive part and employ Lebesgue’s decomposition of $\tilde{\alpha}_{1}$ and $\tilde{\alpha}_{2}$ :

[TABLE]

In the following we will see that the regular parts $\alpha_{i}^{*}\in L^{1}(\Omega_{i})$ , $i=1,2$ , are exactly the solution of (D’). For this purpose, we first show that the positive parts of $\tilde{\eta}_{1}$ and $\tilde{\eta}_{2}$ vanish. We have $\alpha_{1}^{*}\oplus\alpha_{2}^{*}-c\ll\lambda$ , $\tilde{\eta}_{1}\oplus\tilde{\eta}_{2}\perp\lambda$ , and, by uniqueness of Lebesgue’s decomposition, $\tilde{w}_{s}=\tilde{\eta}_{1}\oplus\tilde{\eta}_{2}$ . But from (18), we know that $(\tilde{w}_{s})_{+}=0$ . Combining this fact with Lemma 5, applied to the case $f_{1}=0$ , $f_{2}=0$ , and $c=0$ , we obtain

[TABLE]

and consequently, $(\tilde{\eta}_{i})_{+}=0$ for $i=1,2$ by positivity. Therefore, $(\tilde{\alpha}_{i})_{+}$ are $L^{1}$ -functions rather than measures and

[TABLE]

Now Lemma 5 shows feasibility of $(\alpha_{1}^{*},\alpha_{2}^{*})$ for (D’) and we also see that

[TABLE]

which demonstrates the optimality of $(\alpha_{1}^{*},\alpha_{2}^{*})$ . ∎

In the following, we assume that $\int_{\Omega_{2}}\alpha_{2}^{*}\,\mathrm{d}\lambda_{2}=0$ . If this is not the case, then we can again shift $\alpha_{1}^{*}$ and $\alpha_{2}^{*}$ without changing the value of $\Phi$ , cf. Remark 3.

Theorem 2.1

Let Assumption 1 hold. Then every optimal dual solution $(\alpha_{1}^{*},\alpha_{2}^{*})$ from Proposition 1 satisfies $\alpha_{i}^{*}\in L^{2}(\Omega_{i})$, $i=1,2$, and is therefore also a solution of the original dual problem (D). Moreover, the negative parts of $\alpha_{i}^{*}$ are bounded and the function $\frac{1}{\gamma}\left(\left(\alpha_{1}^{*}\oplus\alpha_{2}^{*}\right)-c\right)_{+}$ has the marginals $\mu_{1}$ and $\mu_{2}$.
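The marginal condition in Theorem 2.1 suggests a simple way to compute dual variables in a finite-dimensional analogue. The following is a minimal sketch, assuming a discrete problem with counting measure and alternating exact updates of $\alpha_{1}$ and $\alpha_{2}$; it is a Gauss-Seidel-type illustration only and may differ from the algorithms the paper derives in Section 3. The helper names `solve_scalar` and `gauss_seidel` and all data are hypothetical.

```python
import numpy as np

def solve_scalar(b, r):
    """Return the unique a with sum_j max(a - b_j, 0) = r, for r > 0."""
    bs = np.sort(b)
    k = np.arange(1, bs.size + 1)
    cand = (r + np.cumsum(bs)) / k          # solution if exactly the k smallest b_j are active
    upper = np.append(bs[1:], np.inf)       # candidate k is valid once it stays below the next breakpoint
    return cand[np.argmax(cand <= upper)]   # first valid candidate

def gauss_seidel(c, mu1, mu2, gamma, sweeps=200):
    """Alternately enforce the row and column marginals of pi = (a1 (+) a2 - c)_+ / gamma."""
    n1, n2 = c.shape
    a1, a2 = np.zeros(n1), np.zeros(n2)
    for _ in range(sweeps):
        for i in range(n1):   # row i: sum_j (a1_i + a2_j - c_ij)_+ = gamma * mu1_i
            a1[i] = solve_scalar(c[i, :] - a2, gamma * mu1[i])
        for j in range(n2):   # column j: sum_i (a1_i + a2_j - c_ij)_+ = gamma * mu2_j
            a2[j] = solve_scalar(c[:, j] - a1, gamma * mu2[j])
    pi = np.maximum(a1[:, None] + a2[None, :] - c, 0.0) / gamma
    return a1, a2, pi

n = 6
x = np.linspace(0.0, 1.0, n)
c = (x[:, None] - x[None, :]) ** 2          # nonnegative cost, chosen for illustration
mu1 = np.full(n, 1.0 / n)
mu2 = np.linspace(1.0, 2.0, n)
mu2 /= mu2.sum()
gamma = 0.1

a1, a2, pi = gauss_seidel(c, mu1, mu2, gamma)
row_residual = np.abs(pi.sum(axis=1) - mu1).max()
col_residual = np.abs(pi.sum(axis=0) - mu2).max()
# Discrete dual objective; it starts at 0 for a1 = a2 = 0 and c >= 0.
dual_value = (0.5 * np.sum(np.maximum(a1[:, None] + a2[None, :] - c, 0.0) ** 2)
              - gamma * (a1 @ mu1 + a2 @ mu2))
```

Each scalar update minimizes the convex dual objective exactly in one coordinate, so the objective cannot increase. Because the columns are updated last, `col_residual` is at rounding level, while `row_residual` indicates how far the iteration is from satisfying both marginal conditions.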

Proof

We again consider the positive and the negative part separately and start with $(\alpha_{1}^{*})_{-}$ . Let $\varphi\in C_{c}^{\infty}(\Omega_{1})$ and $t>0$ be fixed, but arbitrary. Then, thanks to

[TABLE]

Proposition 1 implies that $((\alpha_{1}^{*}+t\varphi)\oplus\alpha_{2}^{*}-c)_{+}\in L^{2}(\Omega)$ so that $(\alpha_{1}^{*}+t\varphi,\alpha_{2}^{*})$ is feasible for (D’). Therefore, the optimality of $(\alpha_{1}^{*},\alpha_{2}^{*})$ for (D’) yields

[TABLE]

Owing to the continuous differentiability of $\mathbb{R}\ni r\mapsto r_{+}^{2}\in\mathbb{R}$ , the first integrand converges to $2(\alpha_{1}^{*}\oplus\alpha_{2}^{*}-c)_{+}\varphi$ $\lambda$ -a.e. in $\Omega$ for $t\searrow 0$ . Moreover, the Lipschitz continuity of the $\max$ -function gives that

[TABLE]

holds for $0<t\leq 1$ . Hence, due to Lebesgue’s dominated convergence theorem, we are allowed to pass to the limit $t\searrow 0$ and obtain in this way

[TABLE]

Since $\varphi\in C_{c}^{\infty}(\Omega_{1})$ was arbitrary, the fundamental lemma of the calculus of variations thus gives

[TABLE]

Next, define the following sequence of functions in $L^{1}(\Omega_{2})$ :

[TABLE]

where $\underline{c}$ is the lower bound for $c$ from Assumption 1. Then we have $f_{n}\geq 0$ $\lambda_{2}$-a.e. in $\Omega_{2}$ and $f_{n}\searrow 0$ $\lambda_{2}$-a.e. in $\Omega_{2}$, so that the monotone convergence theorem gives

[TABLE]

Thus there exists $N\in\mathbb{N}$ such that

[TABLE]

where $\delta>0$ is the threshold for $\mu_{1}$ from Assumption 1. Now assume that $\alpha_{1}^{*}\leq-N$ $\lambda_{1}$-a.e. on a set $E\subset\Omega_{1}$ of positive Lebesgue measure. Then

[TABLE]

which contradicts (23). Therefore, $\alpha_{1}^{*}>-N$ $\lambda_{1}$ -a.e. in $\Omega_{1}$ , which even implies that $(\alpha_{1}^{*})_{-}\in L^{\infty}(\Omega_{1})$ . Concerning $(\alpha_{2}^{*})_{-}$ , one can argue in exactly the same way to conclude that $(\alpha_{2}^{*})_{-}\in L^{\infty}(\Omega_{2})$ , too.

For the positive parts we find

[TABLE]

where we used (21) and the boundedness of the negative parts proven above. Note that the constant shift, potentially needed to ensure $\int_{\Omega_{2}}\alpha_{2}^{*}\,\mathrm{d}\lambda_{2}=0$, has no effect on the equation in (21) due to the additive structure of $\oplus$.

We have thus shown that $(\alpha_{1}^{*},\alpha_{2}^{*})$ is feasible for (D). Since $(\alpha_{1}^{*},\alpha_{2}^{*})$ solves (D’), whose objective is the same as in (D), while its feasible set is larger, this implies that we have found a solution to (D). ∎

We now show that, if $\pi^{*}$ is of the form $\pi^{*}=\gamma^{-1}(\alpha_{1}^{*}\oplus\alpha_{2}^{*}-c)_{+}$ with two functions $\alpha_{i}^{*}\in L^{2}(\Omega_{i})$ , $i=1,2$ , and has the marginals $\mu_{1}$ and $\mu_{2}$ , respectively, then it solves the necessary and sufficient optimality conditions of the primal problem (1) in form of the following variational inequality:

[TABLE]

Herein, $\mathcal{F}$ is the (convex) feasible set of (1), i.e.

[TABLE]

For this purpose, let $\pi\in\mathcal{F}$ be fixed but arbitrary. Multiplying the equality constraints in $\mathcal{F}$ by $\alpha_{1}^{*}$ and $\alpha_{2}^{*}$, respectively, integrating the resulting equations, and adding them yields

[TABLE]

where we used $\pi\geq 0$ for the last inequality. Using the feasibility of $\pi^{*}$ , we find similarly

[TABLE]

Combining (25) and (26) now yields (VI). As (1) is a strictly convex minimization problem, this shows that, if $\pi^{*}$ has the form $\pi^{*}=\gamma^{-1}(\alpha_{1}^{*}\oplus\alpha_{2}^{*}-c)_{+}$ with functions $\alpha_{i}^{*}\in L^{2}(\Omega_{i})$ and satisfies $\pi^{*}\in\mathcal{F}$ , then it is a solution of (1). On the other hand, we know from Theorem 2.1 that, under Assumption 1 (more or less needed for the existence of solutions of (1) anyway), there always exist $\alpha_{i}^{*}\in L^{2}(\Omega_{i})$ so that $\pi^{*}=\gamma^{-1}(\alpha_{1}^{*}\oplus\alpha_{2}^{*}-c)_{+}$ satisfies the equality constraints in $\mathcal{F}$ . Therefore, in summary we have deduced the following:

Theorem 2.2 (Necessary and Sufficient Optimality Conditions for (1))

Under Assumption 1, $\pi^{*}\in L^{2}(\Omega)$ is a solution of (1) if and only if there exist functions $\alpha_{i}^{*}\in L^{2}(\Omega_{i})$ , $i=1,2$ , such that the following optimality system is fulfilled:

[TABLE]

The significance of Theorem 2.2 lies in the fact that we can characterize optimality of $\pi$ by just two equalities in $L^{2}(\Omega_{1})$ and $L^{2}(\Omega_{2})$, respectively, namely (27b) and (27c). Thus, we effectively reduce the size of the problem from searching for one function on $\Omega=\Omega_{1}\times\Omega_{2}$ to searching for two functions, one on $\Omega_{1}$ and one on $\Omega_{2}$ (similarly to entropic regularization, cf. carlier2017convergence). This will be exploited numerically in Section 3.

2.3 Regularization of the dual problem

As seen before, the dual problem in (D) is not uniquely solvable. One source of non-uniqueness is of course the kernel of the map $(\alpha_{1},\alpha_{2})\mapsto\alpha_{1}\oplus\alpha_{2}$. This kernel is one-dimensional and is spanned by the function $(1,-1)$, which could be easily taken into account in an algorithmic framework. However, there is another source of non-uniqueness due to the max-operator that cuts off the negative part. Here is a simple example where dual solutions are not unique: For $\Omega_{1}=\Omega_{2}=[0,1]$, $\mu_{1}=\mu_{2}\equiv 1$, and

[TABLE]

one can show by a straightforward calculation that, for every $\delta\in[0,\frac{C-4}{2}]$, the tuple

[TABLE]

solves the optimality system (27b)–(27c). This shows that the potential structure of non-uniqueness might become fairly intricate. A situation like this can certainly happen in the discretized problem we will derive in Section 2.4 and can lead to problems when we derive algorithms for the discrete problem since non-unique solutions imply a degenerate Hessian at the optimum.
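A finite-dimensional analogue may make this second source of non-uniqueness concrete. The sketch below is not the example from the text: the $2\times 2$ cost, the unit marginals, $\gamma=1$, and the names `plan_from_duals`, `dual_family` and `marginal_residual` are all assumptions made for illustration. It uses the discrete form of the relation $\pi=\gamma^{-1}(\alpha\oplus\beta-c)_{+}$ and checks that a whole interval of parameters $\delta$ yields dual pairs with the same plan and the correct marginals, although the pairs do not differ by a shift along $(1,-1)$.

```python
import numpy as np

def plan_from_duals(alpha, beta, c, gamma):
    """Return (alpha_i + beta_j - c_ij)_+ / gamma as an M x N array."""
    return np.maximum(alpha[:, None] + beta[None, :] - c, 0.0) / gamma

def dual_family(delta):
    """Hypothetical one-parameter family of dual pairs for the toy data below."""
    alpha = np.array([delta, -delta])
    beta = np.array([1.0 - delta, 1.0 + delta])
    return alpha, beta

# Toy data (assumed for illustration): unit marginals, zero cost on the
# diagonal and a large cost C off the diagonal.
GAMMA, C = 1.0, 5.0
COST = np.array([[0.0, C], [C, 0.0]])
MU = np.ones(2)
NU = np.ones(2)

def marginal_residual(delta):
    """Largest violation of the two marginal constraints for a given delta."""
    alpha, beta = dual_family(delta)
    pi = plan_from_duals(alpha, beta, COST, GAMMA)
    row_err = np.abs(pi.sum(axis=1) - MU).max()
    col_err = np.abs(pi.sum(axis=0) - NU).max()
    return max(row_err, col_err)
```

For this toy problem the diagonal entries satisfy $\alpha_{i}+\beta_{i}=1$ for every $\delta$, while the off-diagonal entries $1\pm 2\delta-C$ stay non-positive exactly for $|\delta|\leq\frac{C-1}{2}$, so the max-operator hides the change in the duals.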

Therefore, we investigate the following regularization of the dual problem:

[TABLE]

with a regularization parameter $\varepsilon>0$ . It is clear that the additional quadratic terms in the regularized objective $\Phi_{\varepsilon}$ yield that the latter is strictly convex and coercive in $L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})$ . Therefore, for every $\varepsilon>0$ , (Dε) admits a unique solution.

Proposition 2

Let $\{\varepsilon_{n}\}\subset\mathbb{R}^{+}$ be a sequence converging to zero and denote the solutions of (Dε) with $\varepsilon=\varepsilon_{n}$ by $(\alpha_{1}^{n},\alpha_{2}^{n})\in L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})$. Then the sequence $\{(\alpha_{1}^{n},\alpha_{2}^{n})\}$ admits a weak accumulation point, and every weak accumulation point is also a strong one and a solution of the original dual problem (D).

Proof

Let $(\alpha_{1}^{*},\alpha_{2}^{*})\in L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})$ denote an arbitrary globally optimal solution of (D) (whose existence is guaranteed by Theorem 2.1). Then the optimality of $(\alpha_{1}^{*},\alpha_{2}^{*})$ for (D) and of $(\alpha_{1}^{n},\alpha_{2}^{n})$ for (Dε) (with $\varepsilon=\varepsilon_{n}$ ) gives

[TABLE]

which implies

[TABLE]

Thus, the sequence $\{(\alpha_{1}^{n},\alpha_{2}^{n})\}$ is bounded in $L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})$. This in turn gives the existence of a weak accumulation point as claimed.

Now assume that $(\tilde{\alpha}_{1},\tilde{\alpha}_{2})$ is such a weak accumulation point, i.e.

[TABLE]

(for a subsequence). Using again the optimality of $(\alpha_{1}^{*},\alpha_{2}^{*})$ and $(\alpha_{1}^{n},\alpha_{2}^{n})$ , respectively, we obtain

[TABLE]

On the other hand, by convexity and weak lower semicontinuity of $\Phi$ we get from (29) and (30) that

[TABLE]

which gives in turn the optimality of the weak limit. Estimate (28) for the choice $(\alpha_{1}^{*},\alpha_{2}^{*})=(\tilde{\alpha}_{1},\tilde{\alpha}_{2})$ shows that

[TABLE]

and thus, we have

[TABLE]

but $(\alpha_{1}^{n},\alpha_{2}^{n})\nrightarrow(\tilde{\alpha}_{1},\tilde{\alpha}_{2})$ would imply

[TABLE]

and consequently, we have $(\alpha_{1}^{n},\alpha_{2}^{n})\to(\tilde{\alpha}_{1},\tilde{\alpha}_{2})$ in $L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})$ . ∎

Theorem 2.3

Let $\{\varepsilon_{n}\}\subset\mathbb{R}^{+}$ be a sequence converging to zero and denote the solutions of (Dε) with $\varepsilon=\varepsilon_{n}$ again by $(\alpha_{1}^{n},\alpha_{2}^{n})\in L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})$ . Moreover, define

[TABLE]

Then $\pi_{n}$ converges strongly in $L^{2}(\Omega)$ to the unique solution of (1).

Proof

From (28), we know that $\{(\alpha_{1}^{n},\alpha_{2}^{n})\}$ is bounded and hence, $\{\pi_{n}\}$ is bounded in $L^{2}(\Omega)$ . Thus,

[TABLE]

for some subsequence. Now we show that $\tilde{\pi}$ is optimal for (1). Weak closedness of $\{\pi\in L^{2}(\Omega):\pi(x_{1},x_{2})\geq 0\text{ a.e. in }\Omega\}$ implies $\tilde{\pi}\geq 0$. Testing the first-order optimality conditions for (Dε)

[TABLE]

against some $\varphi_{1}\in C^{\infty}_{c}(\Omega_{1})$ , inserting the definition of $\pi_{n}$ , and integrating over $\Omega_{1}$ yields

[TABLE]

Passing to the limit we obtain

[TABLE]

and thus, $\tilde{\pi}$ satisfies the first equality constraint in (1). The second equality constraint can be verified analogously.

To show optimality of $\tilde{\pi}$ , we test the optimality conditions in (33) and (34) with $\alpha_{1}^{n}$ and $\alpha_{2}^{n}$ , respectively, and get

[TABLE]

where $E_{\gamma}$ is the primal objective from (3). Similarly, we get

[TABLE]

where $\pi^{*}\in L^{2}(\Omega)$ is the unique solution of (1) and $(\alpha_{1}^{*},\alpha_{2}^{*})\in L^{2}(\Omega_{1})\times L^{2}(\Omega_{2})$ solves the dual problem (D). Now, putting everything so far together, we obtain

[TABLE]

On the other hand, $E_{\gamma}$ is weakly lower semicontinuous, and therefore

[TABLE]

This gives the optimality of $\tilde{\pi}$ and by strict convexity also uniqueness, i.e. $\tilde{\pi}=\pi^{*}$ . Thus, the weak limit is unique and a well known argument by contradiction therefore implies the weak convergence of the whole sequence $\{\pi_{n}\}$ to $\pi^{*}$ . Finally, strong convergence follows from a standard argument. ∎

2.4 The discrete dual problem

We show a simple discretization of the quadratically regularized optimal transport problem (1) by piecewise constant approximation in Appendix A. To keep the notation concise, we state the corresponding discrete optimal transport problem and illustrate the duality already here. This will be the basis of the algorithms we derive in Section 3. A discrete version of the continuous problem (1) is the finite-dimensional problem

[TABLE]

where $\mathbf{1}_{N}\in\mathbb{R}^{N}$ denotes the vector of all ones, $\mu\in\mathbb{R}^{N}$, $\nu\in\mathbb{R}^{M}$ denote the discretized marginals with $\sum_{j=1}^{N}\mu_{j}=\sum_{i=1}^{M}\nu_{i}$, and $c\in\mathbb{R}^{M\times N}$ denotes the discretized cost. Note that we slightly changed the notation from $\mu_{1}$ and $\mu_{2}$ to $\mu$ and $\nu$, respectively. For the discrete form of the optimality system (27) we further replace the Lagrange multipliers $\alpha_{1}$ and $\alpha_{2}$ by $\alpha$ and $\beta$, respectively, and get

[TABLE]

where $\alpha\in\mathbb{R}^{M}$ , $\beta\in\mathbb{R}^{N}$ and $(\alpha\oplus\beta)_{i,j}=\alpha_{i}+\beta_{j}$ is the “outer sum”. The discrete counterpart of $\Phi$ from (D) is

[TABLE]

where $\|\cdot\|_{F}$ denotes the Frobenius norm.

We write the optimality condition (36b)-(36c) as a non-smooth equation $F(\alpha,\beta)=0$ in $\mathbb{R}^{M+N}$ with

[TABLE]

(note that $F_{1}=\partial_{\alpha}\Phi$ and $F_{2}=\partial_{\beta}\Phi$). Since $F$ is the composition of Lipschitz continuous and semismooth functions, we have the following result (for the chain rule for semismooth functions, see e.g. (OPTpdecon, Thm. 2.10)):

Lemma 6

The function $F$ (and thus, the gradient of $\Phi$ ) is (globally) Lipschitz continuous and semismooth.
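The relation between $\Phi$ and $F$ can be checked numerically. The following sketch assumes a specific scaling that is consistent with $\pi=\gamma^{-1}(\alpha\oplus\beta-c)_{+}$ but is not copied from (36) or (37): $\Phi(\alpha,\beta)=\frac{1}{2}\|(\alpha\oplus\beta-c)_{+}\|_{F}^{2}-\gamma\langle\alpha,\mu\rangle-\gamma\langle\beta,\nu\rangle$, with $c$ stored as an $M\times N$ array, `mu` of length $M$ and `nu` of length $N$. The function names are hypothetical.

```python
import numpy as np

def positive_part(alpha, beta, c):
    """(alpha_i + beta_j - c_ij)_+ as an M x N array."""
    return np.maximum(alpha[:, None] + beta[None, :] - c, 0.0)

def phi(alpha, beta, c, mu, nu, gamma):
    """Assumed discrete dual objective (to be minimized)."""
    r = positive_part(alpha, beta, c)
    return 0.5 * np.sum(r ** 2) - gamma * (alpha @ mu + beta @ nu)

def F(alpha, beta, c, mu, nu, gamma):
    """Stacked residual (F_1, F_2) = (d Phi / d alpha, d Phi / d beta)."""
    r = positive_part(alpha, beta, c)
    F1 = r.sum(axis=1) - gamma * mu   # row sums vs. first marginal
    F2 = r.sum(axis=0) - gamma * nu   # column sums vs. second marginal
    return np.concatenate([F1, F2])

def numerical_gradient(alpha, beta, c, mu, nu, gamma, h=1e-6):
    """Central finite differences of phi; valid because r -> r_+^2 is C^1."""
    M = alpha.size
    z = np.concatenate([alpha, beta])
    grad = np.zeros_like(z)
    for k in range(z.size):
        e = np.zeros_like(z)
        e[k] = h
        zp, zm = z + e, z - e
        grad[k] = (phi(zp[:M], zp[M:], c, mu, nu, gamma)
                   - phi(zm[:M], zm[M:], c, mu, nu, gamma)) / (2 * h)
    return grad
```

Under these assumptions, $F(\alpha,\beta)=0$ states that $\gamma^{-1}(\alpha\oplus\beta-c)_{+}$ has row sums $\mu$ and column sums $\nu$.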

3 Algorithms

The optimality system (36b), (36c) for the smooth and convex problem (D) can be solved by different methods. In blondel2018smoothOT the authors propose to use a generic L-BFGS solver and also derive an alternating minimization scheme, which is similar to the non-linear Gauss-Seidel method in the next section but differs slightly in the numerical realization. The approach in roberts2017gini also uses an off-the-shelf solver. Here we propose methods that exploit the special structure of the optimality system: a non-linear Gauss-Seidel method and a semismooth Newton method.

3.1 Non-linear Gauss-Seidel

The method in this section is similar to the one described in the Appendix of blondel2018smoothOT , but we describe it here for the sake of completeness. A close look at the optimality system

[TABLE]

shows that we can solve all $M$ equations in (38a) for the $\alpha_{i}$ in parallel (for fixed $\beta$) since the $i$th equation depends on $\alpha_{i}$ only. Similarly, all $N$ equations in (38b) can be solved for the $\beta_{j}$ if $\alpha$ is fixed. Hence, we can perform a non-linear Gauss-Seidel method for these non-smooth equations (also known as alternating minimization, nonlinear SOR or coordinate descent method for $\Phi$ chen2002convergence; wright2015coordinatedescent), i.e. alternately solving the equations (38a) for $\alpha$ (for fixed $\beta$) and then the equations (38b) for $\beta$ (for fixed $\alpha$). The whole method is stated in Algorithm 1. Since $\Phi$ is convex with Lipschitz continuous gradient (cf. Lemma 6), the convergence of the algorithm follows from results in bertsekas2016nlp.
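The alternating scheme can be sketched as follows. This is a minimal illustration, not a transcription of Algorithm 1: it assumes that (38a) and (38b) read $\sum_{j}(\alpha_{i}+\beta_{j}-c_{ij})_{+}=\gamma\mu_{i}$ and $\sum_{i}(\alpha_{i}+\beta_{j}-c_{ij})_{+}=\gamma\nu_{j}$ with positive right-hand sides, it solves each scalar equation by plain bisection for brevity, and it runs a fixed number of sweeps instead of using a stopping criterion. The names `solve_scalar_bisection` and `nlgs` are hypothetical.

```python
import numpy as np

def solve_scalar_bisection(y, b, steps=100):
    """Solve sum_j (x - y_j)_+ = b for x, assuming b > 0."""
    lo = y.min()          # f(lo) = 0 < b
    hi = y.min() + b      # f(hi) >= hi - min(y) = b
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if np.maximum(mid - y, 0.0).sum() < b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nlgs(c, mu, nu, gamma, sweeps=100):
    """Alternate between the alpha-equations and the beta-equations."""
    M, N = c.shape
    alpha = np.zeros(M)
    beta = np.zeros(N)
    for _ in range(sweeps):
        # Row equations: each one involves a single alpha_i (beta fixed).
        for i in range(M):
            alpha[i] = solve_scalar_bisection(c[i, :] - beta, gamma * mu[i])
        # Column equations: each one involves a single beta_j (alpha fixed).
        for j in range(N):
            beta[j] = solve_scalar_bisection(c[:, j] - alpha, gamma * nu[j])
    return alpha, beta
```

The two inner loops are independent across `i` and across `j`, which is the parallelism mentioned above; a practical implementation would likely vectorize them and monitor the marginal residuals.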

Each equation for an $\alpha_{i}$ or $\beta_{j}$ is just a single scalar equation for a scalar quantity and the structure of the equation is of the following form: For a given vector $y\in\mathbb{R}^{n}$ and right hand side $b\in\mathbb{R}$ , solve

[TABLE]

Of course, one can solve this problem by bisection, but here are two other, more efficient methods to solve equations of the type (39):

Direct search.

If we denote by $y_{[j]}$ the $j$ -th smallest entry of $y$ (i.e. we sort $y$ in an ascending way), we get that

[TABLE]

To obtain the solution of (39) we evaluate $f$ at the break points $y_{[j]}$ until we find the interval $[y_{[k]},y_{[k+1]}[$ in which the solution lies (by finding $k$ such that $f(y_{[k]})\leq b<f(y_{[k+1]})$), and then set

[TABLE]

The complexity of the method is dominated by the sorting of the vector $y$, which costs $\mathcal{O}(n\log(n))$.
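A sketch of the direct search, under the assumption that (39) has the form $f(x)=\sum_{j=1}^{n}(x-y_{j})_{+}=b$ with $b>0$: on $[y_{[k]},y_{[k+1]}[$ one then has $f(x)=kx-\sum_{j\leq k}y_{[j]}$, so the solution is $x=\bigl(b+\sum_{j\leq k}y_{[j]}\bigr)/k$. Instead of scanning the break points one by one, the hypothetical function `solve_by_sorting` evaluates $f$ at all break points with a cumulative sum and locates $k$ by binary search.

```python
import numpy as np

def solve_by_sorting(y, b):
    """Solve sum_j (x - y_j)_+ = b for x (assumes b > 0)."""
    ys = np.sort(y)                       # y_[1] <= ... <= y_[n]
    csum = np.cumsum(ys)                  # partial sums of the sorted entries
    k = np.arange(1, ys.size + 1)
    f_at_breaks = k * ys - csum           # f(y_[k]), nondecreasing in k
    # Number of break points with f(y_[k]) <= b, i.e. the index k of the
    # interval [y_[k], y_[k+1]) that contains the solution.
    kk = np.searchsorted(f_at_breaks, b, side="right")
    return (b + csum[kk - 1]) / kk
```

For instance, with $y=(3,1,2)$ and $b=1.5$ the break-point values are $0,1,3$, so $k=2$ and $x=(1.5+1+2)/2=2.25$.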

Semismooth Newton.

Although $f$ is non-smooth, we may perform Newton’s method here. The function $f$ is piecewise linear and on each interval $]y_{[j]},y_{[j+1]}[$ it has the slope $j$ (a simple situation with $n=3$ is shown in Figure 1). At the break points we may define $f^{\prime}(y_{[j]})=j$ and then we iterate

[TABLE]

If we start with $x^{0}\geq y_{[n]}=\max_{k}y_{k}$, the method will produce a monotonically decreasing sequence which converges in at most $n$ steps. Actually, we can initialize the method with any $x^{0}$ that is strictly larger than $y_{[1]}=\min_{k}y_{k}$.

Note that we do not need to sort the values of $y_{k}$ to calculate the derivative since we have $f^{\prime}(x)=\#\{i\ :\ x\geq y_{i}\}$. In practice, the method usually needs far fewer than $n$ iterations.
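The scalar Newton iteration can be sketched as follows, again assuming that (39) reads $f(x)=\sum_{j}(x-y_{j})_{+}=b$ with $b>0$ and using the update $x^{k+1}=x^{k}-(f(x^{k})-b)/f^{\prime}(x^{k})$. The function name, the relative stopping tolerance and the iteration cap of $n+2$ are assumptions of this sketch.

```python
import numpy as np

def solve_by_newton(y, b, x0=None, tol=1e-12):
    """Semismooth Newton for sum_j (x - y_j)_+ = b; returns (x, iterations)."""
    # Default start at max(y); any x0 > min(y) keeps the slope positive.
    x = float(y.max()) if x0 is None else float(x0)
    iterations = 0
    for _ in range(y.size + 2):
        residual = np.maximum(x - y, 0.0).sum() - b
        if abs(residual) <= tol * max(1.0, abs(b)):
            break
        slope = np.count_nonzero(x >= y)   # f'(x) = #{i : x >= y_i}, no sorting
        x -= residual / slope
        iterations += 1
    return x, iterations
```

Because $f$ is convex and piecewise linear, an iterate to the left of the solution is mapped to its right by one step, after which the iterates decrease monotonically and land exactly on the solution once the correct linear piece is reached.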

3.2 Semismooth Newton

As seen in Lemma 6, the mapping $F$ is semismooth and hence, we may use a semismooth Newton method chen1997 ; chen2000semismooth .

A simple calculation proves the following lemma.

Lemma 7

A Newton derivative of $F$ from (37) at $(\alpha,\beta)$ is given by

[TABLE]

where $\sigma\in\mathbb{R}^{M\times N}$ is given by

[TABLE]

A step of the semismooth Newton method for the solution of $F(\alpha,\beta)=0$ would consist of setting

[TABLE]

However, the next lemma shows that $G$ has a non-trivial kernel.

Lemma 8

Let $G$ be the Newton derivative of $F$ at $(\alpha,\beta)$ defined in Lemma 7. Then the following holds true:

1. $G\in\mathbb{R}^{(M+N)\times(M+N)}$ is symmetric,

2. $G$ is positive semi-definite,

3. $(a,b)\in\operatorname{kern}(G)$ if and only if $\sigma_{ij}(a_{i}+b_{j})=0$ for all $1\leq i\leq M$, $1\leq j\leq N$.

Proof

Symmetry of $G$ is clear by construction. To see that $G$ is positive semi-definite we calculate

[TABLE]

Due to the non-negativity of $\sigma$, this also shows the last point. ∎
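The three properties can be verified numerically on a small instance. The sketch assumes that the Newton derivative from Lemma 7 has the block form $G=\begin{pmatrix}\operatorname{diag}(\sigma\mathbf{1}_{N})&\sigma\\ \sigma^{T}&\operatorname{diag}(\sigma^{T}\mathbf{1}_{M})\end{pmatrix}$ with a $0/1$ matrix $\sigma$; this form is inferred from the definition of $F$ and is not quoted from the lemma. With it, the quadratic form is $\sum_{i,j}\sigma_{ij}(a_{i}+b_{j})^{2}$. The names `newton_matrix` and `quadratic_form` are hypothetical.

```python
import numpy as np

def newton_matrix(sigma):
    """Assemble the assumed block form of G from the 0/1 matrix sigma."""
    return np.block([
        [np.diag(sigma.sum(axis=1)), sigma],
        [sigma.T, np.diag(sigma.sum(axis=0))],
    ])

def quadratic_form(sigma, a, b):
    """sum_ij sigma_ij * (a_i + b_j)^2, which should equal (a,b)^T G (a,b)."""
    return np.sum(sigma * (a[:, None] + b[None, :]) ** 2)
```

In particular, the vector $(\mathbf{1}_{M},-\mathbf{1}_{N})$ always lies in the kernel, and every index pair with $\sigma_{ij}=0$ removes a constraint on $(a,b)$, which is why the kernel can be large when $\sigma$ is sparse.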

The third point of the lemma shows that the kernel of $G$ may have a high dimension, depending on the matrix $\sigma$. Hence we resort to a quasi-Newton method where we regularize the Newton step arising from the dual problem from Section 2.2 by setting

[TABLE]

with a small $\varepsilon>0$. By chen1997, the method still converges, but only a local linear rate is guaranteed. We note that we have not applied the semismooth Newton method to the regularized dual problem from Section 2.3. This would also be possible, but it would not only lead to the regularized Newton matrix from above; we would also have to adapt the function $F$ in the computation of the update.

Let us make a few remarks on the regularized Newton step and its numerical treatment.

• The matrix $\sigma$ (and hence the Newton matrix $G$) is usually very sparse. The closer $\alpha$ and $\beta$ are to the optimal ones, the closer $(\alpha_{i}+\beta_{j}-c_{ij})_{+}$ is to the optimal regularized transport plan $\pi$, and for small $\gamma$ this is usually very sparse.

• Since $G$ is positive semi-definite, the regularized step could be done by the method of conjugate gradients. However, any linear solver that can exploit the sparsity of $G$ can be used.

As usual, the regularized semismooth Newton method may not converge globally. A simple globalization technique is an Armijo linesearch in the Newton direction. The full method is described in Algorithm 2.
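A compact sketch of a regularized semismooth Newton iteration with Armijo backtracking is given below. It is not a transcription of Algorithm 2, and several ingredients are assumptions: the objective $\Phi(\alpha,\beta)=\frac{1}{2}\|(\alpha\oplus\beta-c)_{+}\|_{F}^{2}-\gamma\langle\alpha,\mu\rangle-\gamma\langle\beta,\nu\rangle$, the choice $\sigma_{ij}=1$ if $\alpha_{i}+\beta_{j}-c_{ij}\geq 0$ and $0$ otherwise, the regularized system $(G+\varepsilon I)d=-F$, the initial guess $\alpha=\max_{ij}c_{ij}$, $\beta=0$, the stopping rule, and the dense linear solve (a sparse solver or conjugate gradients would be the natural choice for large problems). The function name is hypothetical.

```python
import numpy as np

def ssn_regularized(c, mu, nu, gamma, eps=1e-6, kappa=0.5, theta=0.1,
                    tol=1e-10, max_iter=100):
    """Return (alpha, beta, iterations) with F(alpha, beta) approximately 0."""
    M, N = c.shape
    # Assumed initial guess: every entry alpha_i + beta_j - c_ij is >= 0.
    z = np.concatenate([np.full(M, c.max()), np.zeros(N)])
    w = gamma * np.concatenate([mu, nu])

    def shifted(z):
        return z[:M, None] + z[None, M:] - c

    def phi(z):
        return 0.5 * np.sum(np.maximum(shifted(z), 0.0) ** 2) - w @ z

    iterations = 0
    for _ in range(max_iter):
        P = shifted(z)
        r = np.maximum(P, 0.0)
        g = np.concatenate([r.sum(axis=1), r.sum(axis=0)]) - w   # F(alpha, beta)
        if np.abs(g).max() <= tol:
            break
        sigma = (P >= 0).astype(float)
        G = np.block([[np.diag(sigma.sum(axis=1)), sigma],
                      [sigma.T, np.diag(sigma.sum(axis=0))]])
        d = np.linalg.solve(G + eps * np.eye(M + N), -g)
        # Armijo backtracking on phi along the (descent) direction d.
        t, phi0, slope = 1.0, phi(z), g @ d
        while phi(z + t * d) > phi0 + theta * t * slope and t > 1e-12:
            t *= kappa
        z = z + t * d
        iterations += 1
    return z[:M], z[M:], iterations
```

Since $G+\varepsilon I$ is positive definite, $d$ is a descent direction for $\Phi$ whenever $F\neq 0$, so the backtracking loop terminates; the parameter names `kappa` and `theta` mirror the Armijo parameters quoted in Section 4.2, but their exact role in Algorithm 2 is assumed here.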

4 Numerical examples

4.1 Illustration of $\gamma\to 0$

In our first numerical example we illustrate how the solutions $\pi^{*}$ of the regularized problem converge for vanishing regularization parameter $\gamma\to 0$. We generate some marginals, fix a transport cost, compute solutions of the discretized transport problems (35) for a sequence $\gamma_{n}\to 0$, and illustrate the optimal transport plans (and the related regularized transport costs). Our marginals are non-negative functions sampled at equidistant points $x_{i}$, $y_{i}$ in the interval $[0,1]$; we used $M=N=400$, and the cost $c_{ij}=(x_{i}-y_{j})^{2}$ is the squared distance between the sampling points. The results are shown in Figure 2. One observes that the optimal transport plans converge to a measure that is singular and is supported on the graph of a monotonically increasing function, exactly as the fundamental theorem of optimal transport ambrosio2013user predicts.

We repeat the same experiment where the cost is the (non-squared) distance $c_{ij}=|x_{i}-y_{j}|$. Here we had to choose larger regularization parameters, as it turned out that values similar to those in Figure 2 would lead to almost indistinguishable results. The results are shown in Figure 3. Note the different structure of the transport plan (which is again in agreement with the predicted results from the fundamental theorem of optimal transport). In Figure 4 we show the results for the concave but increasing cost $c_{ij}=\sqrt{|x_{i}-y_{j}|}$ and again observe the expected effect that a concave transport cost encourages as much mass as possible to stay in place (as can be seen by the concentration of mass along the diagonal of the transport plan).

4.2 Mesh independence and comparison of SSN and NLGS

While we did not analyze our algorithms in the continuous case, we ran an experiment to see how the methods converge when we change the mesh size of the discretization. To that end, we used a simple piecewise constant approximation of the marginals, the cost and the transport plan as described in Appendix A. This derivation shows that one has to scale up the marginals for finer discretization (or, equivalently, scale down the regularization parameter $\gamma$) to get consistent results. We also took care to adapt the termination criteria so that we terminate the algorithms when the continuous counterpart of the termination criteria is satisfied (again, see Table 1 in Appendix A for details).

We used marginals $\mu^{\pm}:[0,1]\to[0,\infty[$ of the form

[TABLE]

with varying $m,m_{1},m_{2}>0,\ 0<a,a_{1},a_{2}<1$ and appropriate normalization factors $r,s$ and quadratic cost $c(x,y)=(x-y)^{2}$, and discretized each instance of the problem with $M=N$ varying from $10$ to $1000$. We solved the problem for each size for regularization parameter $\gamma=0.001$ with the semismooth Newton method from Algorithm 2 (with parameters $\epsilon=10^{-6}$ and Armijo parameters $\kappa=0.5$ and $\theta=0.1$) up to tolerance $10^{-3}$ and report the number of iterations needed in Figure 5. As can be observed, the number of iterations is comparable for each instance of the problem. Moreover, it seems that the number of iterations does not grow with finer discretization (however, the number of iterations seems to oscillate unpredictably for coarse discretization). This would hint at mesh independence of the method, and one could hope to prove this in future research. We performed a similar experiment for the non-linear Gauss-Seidel method from Algorithm 1 (with larger regularization parameter $\gamma=0.05$ and only up to $M=N=500$) and show the results in Figure 6. We see an overall increase of the number of iterations, but only a very slight one (with several instances where the number of iterations does not increase with finer discretization).

4.3 Optimal transport between empirical distributions

As an example in two space dimensions, we consider two distributions $\mu,\nu$ . Instead of using these as marginals, we consider empirical distributions, i.e. we generate samples $(x_{i})_{i=1,\dots,N}$ , sampled from $\mu$ and $(y_{j})_{j=1,\dots,M}$ , sampled from $\nu$ . These samples give empirical approximations

[TABLE]

The optimal transport problem (1) with these two marginals does not fulfill Assumption 1, since the marginals are not $L^{2}$-functions. However, we can consider it as a discrete optimal transport problem of the form (35) when we set $c_{ij}=c(x_{i},y_{j})$ (for some cost $c$) and use the marginals $\mathbf{1}_{M}$ and $\mathbf{1}_{N}$, respectively. We solve this discrete optimal transport problem and obtain a transport plan $\pi^{*}$. Since we use quadratic regularization, the plan will be sparse, and hence we can visualize it by plotting arrows from $x_{i}$ to $y_{j}$, with the thickness of each arrow proportional to the size of the entry $\pi^{*}_{ij}$. In other words: the thickness of the arrow from $x_{i}$ to $y_{j}$ indicates how much of the mass in $x_{i}$ has been transported to $y_{j}$. In Figure 7 we show the result for $N=80$ samples from an anisotropic Gaussian distribution (centered at the origin) and $M=120$ samples from a uniform distribution on a segment of an annulus. We used $c(x_{i},y_{j})=\|x_{i}-y_{j}\|^{2}$ with the Euclidean norm and regularization parameter $\gamma=1$. The resulting plan $\pi^{*}$ has $212$ non-zero entries. For comparison, we show the result of entropically regularized optimal transport in the same situation in Figure 8. We used $\gamma=0.05$ (which is the smallest value for which our naive implementation of the Sinkhorn algorithm is still stable). The resulting plan has $6730$ nonzero entries, and we only plot lines for the transport entries which are larger than 1% of the largest entry in the optimal transport plan.
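The two data-handling steps of this experiment, building the cost matrix from the samples and selecting which plan entries to draw, can be sketched as follows. The function names are hypothetical, the samples are assumed to be stored as arrays of shape `(number_of_points, 2)`, and the plan `pi` is assumed to come from some solver of the discrete problem; no solver is included here.

```python
import numpy as np

def squared_euclidean_cost(x, y):
    """c_ij = ||x_i - y_j||^2 for sample arrays x (n, d) and y (m, d)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def count_nonzeros(pi):
    """Number of strictly positive entries of a transport plan."""
    return int(np.count_nonzero(pi > 0))

def entries_to_plot(pi, rel_threshold=0.01):
    """Index pairs (i, j) whose entry exceeds rel_threshold * max(pi)."""
    return np.argwhere(pi > rel_threshold * pi.max())
```

For a quadratically regularized plan, `count_nonzeros` is typically already small, whereas an entropically regularized plan is strictly positive everywhere in exact arithmetic, which is why a relative threshold such as the 1% rule above is needed for plotting.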

5 Conclusion

We analyzed the quadratically regularized optimal transport problem in Kantorovich form. While it is straightforward to derive the dual problem, our proof of existence of dual optima is quite intricate. We note that we are not aware of any proof of existence of dual optima for other regularized transport problems in the continuous case besides the very recent clason2019entropic for entropic regularization. We derived two algorithms to solve the dual problems, both of which converge by standard results. It turns out that the semismooth quasi-Newton method converges fast in all cases and that it behaves stably with respect to the regularization parameter in our numerical experiments. We even observe mesh independence of the method in the experiments. One drawback of the semismooth Newton method (compared with, e.g., the Sinkhorn iteration cuturi2013sinkhorn) is that we need to assemble the Newton matrix in each step. While this matrix is usually very sparse, one still needs to check $MN$ cases, which may be too many for large scale problems. We did not investigate how special structure of the cost function $c$ may help to reduce the cost of assembling the sparse matrix $\sigma$.

Acknowledgements.

We would like to thank the reviewer for helpful suggestions that led to an improved presentation and also Stephan Walther (TU Dortmund) for helping with the construction of the counterexample in Section 2.3.

Appendix A Discretization with piecewise-constant ansatz functions

For the sake of brevity, we just consider an equidistant discretization of $[0,1]$ into $N$ intervals using piecewise constant ansatz functions, i.e.

[TABLE]

for coefficients $\pi_{ij}$ and assume analogous definitions for the quantities $c$ , $\mu^{+}$ , $\mu^{-}$ , $\alpha$ and $\beta$ . They have to coincide on average over the intervals. Again, we study this for $\pi$ and obtain that the identity

[TABLE]

holds. Again, analogous identities hold for the quantities $c$ , $\mu^{+}$ , $\mu^{-}$ , $\alpha$ and $\beta$ . The ones with one-dimensional domain are scaled by $\frac{1}{N}$ instead of $\frac{1}{N^{2}}$ .

Now, we consider the discrete Algorithm 2, which operates on discrete quantities, and establish a consistent mapping of the quantities from the discretization to the ones of the solver. We denote its input quantities by $\bar{c}_{ij}$, $\bar{\mu}^{-}_{i}$, $\bar{\mu}^{+}_{i}$ and its output quantities by $\bar{\alpha}_{i}$, $\bar{\beta}_{j}$, $\bar{\pi}_{ij}$, and $\bar{E}$. It solves for

[TABLE]

which we desire to correspond to

[TABLE]

We plug in the ansatz functions and obtain the identity

[TABLE]

We set $\bar{\pi}_{ij}\coloneqq\gamma\pi_{ij}$ and obtain

[TABLE]

Thus, the choice $\bar{\mu}^{-}_{i}\coloneqq N\mu^{-}_{i}$ gives a consistent conversion. Similarly, we obtain $\bar{\mu}^{+}_{j}\coloneqq N\mu^{+}_{j}$ . We proceed with the objective. Plugging in the ansatz functions into the continuous objective gives

[TABLE]

The solver computes

[TABLE]

Plugging in $N\mu^{-}_{i}=\bar{\mu}^{-}_{i}$ , $N\mu^{+}_{j}=\bar{\mu}^{+}_{j}$ and $\gamma\pi_{ij}=\bar{\pi}_{ij}$ gives

[TABLE]

Thus, the consistent identity $E=\frac{1}{\gamma N^{2}}\bar{E}$ follows if we choose $\bar{\alpha}_{i}\coloneqq\alpha_{i}$ and $\bar{\beta}_{i}\coloneqq\beta_{i}$ . The solver computes $\bar{\alpha}_{i}$ as the solution of

[TABLE]

whereas the discretization of the corresponding continuous equation reads

[TABLE]

in terms of the coefficients. Plugging in the choices $\alpha_{i}=\bar{\alpha}_{i}$ , $\beta_{j}=\bar{\beta}_{j}$ , $c_{ij}=\bar{c}_{ij}$ and $N\mu^{-}_{i}=\bar{\mu}^{-}_{i}$ yields equivalence of the latter equation to

[TABLE]

which is equivalent to the equation that is solved by Algorithm 2. The argument for $\bar{\mu}^{+}_{j}$ is carried out analogously.

Regarding termination, the solver checks the criteria

[TABLE]

We only consider the first and plug the identity $\gamma\pi_{ij}=\bar{\pi}_{ij}$ into it, which gives equivalence to

[TABLE]

This in turn is equivalent to

[TABLE]

Moreover, the ansatz functions for $\pi$ and $\mu^{-}$ are constant on $\left(\frac{i}{N},\frac{i+1}{N}\right)$ , which induces equivalence to

[TABLE]

This implies that if the solver terminates, we have

[TABLE]

We summarize the choices for the consistent mapping of quantities arising from the discretization to quantities the solver operates on in Table 1.

Finally, we make a note on the calculation of the coefficients $c_{ij}$ for the cost function $c(x,y)\coloneqq(x-y)^{2}$ :

[TABLE]
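The coefficients can be cross-checked with a short computation. The sketch assumes that $c_{ij}$ is the average of $c(x,y)=(x-y)^{2}$ over the cell $\left(\frac{i}{N},\frac{i+1}{N}\right)\times\left(\frac{j}{N},\frac{j+1}{N}\right)$ with indices starting at $0$, in line with the requirement that the ansatz functions coincide on average over the intervals. Writing $x=(i+s)/N$ and $y=(j+t)/N$ with $s,t$ uniform on $(0,1)$ gives the average $\bigl((i-j)^{2}+\tfrac{1}{6}\bigr)/N^{2}$, since the mean of $s-t$ is $0$ and the mean of $(s-t)^{2}$ is $\tfrac{1}{6}$. Both function names are hypothetical.

```python
import numpy as np

def cost_coefficients(N):
    """Closed-form cell averages of (x - y)^2 on an N x N uniform grid."""
    idx = np.arange(N)
    d = idx[:, None] - idx[None, :]
    return (d ** 2 + 1.0 / 6.0) / N ** 2

def cell_average_quadrature(i, j, N):
    """Same average via 2-point Gauss-Legendre quadrature in each variable.

    The rule is exact for polynomials up to degree 3 per variable, so it
    reproduces the average of the quadratic integrand up to rounding.
    """
    nodes, weights = np.polynomial.legendre.leggauss(2)
    s = 0.5 * (nodes + 1.0)      # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights            # rescaled weights sum to 1
    total = 0.0
    for sa, wa in zip(s, w):
        for sb, wb in zip(s, w):
            total += wa * wb * ((i + sa) / N - (j + sb) / N) ** 2
    return total
```

The constant $\tfrac{1}{6N^{2}}$ on the diagonal is the only difference from simply sampling the cost at the cell midpoints, and it vanishes at rate $N^{-2}$.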

Conflict of Interest: The authors declare that they have no conflict of interest.

Bibliography


[1] Luigi Ambrosio and Nicola Gigli. A user’s guide to optimal transport. In Modelling and optimisation of flows on networks, pages 1–155. Springer, 2013.
[2] Dimitri P. Bertsekas. Nonlinear programming. Athena Scientific Optimization and Computation Series. Athena Scientific, Belmont, MA, third edition, 2016.
[3] Mathieu Blondel, Vivien Seguy, and Antoine Rolet. Smooth and sparse optimal transport. In Amos Storkey and Fernando Perez-Cruz, editors, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 880–889, Playa Blanca, Lanzarote, Canary Islands, 09–11 Apr 2018. PMLR.
[4] Guillaume Carlier, Vincent Duval, Gabriel Peyré, and Bernhard Schmitzer. Convergence of entropic schemes for optimal transport and gradient flows. SIAM Journal on Mathematical Analysis, 49(2):1385–1418, 2017.
[5] Xiaojun Chen. Superlinear convergence of smoothing quasi-Newton methods for nonsmooth equations. Journal of Computational and Applied Mathematics, 80(1):105–126, 1997.
[6] Xiaojun Chen. On convergence of SOR methods for nonsmooth equations. Numer. Linear Algebra Appl., 9(1):81–92, 2002.
[7] Xiaojun Chen, Zuhair Nashed, and Liqun Qi. Smoothing methods and semismooth methods for nondifferentiable operator equations. SIAM J. Numer. Anal., 38(4):1200–1216, 2000.
[8] Christian Clason, Dirk A. Lorenz, Hinrich Mahler, and Benedikt Wirth. Entropic regularization of continuous optimal transport problems. arXiv preprint arXiv:1906.01333, 2019.
