The boundary method for semi-discrete optimal transport partitions and Wasserstein distance computation

Luca Dieci; J.D. Walsh III

arXiv:1702.03517·math.NA·May 3, 2019·J. Comput. Appl. Math.


TL;DR

The paper introduces the boundary method, a new technique for efficiently solving semi-discrete optimal transport problems across various cost functions, with theoretical backing and practical testing.

Contribution

It presents the boundary method that reduces problem complexity and provides convergence analysis for p-norm cost functions, extending applicability to general costs.

Findings

01 Effective reduction in problem dimension

02 Convergence proven for p-norm costs

03 Successful testing on various cost functions

Abstract

We introduce a new technique, which we call the boundary method, for solving semi-discrete optimal transport problems with a wide range of cost functions. The boundary method reduces the effective dimension of the problem, thus improving complexity. For cost functions equal to a p-norm with p in (1,infinity), we provide mathematical justification, convergence analysis, and algorithmic development. Our testing supports the boundary method with these p-norms, as well as other, more general cost functions.

Tables

Table 1: Closed-form options for $\mu$

$\hat{\mu}((x_{1}, x_{2})) = 1$		$\hat{M}(u, v) = u v$
$\hat{\mu}((x_{1}, x_{2})) = x_{1}^{t} x_{2}^{t}$,	$t > 0$	$\hat{M}(u, v) = (t + 1)^{-2} (u^{t + 1} v^{t + 1})$
$\hat{\mu}((x_{1}, x_{2})) = e^{t x_{1}}$,	$t \neq 0$	$\hat{M}(u, v) = t^{-1} v e^{t u}$
$\hat{\mu}((x_{1}, x_{2})) = e^{t x_{2}}$,	$t \neq 0$	$\hat{M}(u, v) = t^{-1} u e^{t v}$

Table 2: Closed-form options for $C$ when $\mu$ is uniform or zero on $A$

$c$	$\hat{C}(u, v)$
$2$-norm	$\begin{cases} \frac{1}{6} u^{3} \log(\sqrt{u^{2} + v^{2}} + v) + \frac{1}{3} u v \sqrt{u^{2} + v^{2}} + \frac{1}{6} v^{3} \log(\sqrt{u^{2} + v^{2}} + u) & \text{if } (u, v) \neq \mathbf{0} \\ 0 & \text{if } (u, v) = \mathbf{0} \end{cases}$
$p$-th power $p$-norm	$(p + 1)^{-1} (u^{p + 1} v + u v^{p + 1})$

Table 3: Wasserstein errors for the NW-SE, $4\times 4$, and “bad” $1$-norm problems

(a) NW-SE errors

$w_{*}$	abs. error
$2^{-9}$	$8.42 \times 10^{- 6}$
$2^{- 10}$	$2.11 \times 10^{- 6}$
$2^{- 11}$	$5.27 \times 10^{- 7}$
$2^{- 12}$	$1.32 \times 10^{- 7}$
$2^{- 13}$	$3.30 \times 10^{- 8}$
$2^{- 14}$	$8.24 \times 10^{- 9}$
$2^{- 15}$	$2.06 \times 10^{- 9}$
$2^{- 16}$	$5.15 \times 10^{- 10}$
$2^{- 17}$	$1.29 \times 10^{- 10}$

(b) $4\times 4$ errors

$w_{*}$	abs. error
$2^{-9}$	$2.02 \times 10^{- 5}$
$2^{- 10}$	$5.04 \times 10^{- 6}$
$2^{- 11}$	$1.26 \times 10^{- 6}$
$2^{- 12}$	$3.15 \times 10^{- 7}$
$2^{- 13}$	$7.88 \times 10^{- 8}$
$2^{- 14}$	$1.97 \times 10^{- 8}$
$2^{- 15}$	$4.93 \times 10^{- 9}$
$2^{- 16}$	$1.23 \times 10^{- 9}$
$2^{- 17}$	$3.08 \times 10^{- 10}$

(c) “Bad” $1$-norm errors

$w_{*}$	abs. error
$2^{-9}$	$8.66 \times 10^{- 6}$
$2^{- 10}$	$2.16 \times 10^{- 6}$
$2^{- 11}$	$5.40 \times 10^{- 7}$
$2^{- 12}$	$1.35 \times 10^{- 7}$
$2^{- 13}$	$3.32 \times 10^{- 8}$
$2^{- 14}$	$8.26 \times 10^{- 9}$
$2^{- 15}$	$2.06 \times 10^{- 9}$
$2^{- 16}$	$5.15 \times 10^{- 10}$
$2^{- 17}$	$1.29 \times 10^{- 10}$

Table 4: Wasserstein approximation behavior with respect to $w_{*}$

$c$	$\tilde{P}_{16}^{*}$	$\mathrm{err}_{\max}(w_{*})$	$\Delta\tilde{P}_{16}(w_{*})$
$1$ -norm	$0.25702262181$	$0.457 {(w_{*})}^{1.020}$	$1.186 {(w_{*})}^{2.025}$
$2$ -norm	$0.20754605961$	$0.361 {(w_{*})}^{1.008}$	$4.151 {(w_{*})}^{2.023}$
squared $2$ -norm	$0.05290682486$	$0.221 {(w_{*})}^{1.008}$	$2.668 {(w_{*})}^{2.029}$

Table 5: Planar scaling with respect to $W$ and $N$

(a) Scaling with respect to $W$

	$N = 5$
$W$	T (sec)	S (MB)
$2^{12}$	$0.855$	$24.540$
$2^{13}$	$2.005$	$49.100$
$2^{14}$	$4.497$	$98.210$
$2^{15}$	$11.025$	$196.400$
$2^{16}$	$28.093$	$394.400$
$2^{17}$	$60.577$	$785.800$
$2^{18}$	$132.397$	$1571.840$
$2^{19}$	$292.158$	$3151.872$
$2^{20}$	$640.660$	$6309.888$

(b) Scaling with respect to $N$

	$W = 2^{10}$		$W = 2^{11}$
$N$	T (sec)	S (MB)	T (sec)	S (MB)
128	16.938	17.25	22.365	33.91
136	12.190	18.24	36.601	35.05
144	10.982	17.99	29.952	36.49
152	13.139	18.54	36.703	41.27
160	11.420	18.66	34.801	40.27
168	15.727	20.97	44.959	40.66
176	15.332	21.38	44.873	43.06
184	18.243	21.38	53.689	43.20
192	12.796	21.60	40.029	43.66

Table 6: Time and storage scaling with respect to $W$ and $N$ separately

$T (W) \approx 4.356 \times 10^{- 5} W \ln W$	Time	$T (N) \approx 4.582 \times 10^{- 2} N \ln N$
$S (W) \approx 6.015 \times 10^{- 3} W$	Storage	$S (N) \approx 3.162 N^{1 / 2}$

Table 7: Time and memory scaling with respect to both $W$ and $N$

Time	$T (N, W) \approx 2.853 \times 10^{- 6} W N \ln W \ln N$
Storage	$S (N, W) \approx 1.538 \times 10^{- 3} W N^{1 / 2}$

Table 8: 3-D scaling with respect to $W$ and $N$

$W$ alone	Time	$T (W) \approx 6.878 \times 10^{- 5} W^{2} \ln W$
$W$ alone	Storage	$S (W) \approx 2.341 \times 10^{- 2} W^{2}$
$N$ alone	Time	$T (N) \approx 2.849 \times 10^{- 1} N \ln N$
$N$ alone	Storage	$S (N) \approx 2.315 \times 10^{2} N^{1 / 3}$
$W$ and $N$	Time	$T (N, W) \approx 3.531 \times 10^{- 6} W^{2} N \ln W \ln N$
$W$ and $N$	Storage	$S (N, W) \approx 1.397 \times 10^{- 1} W^{2} N^{1 / 3}$

Equations

$$\Pi(\mu,\,\nu):=\left\{\pi\in\mathcal{P}(X\times Y)\;\middle|\;\begin{array}{c}\pi[A\times Y]=\mu[A],\ \pi[X\times B]=\nu[B],\\ \forall\text{ meas. }A\subseteq X,\ B\subseteq Y\end{array}\right\}, \tag{1.1}$$

$$P(\pi):=\int_{X\times Y}c(\mathbf{x},\,\mathbf{y})\,d\pi(\mathbf{x},\,\mathbf{y}). \tag{1.2}$$

$$P^{*}:=\inf_{\pi\in\Pi(\mu,\,\nu)}P(\pi), \tag{1.3}$$

$$\pi^{*}:=\operatorname*{arg\,inf}_{\pi\in\Pi(\mu,\,\nu)}P(\pi). \tag{1.4}$$

$$\Phi_{c}(\mu,\,\nu):=\left\{(\varphi,\,\psi)\in L^{1}(d\mu)\times L^{1}(d\nu)\;\middle|\;\begin{array}{c}\varphi(\mathbf{x})+\psi(\mathbf{y})\leq c(\mathbf{x},\,\mathbf{y}),\\ d\mu\text{ a.e. }\mathbf{x}\in X,\ d\nu\text{ a.e. }\mathbf{y}\in Y\end{array}\right\}. \tag{1.5}$$

$$D(\varphi,\,\psi):=\int_{X}\varphi\,d\mu+\int_{Y}\psi\,d\nu. \tag{1.6}$$

$$D^{*}:=\sup_{(\varphi,\,\psi)\in\Phi_{c}(\mu,\,\nu)}D(\varphi,\,\psi), \tag{1.7}$$

$$(\varphi^{*},\,\psi^{*}):=\operatorname*{arg\,sup}_{(\varphi,\,\psi)\in\Phi_{c}(\mu,\,\nu)}D(\varphi,\,\psi). \tag{1.8}$$

$$W_{1}(\mu,\,\nu):=\inf_{\pi\in\Pi(\mu,\,\nu)}\int_{X\times Y}c(\mathbf{x},\,\mathbf{y})\,d\pi(\mathbf{x},\,\mathbf{y}). \tag{1.9}$$

$$\pi^{*}(\mathbf{x},\,\mathbf{y})=\pi^{*}_{T^{*}}(\mathbf{x},\,\mathbf{y}):=\mu(\mathbf{x})\,\delta[\mathbf{y}=T^{*}(\mathbf{x})], \tag{1.10}$$

$$\pi(\mathbf{x},\,\mathbf{y})=\pi_{T}(\mathbf{x},\,\mathbf{y}):=\mu(\mathbf{x})\,\delta[\mathbf{y}=T(\mathbf{x})], \tag{1.11}$$

$$P(\pi):=\int_{X}c(\mathbf{x},\,T(\mathbf{x}))\,d\mu(\mathbf{x}). \tag{1.12}$$

$$P(\pi):=\sum_{i=1}^{n}\int_{A_{i}}c(\mathbf{x},\,\mathbf{y}_{i})\,d\mu(\mathbf{x}). \tag{1.13}$$

$$F(\mathbf{x}):=\max_{1\leq i\leq n}\{a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})\}. \tag{1.14}$$

$$A_{i}:=\{\mathbf{x}\in A\mid F(\mathbf{x})=a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})\}. \tag{1.15}$$

$$\varphi^{\prime}(\mathbf{x})=\sup_{\mathbf{y}\in Y}\{\psi(\mathbf{y})-c(\mathbf{x},\,\mathbf{y})\}. \tag{1.16}$$

$$-\nabla\cdot(a\nabla u)=f,\quad\text{where }\lvert\nabla u\rvert\leq 1,\ a\geq 0,\text{ and }\lvert\nabla u\rvert<1\implies a=0. \tag{1.17}$$

$$A_{ij}:=A_{i}\cap A_{j}. \tag{2.1}$$

$$B:=\bigcup_{1\leq i<n}\ \bigcup_{i<j\leq n}A_{ij}, \tag{2.2}$$

$$\mathring{A}_{i}:=A_{i}\setminus B. \tag{2.3}$$

$$g_{ij}(\mathbf{x}):=c(\mathbf{x},\,\mathbf{y}_{i})-c(\mathbf{x},\,\mathbf{y}_{j}). \tag{2.4}$$

$$g_{ij}(\mathbf{x})=a_{i}-a_{j},\quad\forall\,\mathbf{x}\in A_{ij}. \tag{2.5}$$

$$a_{ij}:=g_{ij}(\mathbf{x}_{ij})\quad\forall\,\mathbf{x}_{ij}\in A_{ij}. \tag{2.6}$$

$$P_{R}:=\int_{R}c(\mathbf{x},\,\mathbf{y}_{i})\,d\mu(\mathbf{x}), \tag{2.7}$$

$$\operatorname{edg}(A^{r}):=\{\mathbf{x}\in A^{r}\mid\mu(\mathbf{x})>0\text{ and }\exists\,\mathbf{x}_{n}\in N(\mathbf{x})\text{ such that }\mu(\mathbf{x}_{n})=0\}.$$

$$\operatorname{int}(A^{r}):=\{\mathbf{x}\in A^{r}\mid\mu(\mathbf{x})>0\text{ and }\mu(\mathbf{x}_{n})>0\text{ for all }\mathbf{x}_{n}\in N(\mathbf{x})\}.$$

$$g_{ij}(\mathbf{x})$$

$$a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})=F(\mathbf{x})\geq a_{j}-c(\mathbf{x},\,\mathbf{y}_{j}).$$

$$c(\mathbf{x},\,\mathbf{y}_{i})-c(\mathbf{x},\,\mathbf{y}_{j})\leq a_{i}-a_{j}.$$


Full text

The boundary method for semi-discrete optimal transport partitions and Wasserstein distance computation

(Footnote 1: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1650044. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.)

Luca Dieci

School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332 U.S.A.

Tel.: +1 404-894-9209 Fax: +1 404-894-4409


J.D. Walsh III

Naval Surface Warfare Center, Panama City Division (X24), 110 Vernon Ave., Panama City, FL 32407 U.S.A.

Tel.: +1 850-234-4660 Fax: +1 850-235-5374


Abstract

We introduce a new technique, which we call the boundary method, for solving semi-discrete optimal transport problems with a wide range of cost functions. The boundary method reduces the effective dimension of the problem, thus improving complexity. For cost functions equal to a $p$ -norm with $p\in(1,\infty)$ , we provide mathematical justification, convergence analysis, and algorithmic development. Our testing supports the boundary method with these $p$ -norms, as well as other, more general cost functions.

Keywords: Optimal transport, Monge-Kantorovich, semi-discrete, Wasserstein distance, boundary method

MSC: 65K10, 35J96, 49M25


1 Introduction

In this work, we consider a new solution method for optimal transport problems. Numerical optimal transport has applications in a wide range of fields, but the scaling properties and ground cost restrictions of current numerical methods make it difficult to find solutions for many applications.

The boundary method we propose focuses on a broad class of optimal transportation problems: semi-discrete optimal transport. Many other techniques assume semi-discrete transport, either implicitly or explicitly, as semi-discrete formulations can be used to approximate solutions to fully continuous problems, and the semi-discrete optimal transport problem is of practical relevance itself.

Key challenges in numerical optimal transport are: (a) the design of numerical methods capable of handling general ground costs, (b) efficient computation of the Wasserstein metric, and (c) solutions of three (or higher) dimensional problems. The boundary method addresses these concerns by solving problems where the ground cost is a $p$ -norm, $p\in(1,\,\infty)$ , and by doing so in a way that reduces the effective dimension of the transport problem.

1.1 Description of optimal transport: the Monge-Kantorovich problem

The theory of optimal transport dates back to the work by Monge in 1781 [1]. In the 1940s, Kantorovich’s papers [2, 3] relaxed Monge’s requirement that no mass be split, creating what we now know as the Monge-Kantorovich problem.

Definition 1.1 (Monge-Kantorovich problem)

Let $X,\,Y\subseteq\mathbb{R}^{d}$ , let $\mu$ and $\nu$ be probability densities defined on $X$ and $Y$ , and let $c(\mathbf{x},\,\mathbf{y}):X\times Y\to\mathbb{R}$ be a measurable ground cost function. Define the set of transport plans

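$$\Pi(\mu,\,\nu):=\left\{\pi\in\mathcal{P}(X\times Y)\;\middle|\;\begin{array}{c}\pi[A\times Y]=\mu[A],\ \pi[X\times B]=\nu[B],\\ \forall\text{ meas. }A\subseteq X,\ B\subseteq Y\end{array}\right\}, \tag{1.1}$$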

where $\mathcal{P}(X\times Y)$ is the set of probability measures on the product space, and define the primal cost function $P:\Pi(\mu,\,\nu)\to\mathbb{R}$ as

$$P(\pi):=\int_{X\times Y}c(\mathbf{x},\,\mathbf{y})\,d\pi(\mathbf{x},\,\mathbf{y}). \tag{1.2}$$

The Monge-Kantorovich problem is to find the optimal primal cost

$$P^{*}:=\inf_{\pi\in\Pi(\mu,\,\nu)}P(\pi), \tag{1.3}$$

and an associated optimal transport plan

$$\pi^{*}:=\operatorname*{arg\,inf}_{\pi\in\Pi(\mu,\,\nu)}P(\pi). \tag{1.4}$$

Kantorovich also identified the problem’s dual formulation.

Definition 1.2 (Dual formulation)

Define the set of functions

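$$\Phi_{c}(\mu,\,\nu):=\left\{(\varphi,\,\psi)\in L^{1}(d\mu)\times L^{1}(d\nu)\;\middle|\;\begin{array}{c}\varphi(\mathbf{x})+\psi(\mathbf{y})\leq c(\mathbf{x},\,\mathbf{y}),\\ d\mu\text{ a.e. }\mathbf{x}\in X,\ d\nu\text{ a.e. }\mathbf{y}\in Y\end{array}\right\}. \tag{1.5}$$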

Let the dual cost function, $D:\Phi_{c}(\mu,\,\nu)\to\mathbb{R}$ , be defined as

$$D(\varphi,\,\psi):=\int_{X}\varphi\,d\mu+\int_{Y}\psi\,d\nu. \tag{1.6}$$

Then, the optimal dual cost is

$$D^{*}:=\sup_{(\varphi,\,\psi)\in\Phi_{c}(\mu,\,\nu)}D(\varphi,\,\psi), \tag{1.7}$$

and an optimal dual pair is given by

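$$(\varphi^{*},\,\psi^{*}):=\operatorname*{arg\,sup}_{(\varphi,\,\psi)\in\Phi_{c}(\mu,\,\nu)}D(\varphi,\,\psi). \tag{1.8}$$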

When the ground cost is a distance function (often but not necessarily Euclidean), Monge-Kantorovich solutions are related to the Wasserstein metric, a distance between probability distributions:

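$$W_{1}(\mu,\,\nu):=\inf_{\pi\in\Pi(\mu,\,\nu)}\int_{X\times Y}c(\mathbf{x},\,\mathbf{y})\,d\pi(\mathbf{x},\,\mathbf{y}). \tag{1.9}$$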

We have $W_{1}(\mu,\,\nu)=P^{*}=D^{*}$, and hence, we may refer to any of these as the Wasserstein distance, the optimal transport cost, or simply the optimal cost. (Footnote 2: See also [4, p. 207] for a definition of the Wasserstein metric $W_{p}$ with $p\in[0,\,\infty)$.)

Remark 1

$W_{1}(\mu,\,\nu)$ is often written as $W_{1}$, with $\mu$ and $\nu$ implied. Furthermore, as Equation (1.9) makes clear, $W_{1}(\mu,\,\nu)$ also depends on the ground cost function $c(\mathbf{x},\,\mathbf{y})$. In the literature, the Wasserstein distance formula often assumes the ground cost to be a specific predetermined function, usually the Euclidean distance $\lVert\mathbf{x}-\mathbf{y}\rVert_{2}$.

Definition 1.3 (Monge problem)

In certain cases, there exists at least one solution to the semi-discrete Monge-Kantorovich problem that does not split transported masses. In other words, there exists some $\pi^{*}$ such that

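$$\pi^{*}(\mathbf{x},\,\mathbf{y})=\pi^{*}_{T^{*}}(\mathbf{x},\,\mathbf{y}):=\mu(\mathbf{x})\,\delta[\mathbf{y}=T^{*}(\mathbf{x})], \tag{1.10}$$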

where $T^{*}:X\to Y$ is a measurable map called an optimal transport map. (Footnote 3: One can also write $\pi^{*}_{T^{*}}$ as $(\mathrm{Id}\times T^{*})\#\mu$. Our notation is from [4, p. 3]. The alternative notation is used in [5].) When such a $\pi^{*}$ exists, we say the solution also solves the Monge problem.

If the Monge-Kantorovich problem has a solution which solves the Monge problem, we can assume without loss of generality that every $\pi\in\Pi(\mu,\,\nu)$ satisfies

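$$\pi(\mathbf{x},\,\mathbf{y})=\pi_{T}(\mathbf{x},\,\mathbf{y}):=\mu(\mathbf{x})\,\delta[\mathbf{y}=T(\mathbf{x})], \tag{1.11}$$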

for some measurable transport map $T:X\to Y$ , and that the primal cost can be written

$$P(\pi):=\int_{X}c(\mathbf{x},\,T(\mathbf{x}))\,d\mu(\mathbf{x}). \tag{1.12}$$

1.2 Semi-discrete problem

The semi-discrete optimal transport problem we consider is the Monge-Kantorovich problem of Definition 1.1, with restrictions on $\mu$, $\nu$, and $c$.

(1) Assume that $\mu$ satisfies the following:

(a) $\mu$ is absolutely continuous with respect to the Lebesgue measure.

(b) The support of $\mu$ is contained in the convex compact region $A\subseteq X$. (Since $A\subset\mathbb{R}^{d}$, it must also be the case that $A$ is simply connected.)

(2) Assume $\nu$ has exactly $n\geq 2$ non-zero values, located at $\{\mathbf{y}_{i}\}_{i=1}^{n}\subseteq Y$.

(3) Assume $c$ is a $p$-norm with $p\in(1,\,\infty)$.

As we will show, each of these conditions is required for one or more of the theorems given in Section 3. Condition (1)(a) ensures that the value of $\mu$ is bounded, which is required to show Wasserstein distance convergence in Theorem 3.25. Conditions (1)(a), (1)(b), (2), and (3) are all used to satisfy the conditions of Corollary 4 of [6] (Footnote 4: See Theorem 3.6, below, for a full statement of this result.), which we apply to show the $\mu$-a.e. uniqueness of the solution in Theorem 3.7.

1.2.1 Semi-discrete transport and the Monge problem

Since $\mu$ is absolutely continuous, $\lvert S\rvert=0$ implies $\mu(S)=0$ for all Borel sets $S$ in $X$ . Hence, $\mu$ is nonatomic. Because $c$ is continuous and $\mu$ is nonatomic, at least one solution to the semi-discrete Monge-Kantorovich problem also satisfies the Monge problem, described in Definition 1.3; see Theorem B in [7]. Thus, by applying Equation (1.11), we can assume without loss of generality that any transport plan $\pi$ partitions $A$ into $n$ sets $A_{i}$ , where $A_{i}$ is the set of points in $A$ that are transported by the map $T$ to $\mathbf{y}_{i}$ . Using this partitioning scheme in combination with Equation (1.12) allows us to rewrite the primal cost function for the semi-discrete problem as

$$P(\pi):=\sum_{i=1}^{n}\int_{A_{i}}c(\mathbf{x},\,\mathbf{y}_{i})\,d\mu(\mathbf{x}). \tag{1.13}$$

1.3 Shift characterization for semi-discrete optimal transport

Using this idea of sets $A_{i}$ , we are ready to describe the shift characterization of the semi-discrete optimal transport problem. The definition of the characterization, which follows, is based on one given by Rüschendorf and Uckelmann in [8, 9].

Definition 1.4 (Shift characterization)

Let $\{a_{i}\}_{i=1}^{n}$ be a set of $n$ finite values, referred to as shifts. Define

$$F(\mathbf{x}):=\max_{1\leq i\leq n}\{a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})\}. \tag{1.14}$$

For $i\in\mathbb{N}_{n}$ , where $\mathbb{N}_{n}=\{1,\,\ldots,\,n\}$ , let

$$A_{i}:=\{\mathbf{x}\in A\mid F(\mathbf{x})=a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})\}. \tag{1.15}$$

Note that $\cup_{i=1}^{n}A_{i}=A$ . The problem of determining an optimal transport plan $\pi^{*}$ is equivalent to determining shifts $\{a_{i}\}_{i=1}^{n}$ such that for all $i\in\mathbb{N}_{n}$ , the total mass transported from $A_{i}$ to $\mathbf{y}_{i}$ equals $\nu(\mathbf{y}_{i})$ .
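As a rough illustration of Definition 1.4, the following Python sketch evaluates $F$ at box midpoints and assigns each midpoint to the cell $A_{i}$ whose term $a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})$ attains the maximum. The uniform density on $[0,\,1]^{2}$, the grid width, the two site locations, and the helper names `p_norm_cost`, `assign_cell`, and `cell_masses` are assumptions made for illustration; they are not taken from the paper.

```python
def p_norm_cost(x, y, p=2.0):
    """Ground cost c(x, y) = ||x - y||_p for points given as tuples."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)


def assign_cell(x, sites, shifts, cost=p_norm_cost):
    """Return (i, F(x)), where i maximizes a_i - c(x, y_i) (zero-based index)."""
    values = [a - cost(x, y) for a, y in zip(shifts, sites)]
    best = max(range(len(sites)), key=values.__getitem__)
    return best, values[best]


def cell_masses(sites, shifts, width, cost=p_norm_cost):
    """Approximate mu(A_i) for a uniform mu on [0, 1]^2 (assumed density).

    Each box of the given width contributes its area to the cell that
    contains its midpoint.
    """
    m = round(1.0 / width)
    masses = [0.0] * len(sites)
    for r in range(m):
        for s in range(m):
            x = ((r + 0.5) * width, (s + 0.5) * width)
            i, _ = assign_cell(x, sites, shifts, cost)
            masses[i] += width * width
    return masses


# Hypothetical two-site configuration, symmetric about x_1 = 1/2.
sites = [(0.25, 0.5), (0.75, 0.5)]
masses = cell_masses(sites, shifts=[0.0, 0.0], width=2 ** -5)
```

With equal shifts and symmetric sites, each cell receives half of the mass. Finding shifts for which $\mu(A_{i})=\nu(\mathbf{y}_{i})$ for every $i$ is the harder problem that the rest of the paper addresses.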

The shift characterization is derived from the dual cost function given in Equation (1.6). For any $D(\varphi,\,\psi)$ , suppose we define

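$$\varphi^{\prime}(\mathbf{x})=\sup_{\mathbf{y}\in Y}\{\psi(\mathbf{y})-c(\mathbf{x},\,\mathbf{y})\}. \tag{1.16}$$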

Then $D(\varphi^{\prime},\,\psi)\geq D(\varphi,\,\psi)$ for all $\psi$ .

For the semi-discrete problem, $\varphi^{\prime}$ is exactly Equation (1.14), and the shifts $a_{i}$ correspond to the value of $\psi$ at each Dirac mass $\mathbf{y}_{i}$. Hence, the semi-discrete problem is no more than a special case of the general continuous problem where $\mu$ is a continuous density function and $\nu$ an empirical measure. For a detailed derivation, see [5].

In the same way, the sets $A_{i}$ correspond to the subdifferentials $\partial_{c}(\mathbf{y}_{i})$. For a general cost function $c$, the sets $A_{i}$ are referred to in analysis as Laguerre cells, and the map generated by the sets $A_{i}$ over $A$ is called a Laguerre diagram. As we will discuss further on, the boundaries between Laguerre cells are typically sections of hypersurfaces. When $c(\mathbf{x},\,\mathbf{y})=\lVert\mathbf{y}-\mathbf{x}\rVert_{2}^{2}$, the boundaries are sections of hyperplanes, and the map is called a power diagram. See [10] for a detailed evaluation of this special case. There are also cost functions where, for certain arrangements of $\{\mathbf{y}_{i}\}$, the boundaries between Laguerre cells have positive Lebesgue measure in $\mathbb{R}^{d}$. An example is shown in Figure 4.

1.4 Numerical approaches to the MK problem

Applications of optimal transport are found in many areas of research, including medicine, economics, image processing, machine learning, physics, and many others; e.g., see [11, 12, 13, 14, 15]. For that reason, many people have focused their research on numerical methods for the Monge-Kantorovich problem.

The solution to a semi-discrete problem can be approximated by treating the problem as fully discrete, and the solution to a fully continuous problem can be approximated by treating it as either semi- or fully discrete. By “treating,” we refer primarily to assumptions about continuity: in practice, nearly every approach fully discretizes the problem, and the complexity of such approaches is relative to the measure of the discretization.

The semi-discrete problem has received significant attention in its role as a discretization of the continuous problem (where continuity assumptions are employed over $X$ but not $Y$). Substantial effort has been taken to quantify the extent to which solutions to such semi-discrete problems approximate the solution to the original continuous problem; for example, see [16]. However, the semi-discrete problem has interesting applications in its own right. Recent developments include works in economics [17, 18, 19], image processing [20], and optics [21, 22]. In addition, the power and flexibility of Laguerre cell tessellation (vs. Voronoi) drive ongoing research in physics and other fields.

When the ground cost for the semi-discrete problem is the squared $2$-norm, $\lVert\cdot\rVert_{2}^{2}$, significant numerical progress has been achieved. In 1988, Oliker and Prussner introduced what came to be called the Oliker-Prussner algorithm for nonlinear Monge-Ampère-type equations in $\mathbb{R}^{2}$; see [23]. Oliker and Prussner were significantly ahead of their time. A 1992 paper by Aurenhammer et al., [24], while describing a different algorithm (Newton’s method), explicitly connected Oliker and Prussner’s approach to semi-discrete transport and its resulting “Voronoi-type diagrams.” In 1998, Aurenhammer et al. published [25], a revision that clarified important details and incorporated an argument from [6] to guarantee that the sets $A_{i}$ partition $A$ $\mu$-a.e. More recent algorithms appear in [26, 16].

When sets $A_{i}$ and $A_{j}$ share a boundary, for some $i\neq j$, there is a monotone relationship between the volume of $A_{i}$ and the difference of shifts, $a_{i}-a_{j}$. The Oliker-Prussner approach and the boundary method both exploit this relationship, though in very different ways. Whether applying the Oliker-Prussner algorithm or some variation such as Newton’s method, the Oliker-Prussner approach begins with approximated sets $\tilde{A}_{i}$, and directly perturbs the approximated shift difference $\tilde{a}_{i}-\tilde{a}_{j}$ in order to bring $\mu(\tilde{A}_{i})$ closer to $\nu(\mathbf{y}_{i})$. This approach is extended over all the shift differences (Footnote 5: They refer to a set of shift differences $\{a_{i}-a_{j}\mid i,\,j\in\mathbb{N}_{n},\,i<j\}$ as a weight vector.), making it, in essence, a method for solving the Monge-Kantorovich dual problem with $c=\lVert\cdot\rVert_{2}^{2}$. Because the squared $2$-norm is strictly convex, and it ensures that the boundary for each adjacent $A_{i}$ and $A_{j}$ is a hyperplane, algorithms based on the Oliker-Prussner approach are generally able to quantify convergence behavior and guarantee termination after a finite number of refinement steps.
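The monotone relationship between $\mu(A_{i})$ and $a_{i}-a_{j}$ can be seen in a toy two-site problem. The sketch below is not the Oliker-Prussner algorithm or Newton's method; it is a plain bisection on a single shift difference, assuming a uniform $\mu$ on $[0,\,1]^{2}$, the squared $2$-norm cost, and a midpoint grid. The function names `first_cell_mass` and `match_mass`, the bracket $[-2,\,2]$, and the site locations are hypothetical.

```python
def first_cell_mass(shift_diff, sites, m=64):
    """Approximate mu(A_1) for two sites under the squared 2-norm cost.

    shift_diff stands for a_1 - a_2; mu is assumed uniform on [0, 1]^2 and
    is sampled at the midpoints of an m-by-m grid of boxes.
    """
    y1, y2 = sites
    count = 0
    for r in range(m):
        for s in range(m):
            x1, x2 = (r + 0.5) / m, (s + 0.5) / m
            c1 = (x1 - y1[0]) ** 2 + (x2 - y1[1]) ** 2
            c2 = (x1 - y2[0]) ** 2 + (x2 - y2[1]) ** 2
            # x lies in A_1 when a_1 - c1 >= a_2 - c2.
            if shift_diff >= c1 - c2:
                count += 1
    return count / (m * m)


def match_mass(target, sites, lo=-2.0, hi=2.0, iters=40):
    """Bisect on a_1 - a_2 until mu(A_1) is close to the target mass."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if first_cell_mass(mid, sites) < target:
            lo = mid  # cell too small: increase a_1 - a_2
        else:
            hi = mid  # cell large enough: decrease a_1 - a_2
    return 0.5 * (lo + hi)


sites = [(0.25, 0.5), (0.75, 0.5)]
diff = match_mass(0.25, sites)
```

For these sites the boundary is the vertical line $x_{1}=\tfrac{1}{2}+(a_{1}-a_{2})$, so a target mass of $0.25$ corresponds to a shift difference near $-0.25$, up to the resolution of the grid.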

Numerous efforts have been made to extend the approach proposed by Oliker and Prussner. An application-focused paper by Caffarelli et al. extends the Oliker-Prussner algorithm to $\mathbb{R}^{3}$ , assuming special geometries [27]. Lévy presents a parallelized Newton’s method for three dimensions, one which scales well when $Y$ consists of large numbers of Dirac masses [28]. Other works, such as [29], attempt to integrate the Oliker-Prussner approach with the Wide Stencil methods developed for continuous Monge-Ampère problems; see, e.g., [30, 31]. All of these assume $c=\lVert\cdot\rVert_{2}^{2}$ .

A few authors have attempted to develop approaches for ground costs other than the squared $2$ -norm. Most of these do not employ Oliker-Prussner. In [9], Rüschendorf and Uckelmann report on numerical experiments with ground costs given by the Euclidean distance taken to the powers $2$ , $3$ , $4$ , and $10$ . They assume that $\mu$ is the uniform distribution, and test various weights and placements for the set $\{\mathbf{y}_{i}\}_{i=1}^{n}$ . When an exact solution cannot be directly determined, they fully discretize the problem and use a linear programming solver.

In [32], Schmitzer works with cost functions $c=\lVert\cdot\rVert_{2}^{p}$ for $p\in(1,\,\infty)$ , and applies a form of adaptive scaling done by “shielding” regions: his method attempts to determine points of influence in order to solve primarily local problems. He restricts his examples to $\mathbb{R}^{2}$ .

Solving the semi-discrete problem for the $2$-norm is discussed in [33]. (Footnote 6: In [33], the partition of $A$ is called an “optimal coupling.”) Starting with an alternative form of Equation (1.17), taken from [34], Barrett and Prigozhin develop a mixed formulation of the Monge-Kantorovich problem, which they solve using a standard finite element discretization.

Kitagawa’s 2014 paper, [35], offers a potentially broad generalization of the Oliker-Prussner algorithm, which works for ground costs other than $\lVert\cdot\rVert_{2}^{2}$ , provided those ground costs satisfy strict conditions, including Strong Ma-Trudinger-Wang; see also [36]. His proposals, while densely theoretical, do not include numerics or an explicit iterative scheme.

As [26] states, the special case $c=\lVert\cdot\rVert_{2}^{2}$ has two methods specifically designed for solving semi-discrete problems directly: the Oliker-Prussner algorithm and the damped Newton methods proposed in papers like [25]. Both rely on some variant of what we call the Oliker-Prussner approach, described above. However, approaches developed for fully discrete or continuous transport can also be applied to the semi-discrete problems, though with varying degrees of effectiveness. Rüschendorf and Uckelmann apply a discrete linear program solver in [9], and the solver Barrett and Prigozhin use in [33] was developed for continuous transport.

Discrete methods assume a fully discrete $(X,\,\mu)$ and $(Y,\,\nu)$ , and solve the resulting minimization problem using network flow minimization techniques. As described in [37], there are over 20 established methods for solving such problems, and at least seven software packages capable of handling one or more of these methods.

Most approaches to the fully continuous Monge-Kantorovich problem assume specific ground costs and solve using techniques developed for elliptic partial differential equations, particularly those of the Monge-Ampère-type:

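$$-\nabla\cdot(a\nabla u)=f,\quad\text{where }\lvert\nabla u\rvert\leq 1,\ a\geq 0,\text{ and }\lvert\nabla u\rvert<1\implies a=0. \tag{1.17}$$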

If the ground cost function is strictly convex, or otherwise satisfies the Ma-Trudinger-Wang regularity conditions described in [36], such problems are well-posed. To date, the requirements of well-posedness have largely restricted the application of such continuous methods to well-behaved cost functions such as $\lVert\cdot\rVert_{2}^{2}$ or a regularized Euclidean distance. Continuous methods currently in use apply finite difference, gradient descent, or the iterative Bregman projections (a.k.a. Sinkhorn-Knopp) algorithm, all attempting to map $X$ to a fully discretized $Y$ [38, 39, 40].

As we will show, the boundary method offers a new approach to solving semi-discrete transport, distinct from all of those described above. By and large, the solution methods described above only work for a specific fixed cost, usually $c=\lVert\cdot\rVert_{2}^{2}$ . The boundary method quickly solves problems with more general ground costs. When the ground cost is a $p$ -norm, with $p\in(1,\,\infty)$ , the boundary method provides a global rate of convergence that is proportional to the volume of $A$ .

2 Boundary Method

At a high level, the idea behind the boundary method is simple: track only the boundaries between regions, without resolving the regions’ interiors. To do this in practice and obtain an efficient technique, we must account for the interplay between discretization, a mechanism for discarding interior regions, and a fast solver.

At its heart, the boundary method can be viewed as an adaptive refinement technique, one which focuses on the shared region boundaries. The method discards interior regions, but a well-chosen initial discretization prevents any corresponding loss of accuracy. The boundary method’s strategy progressively refines the boundaries between individual regions $A_{i}$ . Thus, by the method’s very nature, any initial configuration must enclose the boundary in a way that allows it to be distinguished from the region interiors. The necessary conditions for a well-chosen initial discretization are presented in Theorem 3.21 and discussed in detail in Remark 5.

2.1 Boundary identity and system of equations

For all $i,\,j\in\mathbb{N}_{n}$ such that $i\neq j$ , let

$$A_{ij}:=A_{i}\cap A_{j}. \tag{2.1}$$

The boundary set is defined as

$$B:=\bigcup_{1\leq i<n}\ \bigcup_{i<j\leq n}A_{ij}, \tag{2.2}$$

and for each $i\in\mathbb{N}_{n}$ , let the strict interior of $A_{i}$ be defined as

$$\mathring{A}_{i}:=A_{i}\setminus B. \tag{2.3}$$

For all $i,\,j\in\mathbb{N}_{n}$ such that $i\neq j$ , define $g_{ij}:X\to\mathbb{R}$ as

$$g_{ij}(\mathbf{x}):=c(\mathbf{x},\,\mathbf{y}_{i})-c(\mathbf{x},\,\mathbf{y}_{j}). \tag{2.4}$$

By Corollary 3.11 below, $B\neq\varnothing$ and for each $\mathbf{x}\in B$ there exist $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ , such that $\mathbf{x}\in A_{ij}$ . Because $\mathbf{x}\in A_{i}$ , we have $F(\mathbf{x})=a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})$ , and because $\mathbf{x}\in A_{j}$ , we have $F(\mathbf{x})=a_{j}-c(\mathbf{x},\,\mathbf{y}_{j})$ . Combining and rearranging these, we get

$$g_{ij}(\mathbf{x})=a_{i}-a_{j},\quad\forall\,\mathbf{x}\in A_{ij}. \tag{2.5}$$

Thus, Equation (2.5) implies that $A_{ij}$ is a subset of a level set of $g_{ij}$ ; the value $a_{i}-a_{j}$ is constant, regardless of which $\mathbf{x}\in A_{ij}$ is chosen. Using this information, for each $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ , such that $A_{ij}\neq\varnothing$ , we can define the constant shift difference

$$a_{ij}:=g_{ij}(\mathbf{x}_{ij})\quad\forall\,\mathbf{x}_{ij}\in A_{ij}. \tag{2.6}$$
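For instance, when $c$ is the squared $2$-norm, $g_{ij}$ can be expanded directly. The short derivation below is a worked illustration added here, not part of the original text; it shows why the level set containing $A_{ij}$ is a hyperplane in that special case, in line with the power diagram remark in Section 1.3.

```latex
\begin{align*}
g_{ij}(\mathbf{x})
  &= \lVert \mathbf{y}_{i} - \mathbf{x} \rVert_{2}^{2}
   - \lVert \mathbf{y}_{j} - \mathbf{x} \rVert_{2}^{2} \\
  &= \lVert \mathbf{y}_{i} \rVert_{2}^{2} - \lVert \mathbf{y}_{j} \rVert_{2}^{2}
   - 2\, \mathbf{x} \cdot (\mathbf{y}_{i} - \mathbf{y}_{j}), \\
g_{ij}(\mathbf{x}) = a_{ij}
  &\iff 2\, \mathbf{x} \cdot (\mathbf{y}_{i} - \mathbf{y}_{j})
   = \lVert \mathbf{y}_{i} \rVert_{2}^{2} - \lVert \mathbf{y}_{j} \rVert_{2}^{2} - a_{ij}.
\end{align*}
```

The quadratic terms in $\mathbf{x}$ cancel, so $g_{ij}$ is affine in $\mathbf{x}$ and its level sets are hyperplanes with normal $\mathbf{y}_{i}-\mathbf{y}_{j}$. For a general $p$-norm no such cancellation occurs, and the level sets are curved hypersurfaces.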

Given a sufficiently large set of linearly independent equations of the form given in Equation (2.6), one could determine most or all of the shifts $\{a_{i}\}_{i=1}^{n}$ . As we show in Theorem 3.13, it is possible to obtain exactly $(n-1)$ linearly independent equations of the desired form, but a set of $n$ such independent equations does not exist.

Since we know that the set of shifts allows exactly one degree of freedom, the boundary method’s approach is to obtain $(n-1)$ well-chosen $a_{ij}$ values, fix one $a_{i}$, and use linearly independent equations of the form given in Equation (2.5) to solve for the remaining $(n-1)$ shifts. The crucial observation is that, to determine the $a_{i}$’s, there is no need to retain information about the interiors of the regions.
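One way to picture this step: if the available shift differences link all $n$ regions, the shifts follow by fixing one value and walking from region to region. The sketch below is an assumption-laden illustration, not the paper's solver. The function name `shifts_from_differences`, the dictionary layout `{(i, j): a_ij}`, the zero-based indices, and the sample values are all hypothetical.

```python
from collections import deque


def shifts_from_differences(n, diffs, root=0, root_value=0.0):
    """Recover shifts a_0..a_{n-1} from differences a_ij = a_i - a_j.

    diffs maps index pairs (i, j) to a_ij. One shift is fixed (the single
    degree of freedom); the rest follow by breadth-first traversal of the
    graph whose edges are the known differences.
    """
    neighbors = {i: [] for i in range(n)}
    for (i, j), a_ij in diffs.items():
        neighbors[i].append((j, -a_ij))  # a_j = a_i - a_ij
        neighbors[j].append((i, a_ij))   # a_i = a_j + a_ij
    shifts = [None] * n
    shifts[root] = root_value
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j, delta in neighbors[i]:
            if shifts[j] is None:
                shifts[j] = shifts[i] + delta
                queue.append(j)
    if any(a is None for a in shifts):
        raise ValueError("the given differences do not connect all regions")
    return shifts


# Hypothetical differences for n = 4, consistent with shifts (0, 0.3, -0.1, 0.2).
example = shifts_from_differences(4, {(0, 1): -0.3, (1, 2): 0.4, (2, 3): -0.3})
```

With exactly $(n-1)$ differences forming a spanning tree, each shift is reached once and no consistency check is needed. Changing `root_value` moves every shift by the same constant, which reflects the one degree of freedom.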

The Wasserstein distance can also be computed without saving region interiors. Once we have determined that $R\subset A_{i}$ for some region $R$ , the (partial) Wasserstein distance corresponding to $R$ is equal to

$$P_{R}:=\int_{R}c(\mathbf{x},\,\mathbf{y}_{i})\,d\mu(\mathbf{x}), \tag{2.7}$$

and the total Wasserstein distance $P^{*}$ is equal to the sum of all such partial distances $P_{\scriptscriptstyle{R}}$ , computed over every $A_{i}$ .
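When $\mu$ is uniform and $R$ is an axis-aligned box, $P_{R}$ can be evaluated in closed form. The sketch below uses the Table 2 entry for the $p$-th power $p$-norm, $\hat{C}(u,\,v)=(p+1)^{-1}(u^{p+1}v+uv^{p+1})$, with $u$ and $v$ measured relative to the target point. Extending $\hat{C}$ to negative arguments through an odd power is an assumption of this sketch, as are the unit density and the helper names `c_hat`, `box_cost`, and `box_cost_midpoint`.

```python
import math


def c_hat(u, v, p):
    """Mixed antiderivative of |u|^p + |v|^p.

    For u, v >= 0 this matches the Table 2 entry. The sign handling for
    negative arguments is an assumption made for this sketch.
    """
    def odd_power(t):
        return math.copysign(abs(t) ** (p + 1), t)

    return (odd_power(u) * v + u * odd_power(v)) / (p + 1)


def box_cost(lower, upper, site, p):
    """P_R for R = [lower, upper] with unit density, by inclusion-exclusion."""
    u0, u1 = lower[0] - site[0], upper[0] - site[0]
    v0, v1 = lower[1] - site[1], upper[1] - site[1]
    return c_hat(u1, v1, p) - c_hat(u0, v1, p) - c_hat(u1, v0, p) + c_hat(u0, v0, p)


def box_cost_midpoint(lower, upper, site, p, m=200):
    """Midpoint-rule approximation of the same integral, used as a check."""
    hx = (upper[0] - lower[0]) / m
    hy = (upper[1] - lower[1]) / m
    total = 0.0
    for r in range(m):
        for s in range(m):
            x = lower[0] + (r + 0.5) * hx
            y = lower[1] + (s + 0.5) * hy
            total += abs(x - site[0]) ** p + abs(y - site[1]) ** p
    return total * hx * hy


exact = box_cost((0.0, 0.0), (1.0, 1.0), (0.25, 0.5), p=2)
approx = box_cost_midpoint((0.0, 0.0), (1.0, 1.0), (0.25, 0.5), p=2)
```

Because the closed form needs only the four corners of a box, the cost of a discarded interior box does not depend on how finely that box would otherwise have been resolved.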

Recognizing these facts, inherent in the shift characterization, inspired both the boundary method’s name and its guiding principles, summarized below:

Do not solve for the entire transport plan; rather, identify region boundaries.

To illustrate how this principle is implemented, we present the following example.

Example 2.1

Let $X=Y=[0,\,1]^{2}$ . Assume $\mu$ is the uniform probability density, so for all Borel sets $S\subseteq A$ , $\mu(S)=\lvert S\rvert$ , and that $\nu$ has uniform discrete probability density, so $\nu(y_{i})=1/n$ for $1\leq i\leq n$ . Take $n=5$ , with the five points where $\nu$ has nonzero density distributed as shown in Figure 1.

Let $c$ be the squared Euclidean norm, $\lVert\mathbf{y}-\mathbf{x}\rVert_{2}^{2}$ . Suppose a discretization with width $2^{-5}$ is sufficient to provide the desired accuracy and that we apply the boundary method with initial width $2^{-4}$ .

Let $\widetilde{P}$ denote the partial transport cost: the sum of the costs $P_{R}$ over all regions $R$ evaluated so far, where $P_{R}$ is defined as in Equation (2.7). Each iteration consists of two steps. In Step (1), we discretize the remaining parts of $A$ using the given width, and we solve the discrete transport problem. In Step (2), we compute the transport cost of all boxes in the interior of each region, add those costs to $\widetilde{P}$ , and discard the computed boxes. For the discard, we remove the transported mass from $\nu$ and the transported boxes from $A$ (so those regions can be safely ignored during any future discretized transport computations).

Figure 1 shows the state of the boundary method during the first iteration. In Figure 1(a), we have just completed Step (1): the discrete transport map has been computed, but we have not identified interior points or added anything to the partial transport cost $\widetilde{P}$ . Figure 1(b) shows the state of the algorithm after Step (2): the interior regions have been identified (shown in gray), the partial transport cost has been computed for those regions, giving us $\widetilde{P}=0.01387$ , and those regions have been discarded.

Figure 2 shows the state of the boundary method algorithm during the second iteration. Here, the regions eliminated in Iteration 1 are shown in a darker gray, to distinguish new interiors from those previously removed. In Figure 2(a), Step (1) has just been completed. As can be seen by comparing Figure 1(b) to Figure 2(a), the boundary and interior regions are the same ones that we had at the end of the first iteration, but refining the boundary set to width $w_{2}=2^{-5}$ allows us to compute a more refined transport map. Since the regions in gray were discarded at the end of Iteration 1 Step (2), they are not part of the discrete transport solution computed during Iteration 2. Because Step (1) does not add to the identified interior regions, the partial Wasserstein distance $\widetilde{P}$ is also unchanged from Figure 1(b).

After Step (2) of the second iteration, shown in Figure 2(b), more of the interiors have been identified. The partial transport cost shows a corresponding increase: we now have $\widetilde{P}=0.02898$ . Because we have achieved our desired refinement, a width of $2^{-5}$ , we end the iterative process.

We have not computed any transport cost for the white areas remaining in Figure 2(b). Hence, $\widetilde{P}$ is strictly less than the actual transport cost $P^{*}$ . We may want to perform further computations on those white areas in order to approximate the remaining transport cost and calculate an error bound for our approximation.

2.2 The boundary method

We will now formalize the process described in Example 2.1. As described below, the boundary method generates a grid $A^{r}$ over the unevaluated region of $A$ , and uses it to determine the subgrid $B^{r}$ containing the boundary set $B$ . This subgrid is determined by finding an optimal transport solution from the grid $A^{r}$ to the point set $\{\mathbf{y}_{i}\}_{i=1}^{n}$ .

Although not strictly necessary, we will restrict ourselves to $A=[0,l]^{d}$ and apply a Cartesian grid over that region. At the $r$ -th refinement level of the algorithm, the grid will thus consist of a collection of boxes with width $w_{r}$ in each dimension of our discretization. By a slight abuse of notation, we use $\mathbf{x}^{r}$ to refer to such a box, centered at the point $\mathbf{x}$ . Thus, $\mu(\mathbf{x}^{r})$ refers to the $\mu$ -measure of the box of width $w_{r}$ centered at $\mathbf{x}$ .

Neighboring boxes are those with center points that differ by no more than one unit in any discretization index. The set of neighbors of $\mathbf{x}$ is denoted $N(\mathbf{x})$ (defined in Equation (3.11), below). Because regions of $\mu$ -measure zero need not be transported to any particular $\mathbf{y}_{i}$ , boxes of positive weight that are adjacent to such regions are always retained. We refer to such a box as an edge box. Thus, the set of edge boxes is

[TABLE]

Because $A$ contains the support of $\mu$ , every box of positive mass that is adjacent to the boundary of $A$ is an edge box.

A box whose neighbors and itself all have positive measure is referred to as an internal box. The set of internal boxes is

[TABLE]

Boxes of $\mu$ -measure zero are not part of $\mathrm{edg}(A^{r})$ or $\mathrm{int}(A^{r})$ and they are discarded when the optimal transport problem is solved. We need not be concerned about losing a region $A_{i}$ due to this discard process, since this would imply $\mu(A_{i})=0$ (and hence $\nu(\mathbf{y}_{i})=0$ , which contradicts the conditions in Section 1.2).

Region interiors are identified by comparing the destination of each $\mathbf{x}\in\mathrm{int}(A^{r})$ to the destinations of its neighbors. Edge boxes are never considered part of a region interior, so they are passed directly to $B^{r}$ .
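The neighbor comparison can be sketched as follows for $d=2$ on a single grid level. This is an illustrative simplification rather than the authors' code: the arrays `dest` and `mass` and the function name are assumptions, and boxes removed at earlier refinement levels are not modeled. Padding with zero mass makes every box on the border of $A$ an edge box.

```python
import numpy as np


def interior_mask(dest, mass):
    """Flag boxes whose 3x3 neighborhood shares one destination.

    dest: 2-D integer array; dest[r, c] is the index i of the point y_i
          that box (r, c) is sent to by the discrete transport solution.
    mass: 2-D array of box masses mu(x^r), same shape as dest.
    """
    rows, cols = dest.shape
    # Boxes outside the grid count as zero-mass neighbors, so boxes on the
    # border of A are never flagged as interior.
    pad_mass = np.pad(mass, 1, constant_values=0.0)
    pad_dest = np.pad(dest, 1, constant_values=-1)
    interior = mass > 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            nb_mass = pad_mass[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            nb_dest = pad_dest[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            # Keep a box only if this neighbor has mass and the same destination.
            interior &= (nb_mass > 0) & (nb_dest == dest)
    return interior
```

Boxes where the mask is `False` but the mass is positive play the role of $B^{r}$ in this sketch.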

In order to remove identified region interiors, we also maintain a running total of the untransported mass, given by partial measure $\tilde{\nu}$ . To preserve the balance of the transport problem, each time a region $\mathbf{x}^{r}$ is transported from $A$ to $\mathbf{y}_{i}$ , the remaining amount that can be transported to $\mathbf{y}_{i}$ , $\tilde{\nu}(\mathbf{y}_{i})$ , must be reduced by $\mu(\mathbf{x}^{r})$ .

We can approximate the Wasserstein distance $P^{*}$ by generating a running total over region interiors: $\widetilde{P}$ . This $\widetilde{P}$ is an increasing function of $r$ , and for all $r$ , $P^{*}\geq\widetilde{P}$ . The Wasserstein distance over any remaining boundary region is evaluated at completion.

Remark 2

Further approximations may be required for a truly general algorithm. Depending on $\mu$ , it may be necessary to approximate the mass of each box, $\mu(\mathbf{x}^{r})$ . Depending on $\mu$ and $c$ , the Wasserstein distance over each box, given by $\displaystyle{\int_{\mathbf{x}^{r}}c(\mathbf{z},\,\mathbf{y}_{i})\,d\mu(\mathbf{z})}$ , may also require approximation. However, in this work we assume that the integrals can be computed exactly. In practice, this is not a significant limitation. Most numerical applications focus on the exactly-computable cases where $\mu$ is uniform and $c$ is the Euclidean or squared-Euclidean distance. Furthermore, as we show in Section 4.1, the set of exactly-computable options is quite large.
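For the commonly used case of uniform $\mu$ (unit density) and squared-Euclidean cost, the box integrals have a simple closed form: integrating coordinate by coordinate gives $\int_{\mathbf{x}^{r}}\lVert\mathbf{z}-\mathbf{y}\rVert_{2}^{2}\,d\mathbf{z}=w_{r}^{d}\left(\lVert\mathbf{x}-\mathbf{y}\rVert_{2}^{2}+d\,w_{r}^{2}/12\right)$ . The sketch below evaluates it; the function names are illustrative assumptions, and other choices of $\mu$ and $c$ would need their own formulas.

```python
import numpy as np


def box_mass_uniform(w, d):
    """Mass of a box of width w in R^d under unit-density uniform mu."""
    return w ** d


def box_cost_sq_euclid(x, y, w):
    """Exact integral of ||z - y||_2^2 over the box of width w centered at x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x.size
    # Each coordinate contributes w * (x_k - y_k)^2 + w^3 / 12, times w^(d-1)
    # from the remaining coordinates.
    return w ** d * (np.sum((x - y) ** 2) + d * w ** 2 / 12.0)
```

For example, with $d=1$ , $\mathbf{x}=0.5$ , $\mathbf{y}=0$ , and $w=1$ , the formula reduces to $\int_{0}^{1}z^{2}\,dz=1/3$ .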

2.2.1 Step (1): solving the discrete optimal transport problem

The proofs in Section 3 assume the discrete solver is exact, but in practice we achieve good results using solvers whose error satisfies reasonable bounds. Thus, the ideal discrete algorithm should be fast, have controlled error, and possess reasonable scaling properties. To satisfy these requirements, and to bypass the shortcomings of standard discrete approaches, we have turned to the distributed relaxation methods known as auction algorithms; see [41] and [42]. (As it turns out, there are natural connections between auction algorithms and the Oliker-Prussner algorithm for semi-discrete transport; see [43] for details).

We chose to apply a new auction algorithm, the general auction, which we developed and presented in [44]. The general auction is so named because it is based directly on the (more general) real-valued transport problem, rather than the integer-valued assignment problem which forms the foundation of other auction algorithms. As described in [44], it offers significant performance advantages over other auction algorithms. Public domain C++ software implementing the general auction can be found on the Internet at [45].

2.2.2 Step (4): computing the shifts

Once we have reached a desired level of refinement for the boundary, we can use the set $B^{r}$ to identify $(n-1)$ shift differences $a_{ij}$ . Finding the shift differences is not necessary once we have the boundary (which is why Step (4) is optional), but the shift differences allow one to reconstruct the entire transport map.

By completing Step (4), one can reduce the transport map in $\mathbb{R}^{d}$ to a set of $n$ real numbers $a_{i}$ , greatly reducing storage requirements. Also, building the reconstructed transport map, and comparing the value of each $\mu(A_{i})$ to its corresponding $\nu(\mathbf{y}_{i})$ , effectively evaluates the actual (vs. worst case) error associated with the boundary method’s solution.

It is also worth considering that the exact shifts $\{a_{i}\}_{i=1}^{n}$ correspond to a transport map giving the exact optimal solution of our semi-discrete problem. The approximated shifts $\{\tilde{a}_{i}\}_{i=1}^{n}$ , unless generating the same shift differences, correspond to a transport map giving the exact optimal solution to a different semi-discrete problem, one whose measure $\nu$ at each $\mathbf{y}_{i}$ , $i\in\mathbb{N}_{n}$ , corresponds to the value of $\mu(\tilde{A}_{i})$ . Hence, $\lvert\mu(\tilde{A}_{i})-\nu(\mathbf{y}_{i})\rvert$ is the error in measure when approximating $A_{i}$ by $\tilde{A}_{i}$ .
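The error in measure can be estimated by rebuilding the map from the shifts, assigning each point $\mathbf{x}$ to the index maximizing $a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})$ . The sketch below does this for uniform $\mu$ on $[0,\,1]^{2}$ with a $p$ -norm cost, approximating each $\mu(\tilde{A}_{i})$ by midpoint sampling. The function name, the sampling resolution `m`, and the restriction to the unit square are assumptions made for illustration.

```python
import numpy as np


def region_masses(shifts, ys, m=400, p=2.0):
    """Approximate mu(A_i) for uniform mu on [0, 1]^2 from the shifts a_i.

    shifts: length-n sequence of shifts a_i.
    ys:     (n, 2) array of the points y_i.
    m:      number of sample points per axis (midpoint rule).
    """
    shifts = np.asarray(shifts, dtype=float)
    ys = np.asarray(ys, dtype=float)
    t = (np.arange(m) + 0.5) / m
    gx, gy = np.meshgrid(t, t, indexing="ij")
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)       # (m*m, 2)
    # cost[k, i] = ||x_k - y_i||_p
    cost = np.linalg.norm(pts[:, None, :] - ys[None, :, :], ord=p, axis=2)
    owner = np.argmax(shifts[None, :] - cost, axis=1)      # argmax_i a_i - c
    return np.bincount(owner, minlength=len(shifts)) / pts.shape[0]
```

Given the target weights `nu` as an array, `np.abs(region_masses(shifts, ys) - nu)` then approximates $\lvert\mu(\tilde{A}_{i})-\nu(\mathbf{y}_{i})\rvert$ up to the sampling error.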

2.2.3 Step (5): approximating the Wasserstein distance

Because some applications focus on determining the transport map, rather than the Wasserstein distance, Step (5) is optional. One could also skip the computation of $\widetilde{P}$ in Step (2), since the Wasserstein distance can be computed in full using only the transport map defined by the boundary set. However, we find it convenient to compute as much of the distance as possible within the boundary method algorithm, establishing $\widetilde{P}$ one box at a time during Step (2). By the time we reach Step (5), the partial Wasserstein distance $\widetilde{P}$ includes the exact cost of all the identified interior regions, and all that remains is to determine the cost of the regions associated with $B^{r}$ .

3 Mathematical support

In this section, we provide mathematical support for the boundary method, assuming that all computations are solved exactly: both the discrete optimal transport problems handled by the general auction and the determinations of mass and Wasserstein distance for individual boxes (see Remark 2). We present three types of results: on the shift characterization, on our system of equations, and, finally, on the boundary method itself.

3.1 Semi-discrete optimal transport and the shift characterization

Here we examine the features of the shift characterization, defined in Section 1.3, and consider what they can tell us about the semi-discrete optimal transport problem itself. While many of these results can be found in other works (e.g., [5]), detailing them fixes notation and sets the stage for the original theorems developed in the following sections.

First, in Lemmas 3.1 and 3.2, we develop theoretical support for the boundary method.

Lemma 3.1

Let $a_{i}$ and $A_{i}$ be defined as in Definition 1.4. Fix $i\in\mathbb{N}_{n}$ . If $\mathbf{x}\in A_{i}$ and $j\in\mathbb{N}_{n}$ , $j\neq i$ , then the following hold:

[TABLE]

where $g_{ij}$ is defined in Equation (2.5) and $A_{ij}$ in Equation (2.1).

Proof 1

Let us show Equation (3.1). By the definitions of $A_{i}$ and $F$ ,

[TABLE]

Rearranging terms gives

[TABLE]

To show Equation (3.2), first note that Section 2.1 already explains how $\mathbf{x}\in A_{ij}$ implies $g_{ij}(\mathbf{x})=a_{i}-a_{j}$ . Consider the converse: Assume that $g_{ij}(\mathbf{x})=a_{i}-a_{j}$ . Rewriting, we find that $a_{j}-c(\mathbf{x},\,\mathbf{y}_{j})=a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})=F(\mathbf{x})$ , with $F$ defined in Equation (1.14). This implies $\mathbf{x}\in A_{j}$ , and since $\mathbf{x}\in A_{i}$ , therefore $\mathbf{x}\in A_{ij}$ . Equation (3.3) is a consequence of Equations (3.1) and (3.2).

Lemma 3.2

Let $a_{i}$ and $A_{i}$ be defined as in Definition 1.4 and $A_{ij}$ as in Equation (2.1). Assume $c$ satisfies the triangle inequality. For all $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ ,

(a)

If $c(\mathbf{y}_{i},\,\mathbf{y}_{j})=a_{i}-a_{j}$ , then $A_{j}\subseteq A_{ij}$ .

(b)

If $c(\mathbf{y}_{i},\,\mathbf{y}_{j})<a_{i}-a_{j}$ , then $A_{j}=\varnothing$ .

Proof 2

For Part (a), because $c$ satisfies the triangle inequality, for all $\mathbf{x}\in A$ ,

[TABLE]

Suppose $\mathbf{x}\in A_{j}$ . Then $a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})\geq a_{j}-c(\mathbf{x},\,\mathbf{y}_{j})=F(\mathbf{x})$ , by Equation (1.14). Because $F$ is defined as the maximum such difference, this implies $a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})=F(\mathbf{x})$ , and so $\mathbf{x}\in A_{i}$ . Further, since $\mathbf{x}$ is an element of $A_{i}$ and $A_{j}$ , $\mathbf{x}\in A_{ij}$ . Therefore, $A_{j}\subseteq A_{ij}$ .

To show (b), note that (3.4) now gives $a_{j}-c(\mathbf{x},\,\mathbf{y}_{j})<a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})$ . Hence, for all $\mathbf{x}\in A$ , $F(\mathbf{x})\geq a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})>a_{j}-c(\mathbf{x},\,\mathbf{y}_{j})$ . Therefore, $A_{j}=\varnothing$ .

Lemma 3.3

Let $F(\mathbf{x})$ be defined by Equation (1.14). If the ground cost function $c(\mathbf{x},\,\mathbf{y})$ is continuous on $X\times Y$ , then $F(\mathbf{x})$ is a continuous function of $\mathbf{x}$ .

Proof 3

Assume $c$ is defined as a continuous function in $X\times Y$ . Thus, for all $i\in\mathbb{N}_{n}$ , $a_{i}-c(\mathbf{x},\,\mathbf{y}_{i})$ is a continuous function of $\mathbf{x}$ . Since $F$ is the maximum of a finite set of continuous functions, $F$ is itself a continuous function of $\mathbf{x}$ .

Definition 3.4 ( $F$ induces a $\mu$ -partition of $A$ )

Let $F$ be as defined in Equation (1.14), and the sets $A_{i}$ as defined in Equation (1.15) for $i\in\mathbb{N}_{n}$ . Then one says $F$ induces a $\mu$ -partition of the set $A$ if

1. $\mu(A)<\infty$ ,

2. for all $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ , $\mu(A_{ij})=0$ (for $A_{ij}$ as defined in Equation (2.1)),

3. $\sum_{i=1}^{n}\mu(A_{i})=\mu(A)$ , and

4. for all $i\in\mathbb{N}_{n}$ , $\mu(A_{i})=\nu(\mathbf{y}_{i})>0$ .

Lemma 3.5

Suppose one has a semi-discrete transport problem, as described in Section 1.2. Let $F$ be as defined in Equation (1.14), the sets $A_{i}$ as defined in Equation (1.15) for $i\in\mathbb{N}_{n}$ , and $B$ as defined in Equation (2.2). Then $F$ induces a $\mu$ -partition of $A$ if and only if $\mu(B)=0$ .

Proof 4

If $F$ induces a $\mu$ -partition of $A$ , by Definition 3.4, $\mu(B)=0$ . For the converse, assume $F$ and the sets $A_{i}$ are defined as given, and let $A_{ij}$ be defined by Equation (2.1). Because $\mu$ is a probability density function, $\mu(A)=1<\infty$ . Because $\mu$ is a non-negative measure, $\mu(B)=0$ implies that, for all $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ , $\mu(A_{ij})=0$ .

For any $\mu$ -measurable set $S\subseteq X$ , $S=S_{1}\cup S_{2}$ ,

[TABLE]

and since $\mu(X)<\infty$ ,

[TABLE]

Proceeding inductively, it follows that

[TABLE]

Thus,

[TABLE]

For all $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ , $\mu(A_{i}\cap A_{j})=0$ , and therefore $\mu(A_{i})=\nu(\mathbf{y}_{i})$ .

Remark 3

Instances of $\mu(B)>0$ appear quite often, though (as we will show) $\mu(B)=0$ for the $p$ -norm cost functions we have assumed. For an example of $\mu(B)>0$ in the literature, see Figure 37 of [46]. We include a nearly identical example in Figure 4 of our paper, along with a discussion of this behavior.

Given our definition of the semi-discrete problem in Section 1.2, Corollary 4 of [6] provides a sufficient condition for the existence of a Monge solution that is unique $\mu$ -a.e. For convenience, we restate their conclusion here, as the following:

Theorem 3.6

Given the definition of $g_{ij}$ in Equation (2.5), suppose that the support of $\nu$ is finite, $c$ is continuous, $\mu$ is tight, and

[TABLE]

Then there exists an optimal transport map, $T:X\to Y$ , that solves the Monge problem, and $T$ is $\mu$ -a.e. unique.

This condition leads directly to the following theorem.

Theorem 3.7

A semi-discrete transport problem, as described in Section 1.2, has an associated transport map $T$ , a function $F$ , as described in Equation (1.14), and sets $\{A_{i}\}_{i=1}^{n}$ , as described in Equation (1.15), such that for all $\mathbf{x}\in A$ ,

[TABLE]

where $\mathring{A}_{i}$ is the strict interior of $A_{i}$ , as defined in Equation (2.3). In other words, $F$ induces a $\mu$ -partition of $A$ and $T$ agrees with $F$ on $A\setminus B$ . Furthermore, $T$ is unique $\mu$ -a.e.

Proof 5

Let $A_{ij}$ be defined as given in Equation (2.1), $B$ as given in Equation (2.2), and $g_{ij}$ as in Equation (2.5). Consider the requirements given in Section 1.2. Condition (2) ensures that the support of $\nu$ is finite, and Condition (3) implies $c$ is continuous. We know that $A\subseteq\mathbb{R}^{d}$ , so $A$ is a Polish space, and Condition (1)(b) assures us that $A$ is compact. Because every probability measure on a compact Polish space is tight (see, e.g., Theorem 3.2 of [47, p. 29]), $\mu$ must be tight. Because Condition (3) requires that the ground cost is equal to a $p$ -norm with $p\in(1,\,\infty)$ ,

[TABLE]

By Condition (1)(a), $\mu$ is absolutely continuous, and so

[TABLE]

as required by Equation (3.5) (see [48] for another argument). Therefore, the conditions of Theorem 3.6 are satisfied.

Let the function $F$ and sets $\{A_{i}\}_{i=1}^{n}$ be as described in Definition 1.4. For any $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ , $A_{ij}\subseteq\{\mathbf{x}\in A\mid g_{ij}(\mathbf{x})=k\}$ for some fixed $k\in\mathbb{R}$ . Hence, it follows that $\mu(B)=0$ , and thus, by Lemma 3.5, $F$ induces a $\mu$ -partition of $A$ . Therefore, we can construct a transport plan $T$ that satisfies the semi-discrete problem and agrees with $F$ on $A\setminus B$ . Furthermore, by Theorem 3.6, $T$ is unique $\mu$ -a.e.

Remark 4

A close reading of the text of Theorem 3.7 reveals that the guarantee of $\mu(B)=0$ derives directly from the fact that $\lvert B\rvert=0$ ; see Equation (3.7). Absolute continuity does the rest. In practice, this means that the boundary method forces a unique transport map on all of $A$ , even regions where $\mu$ vanishes and any other map would achieve the same Wasserstein measure. For an example of this, see Figure 6. This behavior stems from a natural (unstated) corollary to Theorem 3.7: the boundaries identified by our method are a.e. unique with respect to the Lebesgue measure. The convexity of $A$ is required to guarantee the existence of the requisite boundaries. Otherwise, the network of regions might not form a connected graph.

3.2 Existence of linearly independent boundary equations

To prove the existence of $(n-1)$ linearly independent equations of the form shown in Equation (2.6), we will investigate the structure of the boundary set using a connected graph. (For a different approach, where the cost is the squared-Euclidean distance, see [10].)

Definition 3.8

Assume the definition of $A_{ij}$ given in Equation (2.1). Let $G$ be a graph with $n$ vertices $v_{1},\,\ldots,\,v_{n}$ . The edge $(v_{i},\,v_{j})$ is contained in the edge set of $G$ if and only if $A_{ij}$ is non-empty. We refer to $G$ as the adjacency graph of our transport problem.

Lemma 3.9

Let $G$ be defined as given in Definition 3.8. If the set $A$ is convex and compact, then $G$ is a connected graph.

Proof 6

Assume to the contrary that $G$ is not a connected graph. Then we can write $G$ as the union of two disjoint nonempty subgraphs, $G=G_{1}\cup G_{2}$ , such that no vertex $v_{1}$ in $G_{1}$ has a path connecting it to any vertex $v_{2}$ in $G_{2}$ .

Construct

[TABLE]

where each subset is defined as in Equation (1.15). Since $G_{1}\neq\varnothing$ and $G_{2}\neq\varnothing$ , $\tilde{A}_{1}\neq\varnothing$ and $\tilde{A}_{2}\neq\varnothing$ . Because $G_{1}$ and $G_{2}$ are disjoint, and no paths connect them, it follows that $\tilde{A}_{1}\cap\tilde{A}_{2}=\varnothing$ . Since the union of $G_{1}$ and $G_{2}$ is $G$ , $\tilde{A}_{1}\cup\tilde{A}_{2}=A$ .

Suppose $A_{i}\subseteq\tilde{A}_{1}$ , $A_{j}\subseteq\tilde{A}_{2}$ . Then $A_{ij}=\varnothing$ . Because $A$ is a compact set, $A$ is closed and bounded, and hence the definition given in Equation (1.15) implies that $A_{i}$ and $A_{j}$ must each also be closed and bounded. Thus, $A_{i}$ and $A_{j}$ are disjoint compact sets in the Hausdorff space $\mathbb{R}^{d}$ . This implies $A_{i}$ and $A_{j}$ are separated by some positive distance $\epsilon_{ij}$ . Because this is true for all $A_{i}\subseteq\tilde{A}_{1}$ and $A_{j}\subseteq\tilde{A}_{2}$ , there exists $\epsilon>0$ , the minimum over all such $\epsilon_{ij}$ .

Let $\mathbf{x}_{1}\in\tilde{A}_{1}$ , $\mathbf{x}_{2}\in\tilde{A}_{2}$ , and for all $t\in[0,\,1]$ , define

[TABLE]

Because $\epsilon>0$ , there exists $(t_{0},\,t_{1})\subseteq[0,\,1]$ , $\lvert t_{1}-t_{0}\rvert\geq\epsilon$ , such that $t\in(t_{0},\,t_{1})$ implies $\mathbf{x}_{t}\notin\tilde{A}_{1}\cup\tilde{A}_{2}=A$ . This contradicts the convexity of $A$ . Hence, $G$ is connected.

Corollary 3.10

Assume $n\geq 2$ and let $A_{ij}$ be defined by Equation (2.1). If $i\in\mathbb{N}_{n}$ , there exists $j\in\mathbb{N}_{n}$ , such that $j\neq i$ and $A_{ij}\neq\varnothing$ .

Proof 7

Assume the contrary for some $i$ , and apply Definition 3.8. Since $n\geq 2$ , $G$ includes at least two vertices, and $v_{i}$ is disconnected from the rest of $G$ , which contradicts Lemma 3.9.

Corollary 3.11

Let $A_{ij}$ be defined by Equation (2.1) and $B$ by Equation (2.2). If $n\geq 2$ , then the boundary set $B$ is nonempty, and for each $\mathbf{x}\in B$ , there exist $i,\,j\in\mathbb{N}_{n}$ such that $i\neq j$ and $\mathbf{x}\in A_{ij}$ .

Proof 8

This follows from Corollary 3.10 and the definition of $B$ in Equation (2.2).

Lemma 3.12

Assume a shift characterization, as described in Definition 1.4, where $n\geq 2$ and the shifts $\{a_{i}\}_{i=1}^{n}$ are unknown. Let $G$ be the adjacency graph of the transport problem given in Definition 3.8, and let $H$ be a subgraph of $G$ that includes all $n$ vertices. Define the system of equations

[TABLE]

where each $a_{ij}$ is given by some constant. The system of equations $S$ is linearly independent with respect to the shifts $\{a_{i}\}_{i=1}^{n}$ if and only if $H$ contains no cycles.

Proof 9

$(\Longrightarrow)$ Suppose $H$ contains the cycle $(v_{i_{1}},\,v_{i_{2}},\,\ldots,\,v_{i_{k}},\,v_{i_{1}})$ . Then $S$ contains the linear system

[TABLE]

Because $\det(M)=0$ , we know $S$ is linearly dependent.

$(\Longleftarrow)$ Suppose instead that $S$ is linearly dependent. Given the form of the equations in $S$ , we can assume without loss of generality that $S$ contains the equations $a_{i_{j}i_{j+1}}=a_{i_{j}}-a_{i_{j+1}}$ , $\forall j\in\mathbb{N}_{k-1}$ , and that $a_{i_{1}i_{k}}=a_{i_{1}}-a_{i_{k}}$ is also in $S$ . By the definition of $S$ , these equations imply that the edges $(v_{i_{1}},\,v_{i_{2}}),\,(v_{i_{2}},\,v_{i_{3}}),\,\ldots,\,(v_{i_{k-1}},\,v_{i_{k}})$ , and $(v_{i_{k}},\,v_{i_{1}})$ are contained in $H$ . Together, these edges generate the cycle $(v_{i_{1}},\,v_{i_{2}},\,\ldots,\,v_{i_{k}},\,v_{i_{1}})$ , so $H$ contains at least one cycle.

Theorem 3.13

Assume a shift characterized problem, as described in Definition 1.4, where $n\geq 2$ and the shifts $\{a_{i}\}_{i=1}^{n}$ are unknown. Then there exists at least one system of exactly $(n-1)$ equations of the form $a_{i}-a_{j}=a_{ij}$ that is linearly independent with respect to the set of shifts $\{a_{i}\}_{i=1}^{n}$ , with each $a_{ij}$ constant. No system of $n$ independent equations exists.

Proof 10

Let $G$ be as given in Definition 3.8. Because $G$ is a connected graph, we can always create a spanning tree $H$ that is a subgraph of $G$ . Let $S$ be the corresponding set of linear equations, defined as described in (3.8). As a spanning tree, $H$ contains $(n-1)$ edges and $H$ has no cycles, so by Lemma 3.12, we know $S$ contains exactly $(n-1)$ linearly independent equations.

Suppose a set $S$ of $n$ linearly independent equations exists, all of the form $a_{i}-a_{j}=a_{ij}$ . Because there are $n$ unknowns in the set of shifts, there is exactly one solution set $\{a_{i}\}_{i=1}^{n}$ . Fix $\sigma\neq 0$ and for all $i\in\mathbb{N}_{n}$ , define $\tilde{a}_{i}=a_{i}+\sigma$ . For each equation in $S$ , $\tilde{a}_{i}-\tilde{a}_{j}=a_{i}-a_{j}=a_{ij}$ . Thus, $\{\tilde{a}_{i}\}_{i=1}^{n}$ is also a solution to $S$ . This contradicts the uniqueness of $\{a_{i}\}_{i=1}^{n}$ , and therefore no such set of $n$ linearly independent equations exists.
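Lemma 3.12 and Theorem 3.13 can be checked numerically on small examples by computing the rank of the coefficient matrix of the system. The sketch below is illustrative only; the function name and the 0-based vertex labels are assumptions. Each equation $a_{i}-a_{j}=a_{ij}$ contributes a row with $+1$ in column $i$ and $-1$ in column $j$ .

```python
import numpy as np


def difference_system_rank(n, edges):
    """Rank of the coefficient matrix of {a_i - a_j = a_ij : (i, j) in edges}."""
    M = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(edges):
        M[row, i] = 1.0
        M[row, j] = -1.0
    return np.linalg.matrix_rank(M)
```

A spanning tree such as `[(0, 1), (1, 2), (2, 3)]` on four vertices yields rank 3, equal to the number of equations. Closing the cycle with `(3, 0)` leaves the rank at 3, so the four equations are dependent. Every row sums to zero, so the rank can never reach $n$ .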

3.3 Discretization for the boundary method

In the first two subsections below, we give some results on how the grid-points interact with the underlying space. In Sections 3.3.3 and 3.3.4 we present error bounds. In Section 3.3.5 we consider issues of volume and containment: here we ensure that one can have $B\subseteq\bar{B}^{r}$ for all $r$ , and show that $\lvert\bar{B}^{r}\rvert\to 0$ as $r\to\infty$ . Finally, Section 3.3.6 puts bounds on the error for the Wasserstein distance approximation.

3.3.1 Discretization definitions

As described in Section 2.2, we discretize the region $A$ using a regular Cartesian grid, and refine the grid over multiple iterations, with the aim of refining only the grid region containing the boundary set.

Definition 3.14

Let $\mathcal{V}$ be the set of adjacency vectors for all discretizations of $A$ . We choose $\mathcal{V}$ to be the linear combinations of the standard unit vectors, $e_{1},\,\ldots,\,e_{d}$ , with coefficients in $\{-1,\,0,\,1\}$ . We specifically exclude the zero vector from the set, so $\lvert\mathcal{V}\rvert=3^{d}-1$ . If $d=2$ , $\mathcal{V}$ equals

[TABLE]

Let $r\in\mathbb{N}$ be the current discretization level, and $w=w_{r}$ be the width of the discretization at level $r$ . Let $A^{r}$ be the $r$ -th point set, the set of points $\mathbf{x}$ included in the $r$ -th discretization of $A$ . Since we discard boxes of $\mu$ -measure zero during the transport step, assume without loss of generality that $\mu(\mathbf{x}^{r})>0$ for all $\mathbf{x}\in A^{r}$ .

For each iteration $r$ , let

[TABLE]

for all $i\in\mathbb{N}_{n}$ . For all $\mathbf{x}\in A^{r}$ , the points in $A^{r}$ that are adjacent to $\mathbf{x}$ constitute a subset of the neighbors of $\mathbf{x}$ ,

[TABLE]

Lemma 3.15

Let $A^{r}$ be the set of points included in the $r$ -th discretization of $A$ , and assume the definition of $N$ given in Equation (3.11). For all $\mathbf{x},\,\mathbf{x}_{0}\in A^{r}$ , if $\mathbf{x}\in N(\mathbf{x}_{0})$ , then $\mathbf{x}_{0}\in N(\mathbf{x})$ .

Proof 11

This follows from Equation (3.11) and the adjacency vectors established in Definition 3.14: for all $k\in\mathbb{N}_{d}$ , $e_{k}\in\mathcal{V}\iff-e_{k}\in\mathcal{V}$ .
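The adjacency vectors of Definition 3.14 are easy to enumerate, which gives a quick check of the count $3^{d}-1$ and of the symmetry used in Lemma 3.15. The sketch below takes the coefficients from $\{-1,\,0,\,1\}$ , the reading consistent with that count; the function name is an illustrative assumption.

```python
from itertools import product


def adjacency_vectors(d):
    """All vectors in {-1, 0, 1}^d except the zero vector."""
    return [v for v in product((-1, 0, 1), repeat=d) if any(v)]
```

For $d=2$ this returns the eight offsets to the horizontal, vertical, and diagonal neighbors, and for every vector in the list its negative is also in the list.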

We now formalize our idea of the $r$ -th interior and boundary point sets used in our discretization. For all $i\in\mathbb{N}_{n}$ , define the $r$ -th iteration interior point set associated with $A_{i}$ as

[TABLE]

Define the $r$ -th boundary point set as

[TABLE]

and let

[TABLE]

for all $i\in\mathbb{N}_{n}$ . The $r$ -th evaluation region, the subset of $A$ enclosed by the discretization $A^{r}$ , is defined as

[TABLE]

and the $r$ -th boundary region, the subset of $A$ enclosed by the boundary point set $B^{r}$ , is given by

[TABLE]

3.3.2 Distance bounds

Though the discretization is fully defined, it still needs to be related back to the sets $A_{ij}$ and the boundary set $B$ . To do this, we first bound the distance separating $B^{r}$ and $A_{ij}$ .

Lemma 3.16

Let $A^{r}$ be the set of points included in the $r$ -th discretization of $A$ , $w_{r}$ the width at that discretization, and $\mathcal{V}$ the adjacency vector set satisfying Definition 3.14. Assume $A_{ij}$ is defined by Equation (2.1), $\mathrm{edg}(\cdot)$ by Equation (2.8), and $B^{r}_{i}$ by Equation (3.14). Suppose $A$ is convex, $c$ is a $p$ -norm on $X\times Y$ , and $B^{r}_{i}\neq\varnothing$ . For each $\mathbf{x}_{i}\in B^{r}_{i}$ , either $\mathbf{x}_{i}\in\mathrm{edg}(A^{r})$ or there exists a point $\mathbf{x}_{j}=\mathbf{x}_{i}+w_{r}\mathbf{v}$ , with $\mathbf{v}\in\mathcal{V}$ , such that $\mathbf{x}_{j}\in B^{r}_{j}$ for some $j\neq i$ . Thus, if $\mathbf{x}_{i}\notin\mathrm{edg}(A^{r})$ , the distance from $\mathbf{x}_{i}$ to the set $A_{ij}$ , as measured with respect to the ground cost $c$ , is bounded above by $c(\mathbf{x}_{i},\,\mathbf{x}_{j})$ .

Proof 12

Recall the definition of $A_{i}^{r}$ in Equation (3.10). Assume $\mathbf{x}_{i}\in B^{r}_{i}\setminus\mathrm{edg}(A^{r})$ . By the definition of $B^{r}$ given in Equation (3.13), there exists $\mathbf{x}_{j}=\mathbf{x}_{i}+w_{r}\mathbf{v}\in A_{j}^{r}\cap N(\mathbf{x}_{i})$ for some $j\neq i$ , where $N(\mathbf{x}_{i})$ is the set of neighbors of $\mathbf{x}_{i}$ as defined in Equation (3.11). By Lemma 3.15, $\mathbf{x}_{i}\in N(\mathbf{x}_{j})$ , and since $\mathbf{x}_{i}\in A_{i}^{r}$ , we have $\mathbf{x}_{j}\in B^{r}_{j}$ . Thus, $\mathbf{x}_{i}\in A$ and $\mathbf{x}_{j}\in A$ , and because $A$ is convex, this implies

[TABLE]

Because $c$ is continuous on $X\times Y$ , Lemma 3.3 applies. Hence, $F$ is continuous on $A$ . Therefore, because $\mathbf{x}_{i}\in A_{i}$ and $\mathbf{x}_{j}\in A_{j}$ , there exists $t_{*}\in[0,\,1]$ such that $\mathbf{b}=t_{*}\mathbf{x}_{i}+(1-t_{*})\mathbf{x}_{j}\in A_{ij}$ . Then $\mathbf{b}=\mathbf{x}_{i}+(1-t_{*})w_{r}\mathbf{v}$ , so by applying the ground cost we have

[TABLE]

Therefore, $c(\mathbf{x}_{i},\,\mathbf{b})\leq c(\mathbf{x}_{i},\,\mathbf{x}_{j})$ .

Because we can bound the ground cost between the points in $B^{r}\setminus\mathrm{edg}(A^{r})$ and the set $A_{ij}$ in terms of the ground cost between neighboring points, it is worth identifying a bound on that ground cost between neighbors.

Lemma 3.17

Suppose $c=\lVert\cdot\rVert_{p}$ , $p\in[1,\,\infty]$ and assume a shift characterized problem in $\mathbb{R}^{d}$ . Let $N$ be defined as given by Equation (3.11) and $B^{r}_{i}$ as given by Equation (3.14). For the $r$ -th iteration of the boundary method, given width $w_{r}$ , there exists a maximum $M_{r}$ such that, for all $\mathbf{x}_{i}\in B^{r}_{i}$ and $\mathbf{x}_{j}\in B^{r}_{j}$ , where $i,\,j\in\mathbb{N}_{n}$ and $i\neq j$ , if $\mathbf{x}_{j}\in N(\mathbf{x}_{i})$ , then $c(\mathbf{x}_{i},\,\mathbf{x}_{j})\leq M_{r}\leq w_{r}d^{1/p}$ .

Proof 13

Let $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ be defined as above. By applying the definition given in Equation (3.13), $\mathbf{x}_{j}=\mathbf{x}_{i}+w_{r}\mathbf{v}$ for some $\mathbf{v}\in\mathcal{V}$ . For our Cartesian grid $\mathcal{V}$ , $\lVert\mathbf{v}\rVert_{p}$ achieves its maximum when $\mathbf{v}=\mathbbm{1}_{d}=(1,\,\ldots,\,1)\in\mathbb{R}^{d}$ , so

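$$c(\mathbf{x}_{i},\,\mathbf{x}_{j})=\lVert w_{r}\mathbf{v}\rVert_{p}=w_{r}\lVert\mathbf{v}\rVert_{p}\leq w_{r}\lVert\mathbbm{1}_{d}\rVert_{p}=w_{r}d^{1/p}.$$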

Therefore, there exists maximum $M_{r}$ such that, for all $\mathbf{x}_{i}\in B^{r}_{i}$ and $\mathbf{x}_{j}\in B^{r}_{j}$ , $c(\mathbf{x}_{i},\,\mathbf{x}_{j})\leq M_{r}\leq w_{r}d^{1/p}$ .
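The bound $w_{r}d^{1/p}$ can be confirmed by brute force over the adjacency vectors. The following sketch is illustrative (the function name is an assumption) and computes the largest $p$ -norm cost between a grid point and any of its potential neighbors.

```python
from itertools import product

import numpy as np


def max_neighbor_cost(d, w, p):
    """Largest p-norm distance between neighboring grid points of width w."""
    vs = np.array([v for v in product((-1, 0, 1), repeat=d) if any(v)], dtype=float)
    return w * np.max(np.linalg.norm(vs, ord=p, axis=1))
```

With $d=2$ , $w_{r}=2^{-5}$ , and $p=2$ , the maximum is attained at the diagonal neighbors and equals $2^{-5}\sqrt{2}$ ; passing `np.inf` for `p` gives $w_{r}$ , matching $d^{1/p}\to 1$ .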

3.3.3 Error bounds for shift differences

In order to bound the error on the Wasserstein distance, we merely require a finite bound on the errors for the individual shift differences, $a_{ij}$ . However, accurately computing the shift differences themselves is also important, and for that reason, we also present theorems that more finely bound the error on $a_{ij}$ for important ground cost functions. Because estimates are generated using one or more computations of $g_{ij}(\mathbf{x})$ , the magnitude of these errors is dependent on the point(s) chosen.

Lemma 3.18

Let $A_{ij}$ be defined by Equation (2.1), and $a_{ij}$ by Equation (2.6). Suppose the ground cost $c$ satisfies the triangle inequality. Let $\mathbf{x}\in A$ and $i,\,j\in\mathbb{N}_{n}$ such that $i\neq j$ . The error resulting from approximating $a_{ij}$ at $\mathbf{x}$ is bounded above by $\lvert\alpha_{ij}(\mathbf{x})\rvert\leq 2c(\mathbf{x},\,\mathbf{b})$ , where

[TABLE]

and $\mathbf{b}$ is the point in $A_{ij}$ nearest to $\mathbf{x}$ with respect to the ground cost.

Proof 14

Assume $\mathbf{b}\in A_{ij}$ is the closest point in $A_{ij}$ to $\mathbf{x}$ . Then

[TABLE]

For every $\mathbf{x}\in A$ , there exists some $\alpha_{ij}(\mathbf{x})\in\mathbb{R}$ such that

[TABLE]

By rearrangement and substitution, we have

[TABLE]

Since $c$ satisfies the triangle inequality,

[TABLE]

Thus,

[TABLE]

and, by a similar line of reasoning, $\lvert c(\mathbf{b},\,\mathbf{y}_{j})-c(\mathbf{x},\,\mathbf{y}_{j})\rvert\leq c(\mathbf{x},\,\mathbf{b})$ . Therefore,

[TABLE]
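The proof of Lemma 3.18 uses only the triangle inequality, applied once for $\mathbf{y}_{i}$ and once for $\mathbf{y}_{j}$. The sketch below tests the resulting inequality on random points for a $p$-norm cost. It assumes the cost difference is $c(\mathbf{x},\,\mathbf{y}_{i})-c(\mathbf{x},\,\mathbf{y}_{j})$ (the sign convention of Equation (2.5) is not shown here), and it takes $\mathbf{b}$ as an arbitrary point rather than the nearest point of $A_{ij}$, since the inequality does not depend on that choice. All names are hypothetical.

```python
import random


def p_norm_cost(x, y, p=2.0):
    """Ground cost c(x, y) = ||x - y||_p for finite p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cost_difference(x, y_i, y_j, p=2.0):
    """c(x, y_i) - c(x, y_j); the sign convention is an assumption."""
    return p_norm_cost(x, y_i, p) - p_norm_cost(x, y_j, p)


def max_bound_violation(trials=2000, d=2, p=2.0, seed=0):
    """Largest value of |g(x) - g(b)| - 2 c(x, b) over random samples (should be <= 0)."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        x, b, y_i, y_j = ([rng.random() for _ in range(d)] for _ in range(4))
        lhs = abs(cost_difference(x, y_i, y_j, p) - cost_difference(b, y_i, y_j, p))
        worst = max(worst, lhs - 2.0 * p_norm_cost(x, b, p))
    return worst
```

A non-positive return value (up to rounding) is consistent with the bound $\lvert\alpha_{ij}(\mathbf{x})\rvert\leq 2c(\mathbf{x},\,\mathbf{b})$.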

In addition to bounding the error for individual points $\mathbf{x}$ , we can also establish meaningful global bounds.

Lemma 3.19

Assume a shift characterized problem in $\mathbb{R}^{d}$ , where $c$ is a $p$ -norm, $p\in[1,\,\infty]$ . Let $w_{r}$ be the width of the discretization during iteration $r$ , and let $\mathbf{x}^{r}$ indicate the box of width $w_{r}$ centered at the point $\mathbf{x}$ . Let $B$ be defined by Equation (2.2), $N$ by Equation (3.11), and $B^{r}_{i}$ by Equation (3.14). Taking $\alpha_{ij}$ as defined by Equation (3.17) and $\bar{B}^{r}$ as given by Equation (3.16), let $\alpha_{\max}$ be the maximum value of $\lvert\alpha_{ij}(\mathbf{x})\rvert$ over all $\mathbf{x}\in\bar{B}^{r}$ and $i,\,j\in\mathbb{N}_{n}$ , such that: (1) $i\neq j$ , (2) $\mathbf{x}\in\mathbf{x}_{i}^{r}$ for some $\mathbf{x}_{i}\in B^{r}_{i}$ , and (3) $B^{r}_{j}\cap N(\mathbf{x}_{i})\neq\varnothing$ . Then $\alpha_{\max}\leq 4w_{r}d^{1/p}$ and for all $\mathbf{x}\in\bar{B}^{r}$ , $\lVert\mathbf{x}-B\rVert_{p}\leq 2w_{r}d^{1/p}$ .

Proof 15

Suppose $\mathbf{x}\in\bar{B}^{r}$. By the definition of our grid, $\mathbf{x}$ is contained in some $G=\mathrm{Conv}(S)$, where $S$ is a finite set of neighboring grid points. For each $\mathbf{x}_{a},\,\mathbf{x}_{b}\in S$, $\mathbf{x}_{b}\in N(\mathbf{x}_{a})$, and hence $\mathbf{x}_{b}=\mathbf{x}_{a}+w_{r}\mathbf{v}$ for some $\mathbf{v}\in\mathcal{V}$, the set of adjacency vectors described in Definition 3.14. Since $\lVert\mathbf{v}\rVert_{p}\leq d^{1/p}$, $c(\mathbf{x}_{a},\,\mathbf{x}_{b})\leq w_{r}d^{1/p}$. Because $\mathbf{x}_{a}$ and $\mathbf{x}_{b}$ were arbitrarily chosen, this is true of every pair of vertices of $G$. By the definition of $G$, $\mathbf{x}$ can be written as a convex combination of the points in $S$. Therefore, by convexity of the norm, for any fixed $\mathbf{x}_{0}\in S$, $c(\mathbf{x},\,\mathbf{x}_{0})\leq w_{r}d^{1/p}$.

Recall the definition of $B^{r}$ given in Equation (3.13). Because $\mathbf{x}\in\bar{B}^{r}$ , $\mathrm{Conv}(S)\cap B^{r}$ must be nonempty. Assume without loss of generality that $\mathbf{x}_{0}=\mathbf{x}_{i}\in B^{r}_{i}$ for some $i\in\mathbb{N}_{n}$ . Because $c$ satisfies the triangle inequality, Lemma 3.18 applies. Hence, there must exist a point $\mathbf{x}_{j}\in B^{r}_{j}$ , a neighbor of $\mathbf{x}_{i}$ , with $j\neq i$ , and a point $\mathbf{b}\in A_{ij}$ such that $c(\mathbf{x}_{i},\,\mathbf{b})\leq c(\mathbf{x}_{i},\,\mathbf{x}_{j})\leq w_{r}d^{1/p}$ . Applying the triangle inequality, we find that

[TABLE]

Therefore, $\lVert\mathbf{x}-B\rVert_{p}\leq 2w_{r}d^{1/p}$ and $\alpha_{\max}\leq 4w_{r}d^{1/p}$ .

3.3.4 Error bound for ground costs

In preparation for bounding the Wasserstein distance error, we now bound the error on the ground cost $c$ with respect to individual points in $\bar{B}^{r}$ .

Lemma 3.20

Assume a shift characterized transport problem in $\mathbb{R}^{d}$ with ground cost $c=\lVert\cdot\rVert_{p}$, $p\in[1,\,\infty]$. Let $w_{r}$ be the width of the discretization at the $r$-th iteration, and let $\tilde{\pi}^{*}$ be an approximated transport plan with associated transport map $\widetilde{T}$, obtained using the boundary method with discretization width $w_{r}$. Suppose $\pi^{*}$ is an optimal transport plan with associated map $T$, and let $\mathbf{x}\in A$ be such that $T(\mathbf{x})=\mathbf{y}_{i}$ but $\widetilde{T}(\mathbf{x})=\mathbf{y}_{j}$. Then the error in the ground cost at the point $\mathbf{x}$ is equal to $\lvert g_{ij}(\mathbf{x})\rvert$, where $g_{ij}$ is defined in Equation (2.5). Furthermore, there exists $\gamma_{\max}$ such that, for all such $\mathbf{x}\in A$ with $T(\mathbf{x})=\mathbf{y}_{i}$ and $\widetilde{T}(\mathbf{x})=\mathbf{y}_{j}$ for some $i\neq j$,

[TABLE]

Proof 16

Let $\mathbf{x}\in A$ such that $T(\mathbf{x})=\mathbf{y}_{i}$ , but $\widetilde{T}(\mathbf{x})=\mathbf{y}_{j}$ . Then the error in the ground cost at $\mathbf{x}$ equals

[TABLE]

As a consequence of Lemma 3.19:

[TABLE]

The result is independent of $\mathbf{x}$ , $i$ , and $j$ , and therefore there must exist some $\gamma_{\max}\leq\max_{\begin{subarray}{c}1\leq i<n\\ i<j\leq n\end{subarray}}\,\lvert a_{ij}\rvert+4w_{r}d^{1/p}<\infty$ .

3.3.5 Volume and containment for the boundary region

As shown in Section 3.3.4, the ground cost error for individual points is finitely bounded over a wide range of admissible ground cost functions. By definition, the measure $\mu$ is bounded. We propose to identify the largest possible region in which the ground cost error can be non-zero, and to show that the area of that region goes to zero as $r$ goes to infinity. With this, we will show that the boundary method converges with respect to the Wasserstein distance.

In Equation (3.16), we defined a region $\bar{B}^{r}$ based on the point set $B^{r}$ . For this, we need to know that we can choose an initial width $w_{1}$ such that, for all iterations $r$ , $B\subset\bar{B}^{r}$ . Theorem 3.21 guarantees that such a width exists, and gives a sense of the relevant features driving the choice of $w_{1}$ . For details about the numerical considerations involved, see Remark 5.

Theorem 3.21

Assume $c$ is a $p$ -norm, $p\in[1,\,\infty]$ , and $A=[0,\,l]^{d}$ . There exists an initial width $w_{1}$ such that, for all $w_{r}$ such that $w_{r}\leq w_{1}$ , $\mathbf{x}\in\mathring{A}_{i}^{r}$ , as defined by Equation (3.12), implies the box of width $w_{r}$ centered at $\mathbf{x}$ , given by $\mathbf{x}^{r}$ , satisfies $\mathbf{x}^{r}\subseteq\mathring{A}_{i}$ , where $\mathring{A}_{i}$ is the strict interior of $A_{i}$ , as defined by Equation (2.3).

Proof 17

Recall the definition of $A_{i}$ given by Equation (1.15), $A_{ij}$ given by Equation (2.1), $B$ given by Equation (2.2), and $g_{ij}$ given by Equation (2.5). Let $\mathcal{B}(\mathbf{x},\,s)$ indicate the open ball of radius $s$ (with respect to the $p$ -norm $c$ ) centered at $\mathbf{x}$ and $\mathcal{C}(\mathbf{x},\,s)$ indicate the $d$ -dimensional cube with side length $s$ (with respect to the Euclidean distance) centered at $\mathbf{x}$ . Because $c$ is a $p$ -norm, for each $i\in\mathbb{N}_{n}$ , $\mathbf{y}_{i}\in\mathring{A}_{i}$ , and therefore there exists $\delta_{i}>0$ such that $\mathcal{B}(\mathbf{y}_{i},\,\delta_{i})\subseteq\mathring{A}_{i}$ . Thus, there exist $\varepsilon>0$ and $\delta\geq\varepsilon$ , such that, for any $i\in\mathbb{N}_{n}$ , $c(\mathbf{x},\,\mathbf{y}_{i})<\delta$ implies $\mathcal{C}(\mathbf{x},\,4s)\subseteq\mathring{A}_{i}$ for all $s\leq\varepsilon$ .

Let

[TABLE]

Because $S$ is a closed set minus a finite number of open sets, $S$ is closed.

If all $A_{ij}$ are hyperplanes on $S$ , the claim is self-evident for all $w_{1}\leq\varepsilon$ , so assume instead that at least one $A_{ij}$ is not a hyperplane on $S$ . Let

[TABLE]

There exists a maximum directional magnitude with respect to the Euclidean distance,

[TABLE]

where $\lvert x_{\mathbf{u}}-y^{i}_{\mathbf{u}}\rvert$ is the magnitude of the vector $\mathbf{x}-\mathbf{y}_{i}$ projected parallel to the direction of $\mathbf{u}$ . Because $c\in C^{2}(S)$ , $G_{1}$ and $G_{2}$ are well-defined. For any $\mathbf{x}\in S$ and any unit direction vector $\mathbf{u}\in\mathbb{R}^{d}$ ,

[TABLE]

and

[TABLE]

Hence, $G_{1}<\infty$ and $G_{2}<\infty$ .

Assume the Gaussian curvature of the set $A_{ij}$ at a point $\mathbf{x}\in A_{ij}$ is given by the function $K_{ij}(\mathbf{x})$ , and when $K_{ij}(\mathbf{x})\neq 0$ the radius of curvature is given by $R_{ij}(\mathbf{x})=\lvert K_{ij}(\mathbf{x})\rvert^{-1}$ . Because $K_{ij}(\mathbf{x})$ is defined as a product of first and second directional derivatives of $g_{ij}$ , and those derivatives are bounded, there exists a maximum absolute Gaussian curvature for $B$ on $S$ , given by

[TABLE]

Because at least one $A_{ij}$ is not a hyperplane, $K>0$ . Because $K<\infty$ , for any $i,\,j\in\mathbb{N}_{n}$ , $i\neq j$ and any $\mathbf{x}\in A_{ij}\cap S$ , the radius of curvature is bounded below: $R_{ij}(\mathbf{x})\geq K^{-1}>0$ .

Let $\tilde{\varepsilon}=\dfrac{2}{K\sqrt{d}}$ . Suppose $s\leq\min\{\varepsilon,\,\tilde{\varepsilon}\}$ , $\mathbf{x}_{0}\in\mathring{A}_{i}^{r}$ for some $i\in\mathbb{N}_{n}$ , and that $\mu(A_{j}\cap\mathcal{C}(\mathbf{x}_{0},\,s))>0$ for some $j\in\mathbb{N}_{n}$ , $j\neq i$ .

The set $\mathcal{C}(\mathbf{x}_{0},\,2s)$ is the cube surrounding $\mathbf{x}_{0}$ and its neighbors. Because $\mathbf{x}_{0}\in\mathring{A}_{i}^{r}$ , $A_{ij}$ cannot be a hyperplane in $\mathcal{C}(\mathbf{x}_{0},\,2s)$ , and so $R_{ij}$ is well-defined on $\mathcal{C}(\mathbf{x}_{0},\,2s)$ . If there exist $k\in\mathbb{N}_{n}$ and $\mathbf{x}\in\mathcal{C}(\mathbf{x}_{0},\,2s)$ such that $c(\mathbf{x},\,\mathbf{y}_{k})<\delta$ , then $\mathcal{C}(\mathbf{x}_{0},\,2s)\subseteq\mathcal{C}(\mathbf{x},\,4\varepsilon)\subseteq\mathring{A}_{k}$ , and since $\mathbf{x}_{0}\in\mathring{A}_{i}^{r}$ , this implies $k=i$ and $\mathcal{C}(\mathbf{x}_{0},\,2s)\subseteq\mathring{A}_{i}$ . This implies $\mathcal{C}(\mathbf{x}_{0},\,s)\cap A_{j}=\varnothing$ , which contradicts the claim that $\mu(A_{j}\cap\mathcal{C}(\mathbf{x}_{0},\,s))>0$ . Therefore, $c(\mathbf{x},\,\mathbf{y}_{k})\geq\delta$ for all $k\in\mathbb{N}_{n}$ and $\mathbf{x}\in\mathcal{C}(\mathbf{x}_{0},2s)$ . This implies $\mathcal{C}(\mathbf{x}_{0},\,2s)\subseteq S$ . Hence, the intersection of the boundary $A_{ij}$ with the cube $\mathcal{C}(\mathbf{x}_{0},\,2s)$ must have a point with minimum radius of curvature,

[TABLE]

and since $\mathbf{x}_{m}\in S$, it must be the case that $R_{ij}(\mathbf{x}_{m})\geq 1/K$.

Because $\mu(A_{j}\cap\mathcal{C}(\mathbf{x}_{0},\,s))>0$ , but $\mathbf{x}_{0}\notin A_{j}$ , there must exist $\mathbf{x}_{c}\in A_{ij}\cap\mathcal{C}(\mathbf{x}_{0},\,s)$ . Hence, within the cube $\mathcal{C}(\mathbf{x}_{0},\,2s)$ , there must be a $d$ -dimensional sphere (or partial sphere) of radius $R_{ij}(\mathbf{x}_{m})$ , not in $A_{i}$ , whose boundary intersects $\mathbf{x}_{c}$ (“partial” because the sphere may be cut off by one or more of the planes bounding the cube). Call this (partial) sphere $\widetilde{\mathcal{S}}$ .

Since $\mathbf{x}_{0}\in\mathring{A}_{i}^{r}$ , it must be the case that $\widetilde{\mathcal{S}}\cap\{N(\mathbf{x}_{0})\cup\{\mathbf{x}_{0}\}\}=\varnothing$ , where $N$ is the set of neighbors defined by Equation (3.11). Because $\mathbf{x}_{c}\in\mathcal{C}(\mathbf{x}_{0},\,s)$ , and the maximum distance between grid points in $\mathcal{C}(\mathbf{x}_{0},\,2s)$ is $s\sqrt{d}$ , this requires $R_{ij}(\mathbf{x}_{m})<s\sqrt{d}/2$ . Hence, there exists $\mathbf{x}_{m}\in A_{ij}\cap S$ such that

[TABLE]

This contradicts $R_{ij}(\mathbf{x}_{m})\geq 1/K$. Thus, it must be the case that for all $j\in\mathbb{N}_{n}$, $j\neq i$ implies $\mu(\mathcal{C}(\mathbf{x},\,s)\cap A_{j})=0$, and therefore $\mathcal{C}(\mathbf{x},\,s)\subseteq\mathring{A}_{i}$.

Setting $w_{1}\leq\min\{\varepsilon,\,\tilde{\varepsilon}\}$ completes the proof.

Remark 5

By the boundary method’s very nature, any initial configuration must enclose the boundary in a way that allows it to be distinguished from the region interiors. This is the meaning behind the width $w_{1}$ considered in Theorem 3.21. In principle, $w_{1}$ may need to be quite small. In practice, the potential problems associated with an overly large $w_{1}$ rarely occur, and they are obvious when they do. We did occasionally observe an issue when the initial $w_{1}$ was so large that a region $\mathbf{x}^{r}$ could contain an entire $A_{i}$ (in other words, when $w_{1}$ was significantly larger than the $\delta$ described in the proof of Theorem 3.21). In those cases, the affected region’s $a_{i}$ was such that $c(\mathbf{y}_{i},\,\mathbf{y}_{j})=a_{i}-a_{j}$ for some $j\neq i$, and the resulting transport plan had $\mu(A_{i})=0$. Hence, the set $\{a_{i}\}_{i=1}^{n}$ and reconstructed regions $\{A_{i}\}_{i=1}^{n}$ directly revealed when such an error had occurred.

Also, because of the nature of the iterative method, a poor choice of $w_{1}$ quickly becomes obvious in the boundary region itself. Simply put, the loss of any portion of the boundary set $B$ destabilizes the method. Losing part of $B$ creates a visible gap in the “wall” between two regions, and the gap increases in size with each successive iteration. This behavior seems to occur whenever some part of $B$ is lost, no matter what the cause. For example, in our tests we observed that discarding an edge box that intersects $B$ results in the same progressive damage to the boundary set. Not surprisingly, this also “stalls” the convergence of the Wasserstein distance in ways that are obvious during computation.

In our numerical tests, we used $w_{1}\leq 1/(50n)$ and obtained consistently reliable results.

Next, we show that a well-chosen initial width and grid arrangement can guarantee that, for every iteration $r$ , each point in $A^{r}\setminus B^{r}$ corresponds to a box in the interior of some region $A_{i}$ .

Theorem 3.22

Assume $c$ is a $p$ -norm, $p\in[1,\,\infty]$ , and $A=[0,\,l]^{d}$ . Suppose the first iteration width $w_{1}$ is chosen as described in Theorem 3.21. Fix $r$ , let $w_{r}\leq w_{1}$ , and let $A^{r}$ be the boundary set remaining at the $r$ -th iteration. Given the definition of $B$ from Equation (2.2), $\bar{A}^{r}$ from Equation (3.15), $\bar{B}^{r}$ from Equation (3.16), if $B\subseteq\bar{A}^{r}$ , then $B\subseteq\bar{B}^{r}$ , and hence $B\subseteq A^{r+1}$ .

Proof 18

We establish the conclusion by proving the contrapositive: $\mathbf{x}_{0}\notin\bar{B}^{r}$ implies $\mathbf{x}_{0}\notin B$.

Suppose $\mathbf{x}_{0}\notin\bar{B}^{r}$ . If $\mathbf{x}_{0}\notin\bar{A}^{r}$ , then $\mathbf{x}_{0}\notin B$ , since by assumption, $B\subseteq\bar{A}^{r}$ . Thus, we assume instead that $\mathbf{x}_{0}\in\bar{A}^{r}\setminus\bar{B}^{r}$ .

Because $\mathbf{x}_{0}\in\bar{A}^{r}$, we know $\mathbf{x}_{0}\in\mathbf{x}^{r}$, the box of width $w_{r}$ centered at some $\mathbf{x}\in A^{r}$. We have $\mathbf{x}\in A_{i}$ for some $i\in\mathbb{N}_{n}$, where $A_{i}$ is defined as given in Equation (1.15), and so by the definition of $A_{i}^{r}$ from Equation (3.10), $\mathbf{x}\in A_{i}^{r}$. However, $\mathbf{x}_{0}\notin\bar{B}^{r}$ implies $\mathbf{x}^{r}\not\subseteq\bar{B}^{r}$, so from the definition of $B^{r}$ given in Equation (3.13), $\mathbf{x}\notin B^{r}$. Because $\mathbf{x}\in A_{i}^{r}\setminus B^{r}=\mathring{A}_{i}^{r}$ (see Equation (3.12)), Theorem 3.21 gives $\mathbf{x}^{r}\subseteq\mathring{A}_{i}$. Hence, $\mathbf{x}_{0}\in\mathring{A}_{i}$. Therefore, by Equation (2.3), $\mathbf{x}_{0}\notin B$.
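To make the split between boundary points and interior points concrete, here is a minimal two-dimensional sketch. It assumes that a grid point belongs to $B^{r}$ when at least one of its grid neighbors is assigned to a different region, and that regions are assigned by minimizing $c(\mathbf{x},\,\mathbf{y}_{i})-a_{i}$. Both the neighbor set and the sign convention for the shifts are assumptions, since Definitions 1.4 and 3.14 are not reproduced in this section. The function names are hypothetical.

```python
import itertools


def region_label(x, targets, shifts, p=2.0):
    """Index i minimizing c(x, y_i) - a_i (assumed sign convention for the shifts)."""
    def cost(y):
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
    return min(range(len(targets)), key=lambda i: cost(targets[i]) - shifts[i])


def split_grid(m, targets, shifts, p=2.0):
    """Label the m-by-m box centers of [0, 1]^2 and split them into boundary and interior."""
    w = 1.0 / m
    label = {
        (a, b): region_label(((a + 0.5) * w, (b + 0.5) * w), targets, shifts, p)
        for a in range(m) for b in range(m)
    }
    # Assumed neighbor offsets: all of {-1, 0, 1}^2 except the origin.
    offsets = [v for v in itertools.product((-1, 0, 1), repeat=2) if any(v)]
    boundary, interior = set(), set()
    for (a, b), lab in label.items():
        nbrs = [(a + da, b + db) for da, db in offsets]
        if any(n in label and label[n] != lab for n in nbrs):
            boundary.add((a, b))   # some neighbor lies in another region
        else:
            interior.add((a, b))   # every neighbor shares this point's region
    return boundary, interior, w
```

With two targets and zero shifts, the boundary points form a two-box-wide band around the bisector, and every other box can be assigned to a single region without further refinement.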

Now that we have ensured $B\subseteq\bar{B}^{r}$ , we aim to construct a region of controlled volume enclosing $\bar{B}^{r}$ : $\bar{B}^{r}\subseteq\bar{B}^{r}_{\scriptscriptstyle{+}}$ . Then we show that, as $r\to\infty$ , the volume of $\bar{B}^{r}_{\scriptscriptstyle{+}}$ in $\mathbb{R}^{d}$ goes to zero with respect to the Lebesgue measure. This will allow us to put a convenient upper bound on the volume of $\bar{B}^{r}$ in terms of the width $w_{r}$ . Because $\bar{B}^{r}_{\scriptscriptstyle{+}}$ exists solely in $A$ , and not on the product space, we can once again rely on the Euclidean distance in $\mathbb{R}^{d}$ .

Lemma 3.23

Assume a shift characterized transport problem in $\mathbb{R}^{d}$ , with $c=\lVert\cdot\rVert_{p}$ , $p\in[1,\,\infty]$ . Suppose $w_{r}$ is the width used for the $r$ -th iteration, and assume $B$ is defined as given in Equation (2.2), $\bar{B}^{r}$ as given in Equation (3.16). Let the region $\bar{B}^{r}_{\scriptscriptstyle{+}}\subseteq A$ be defined as

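$$\bar{B}^{r}_{\scriptscriptstyle{+}}=\left\{\mathbf{x}\in A\;\middle|\;\lVert\mathbf{x}-B\rVert_{2}\leq 2w_{r}\sqrt{d}\right\}.\tag{3.19}$$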

For all $r$ , $\bar{B}^{r}\subseteq\bar{B}^{r}_{\scriptscriptstyle{+}}$ .

Proof 19

By definition, $\bar{B}^{r}\subseteq A$ . Suppose $\mathbf{x}\in\bar{B}^{r}$ . Because we are applying the Euclidean norm, Lemma 3.19 implies that $\lVert\mathbf{x}-B\rVert_{2}\leq 2w_{r}\sqrt{d}$ , and since $\mathbf{x}\in A$ , $\mathbf{x}\in\bar{B}^{r}_{\scriptscriptstyle{+}}$ .

Theorem 3.24

Assume $c$ is a $p$ -norm, $p\in[1,\,\infty]$ . Let $w_{r}$ be the width of the discretization applied during the $r$ -th iteration. Given the definition of $B$ in Equation (2.2) and $\bar{B}^{r}$ in Equation (3.16), if $\mu(B)=0$ and there exists some constant $\tilde{L}$ such that $\lvert B\rvert=\tilde{L}<\infty$ with respect to the $\mathbb{R}^{d-1}$ Lebesgue measure, then there exists some $L<\infty$ , such that $\left\lvert\bar{B}^{r}\right\rvert\leq w_{r}^{d}L$ with respect to the $\mathbb{R}^{d}$ Lebesgue measure.

Proof 20

Recall the definition of $\bar{B}^{r}_{\scriptscriptstyle{+}}$ given in Equation (3.19). We know $\int_{\bar{B}^{r}_{\scriptscriptstyle{+}}{}}\,d\mathbf{x}=\int_{A}\chi\left[\bar{B}^{r}_{\scriptscriptstyle{+}}{}\right]\!\!(\mathbf{x})\,d\mathbf{x}$ . Let $\mathcal{B}(\mathbf{x},\,\rho)$ be the closed ball of radius $\rho$ centered at $\mathbf{x}$ , and defined with respect to the Euclidean distance. Write

[TABLE]

For all fixed $\mathbf{x}$ ,

[TABLE]

where $\mathrm{Vol}_{d}(\rho)$ is the volume of the $d$ -dimensional sphere of radius $\rho$ , defined with respect to the Euclidean distance. By using the $\mathrm{\Gamma}$ function, this volume can be written as

[TABLE]

Because the volume is independent of the point $\mathbf{x}\in A$ , we therefore have

[TABLE]

where

[TABLE]

Let $\mathbf{x}\in\bar{B}^{r}$ . By applying Lemma 3.19 with $c$ the Euclidean distance, we know that for all $\mathbf{x}\in\bar{B}^{r}$ , $\lVert\mathbf{x}-B\rVert_{2}\leq 2w_{r}\sqrt{d}$ , which implies $\mathbf{x}\in\bar{B}^{r}_{\scriptscriptstyle{+}}{}$ . Thus, $\bar{B}^{r}\subseteq\bar{B}^{r}_{\scriptscriptstyle{+}}{}$ , which implies $\lvert\bar{B}^{r}\rvert\leq\lvert\bar{B}^{r}_{\scriptscriptstyle{+}}{}\rvert\leq w_{r}^{d}L$ .
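The Euclidean ball volume used in the proof has the standard closed form $\mathrm{Vol}_{d}(\rho)=\pi^{d/2}\rho^{d}/\mathrm{\Gamma}(d/2+1)$. The sketch below evaluates it and shows the $w_{r}^{d}$ scaling, assuming the radius is the thickness $2w_{r}\sqrt{d}$ that appears in Lemma 3.23. The function names are hypothetical.

```python
import math


def ball_volume(d, rho):
    """Volume of the d-dimensional Euclidean ball of radius rho."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0) * rho ** d


def ball_volume_at_width(d, w_r):
    """Ball volume at the assumed radius 2 * w_r * sqrt(d)."""
    return ball_volume(d, 2.0 * w_r * math.sqrt(d))
```

Halving $w_{r}$ divides this volume by $2^{d}$, which is the scaling behind the factor $w_{r}^{d}$ in the bound.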

Remark 6

The interplay between $B$, $B^{r}$, $\bar{B}^{r}$, and $\bar{B}^{r}_{\scriptscriptstyle{+}}$ is nontrivial. Figure 3 helps to visualize it properly. The panel of Figure 3 depicting $B^{r}$ shows the placement of some boundary set $B^{r}$. It is crucial that the subgrid created by $B^{r}$ completely surrounds $B$, because that is the only way to ensure that $B\subseteq\bar{B}^{r}$. One can see in this image how a (very degenerate) choice of $c$, coupled with the right arrangement of $\mathbf{y}_{i}$’s, might allow a small and sharply curved boundary set to slip unnoticed between points.

As the panel of Figure 3 depicting $\bar{B}^{r}$ illustrates, each point in $B^{r}$ appears as the center of its corresponding box, and the boxes completely cover the boundary set.

The region $\bar{B}^{r}_{\scriptscriptstyle{+}}$ is deliberately constructed to entirely cover all the boxes in $\bar{B}^{r}$. As the panel of Figure 3 depicting $\bar{B}^{r}_{\scriptscriptstyle{+}}$ shows, its volume can be significantly larger than that of the boxes it contains. However, the worst-case “thickness” given to $\bar{B}^{r}_{\scriptscriptstyle{+}}$ ensures that it will always enclose both $B$ and $\bar{B}^{r}$.

3.3.6 The Wasserstein distance error

Theorem 3.25

Assume $\mu$ is absolutely continuous and let $P^{*}$ be the Wasserstein distance. Let $w_{r}$ be the width of the $r$ -th iteration of the boundary method. Given the definition of $B$ in Equation (2.2) and $\bar{B}^{r}$ in Equation (3.16), suppose $B\subseteq\bar{B}^{r}$ , and that there exists some $L$ such that $\lvert\bar{B}^{r}\rvert=w_{r}^{d}L<\infty$ with respect to the $d$ -dimensional Lebesgue measure. If $\gamma_{\max}<\infty$ is the maximum error of the ground cost in the set $\bar{B}^{r}$ , and $\widetilde{P}^{*}$ is the Wasserstein distance approximation obtained with the boundary method, then the value of $\mu$ on $A$ is bounded by some $M<\infty$ and

$$\lvert\widetilde{P}^{*}-P^{*}\rvert\leq w_{r}^{d}\,L\,M\,\gamma_{\max},$$

where the bound equals the maximum possible volume of $\bar{B}^{r}$ multiplied by the maximum value of $\mu$ and the maximum error of the ground cost.

Proof 21

If $\mathbf{x}\in A\setminus\bar{B}^{r}$ , then $\mathbf{x}$ has been identified as being in the interior of $A_{i}$ for some $i\in\mathbb{N}_{n}$ . Thus, the cost error associated with the points outside $\bar{B}^{r}$ is zero.

Suppose instead that $\mathbf{x}\in\bar{B}^{r}$ . By definition, the absolute value of the difference between the correct and approximated ground costs at $\mathbf{x}$ is less than or equal to $\gamma_{\max}$ . Condition (1)(1)(a) requires $\mu$ to be absolutely continuous, so there exists $M$ such that, for all $\mathbf{x}\in X$ , $0\leq\mu(\mathbf{x})\leq M<\infty$ .

Therefore, the error on the Wasserstein distance is bounded above by

[TABLE]

Remark 7

The bounds in Theorems 3.24 and 3.25 indicate that the volume of the boundary set and the error of the computed Wasserstein distance decrease at a rate governed by the dimension of the space. Thus, we should expect our numerical tests to show a quadratic (in $\mathbb{R}^{2}$) or cubic (in $\mathbb{R}^{3}$) decrease of the Wasserstein distance error with respect to the width $w_{r}$. These decreases are clearly observed in practice; see Section 4.
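One common way to check such a rate is to estimate the observed order from errors at successive widths, using $\log(e_{k}/e_{k+1})/\log(w_{k}/w_{k+1})$. The sketch below is illustrative only: the errors are synthetic, generated from the model $e=Kw^{d}$ with a hypothetical constant $K$, and are not measurements from the paper.

```python
import math


def observed_orders(widths, errors):
    """Estimated convergence order between each pair of consecutive widths."""
    orders = []
    for k in range(len(widths) - 1):
        ratio_e = errors[k] / errors[k + 1]
        ratio_w = widths[k] / widths[k + 1]
        orders.append(math.log(ratio_e) / math.log(ratio_w))
    return orders


# Synthetic data from the model error = K * w^d (not measured results).
d, K = 2, 3.0
widths = [0.1 / 2 ** r for r in range(5)]
errors = [K * w ** d for w in widths]
print(observed_orders(widths, errors))
```

On real runs the estimates would fluctuate around $d$ rather than match it exactly, since the bound is an upper bound and not an equality.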

4 Numerical results

4.1 Test conditions

As mentioned in Remark 2, some choices of $\mu$ may make it necessary to approximate $\mu(\mathbf{x}^{r})$ or $\displaystyle{\int_{\mathbf{x}^{r}}c(\mathbf{z},\,\mathbf{y}_{i})\,d\mu(\mathbf{z})}$ . However, the majority of numerical studies we have seen restrict to simple choices of $\mu$ (most often uniform). For this reason, we restricted our examples to cases where the cost and mass integrals can be written in a closed form.

4.1.1 The closed-form mass $\mu(\mathbf{x}^{r})$

The integral of $\mu$ over some box can be written as:

[TABLE]

Since $\mu$ is a probability density function, we must have $\int_{A}d\mu=1$ . For convenience, let $\hat{\mu}$ denote an un-normalized version of $\mu$ , and similarly for $\hat{M}$ .

Using the linearity of the integral, one can use linear combinations of simple functions for which exact solutions are known. We can also construct more complex measures by partitioning $A$ into disjoint subsets. In this case, however, we add an additional restriction in order to be sure that exact solutions can always be found: we $\mu$-partition $A$ into subsets $S_{1},\,\ldots,\,S_{\sigma}$, such that the boundaries of each $S_{s}$ fall on the initial set of grid lines. Assume that for each set $S_{s}$, there exists a density function $\hat{\mu}_{s}$ that is exactly solvable on $S_{s}$. From these, we consider $\hat{\mu}$ (and $\hat{M}$) to be the piecewise functions defined on each $S_{s}$ as $\hat{\mu}_{s}$ (and $\hat{M}_{s}$, respectively).

Most of our computations were performed in two dimensions. For such problems, given iteration $r$ and $\mathbf{x}=(x_{1},\,x_{2})\in A$, $\hat{\mu}(\mathbf{x}^{r})$ can be written as

[TABLE]

The closed-form choices used in our numerical tests are shown in Table 1. As described above, we used the table entries as building blocks in the construction of more complex measures.
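As an illustration of how the Table 1 entries can be used, the sketch below evaluates the un-normalized mass of a box by inclusion-exclusion on $\hat{M}$. It assumes that $\mathbf{x}^{r}$ is the square $[x_{1}-w_{r}/2,\,x_{1}+w_{r}/2]\times[x_{2}-w_{r}/2,\,x_{2}+w_{r}/2]$ and that $\hat{M}$ is the mixed antiderivative of $\hat{\mu}$. The function names are hypothetical.

```python
import math


def M_uniform(u, v):
    """Table 1: mu_hat = 1 gives M_hat(u, v) = u * v."""
    return u * v


def make_M_power(t):
    """Table 1: mu_hat = x1^t * x2^t (t > 0) gives M_hat = (t+1)^-2 * u^(t+1) * v^(t+1)."""
    return lambda u, v: (u ** (t + 1) * v ** (t + 1)) / (t + 1) ** 2


def make_M_exp_x1(t):
    """Table 1: mu_hat = exp(t * x1) (t != 0) gives M_hat = v * exp(t * u) / t."""
    return lambda u, v: v * math.exp(t * u) / t


def box_mass(M_hat, x, w):
    """Un-normalized mass of the box of width w centered at x = (x1, x2)."""
    x1, x2 = x
    h = w / 2.0
    return (M_hat(x1 + h, x2 + h) - M_hat(x1 - h, x2 + h)
            - M_hat(x1 + h, x2 - h) + M_hat(x1 - h, x2 - h))
```

For example, `box_mass(M_uniform, (0.3, 0.7), 0.1)` returns the area of the box, and dividing by the total `box_mass` over $A$ normalizes the result to a probability.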

4.1.2 The closed-form Wasserstein distance over $\mathbf{x}^{r}$

We performed many tests where $\mu$ could be computed exactly but the Wasserstein distance could not; see Section 4 for details. In such cases, we made no attempt to approximate $P^{*}$ , choosing instead to focus on the accuracy of the $\mu$ -partition generated by the approximate shift set $\{\tilde{a}_{i}\}_{i=1}^{n}$ .

However, there were a number of cases in two dimensions where the choice of $\mu$ and $c$ allowed for closed-form computations. In those cases, because the combination of $c$ and $\mu$ gives us an exact solution, there exists $C:X\times Y\to\mathbb{R}^{\geq 0}$ such that

[TABLE]

As in Section 4.1.1, we write $\hat{C}$ when working with $\hat{\mu}$ .

Now consider $X,\,Y\subset\mathbb{R}^{2}$ , $\mathbf{x}=(x_{1},\,x_{2})\in A$ , and $\mathbf{y}=(y_{1},\,y_{2})\in\{\mathbf{y}_{i}\}_{i=1}^{n}$ . When $\mu(\mathbf{x}^{r})=0$ , the Wasserstein distance on $\mathbf{x}^{r}$ is also zero. For those boxes where $\mu(\mathbf{x}^{r})>0$ , we can take advantage of the uniformity to define the function $\hat{C}$ in terms of a single variable: the component-wise distance between points given by $(\mathrm{\Delta}_{1},\,\mathrm{\Delta}_{2})$ , where $\mathrm{\Delta}_{1}=\left\lvert x_{1}-y_{1}\right\rvert$ , $\mathrm{\Delta}_{2}=\left\lvert x_{2}-y_{2}\right\rvert$ . When the Wasserstein distance over $\mathbf{x}^{r}$ can be computed and is non-zero, it takes the form

[TABLE]

where $\hat{C}:\mathbb{R}^{2}\to\mathbb{R}^{\geq 0}$ is an explicit function.

Table 2 gives Wasserstein distance functions $\hat{C}$ for $c$ the $2$-norm and the $p$-th power of some $p$-norm ($p\in[1,\infty)$). By leveraging the linearity of the integral and subdividing $A$ into disjoint sets, we can build combinations of ground costs and measures with closed-form $C$. We used this to perform tests in $\mathbb{R}^{2}$, with $\mu$ being either uniform or zero in relevant boxes.

