The boundary method for semi-discrete optimal transport partitions and Wasserstein distance computation
Luca Dieci, J.D. Walsh III

TL;DR
The paper introduces the boundary method, a new technique for efficiently solving semi-discrete optimal transport problems across various cost functions, with theoretical backing and practical testing.
Contribution
It presents the boundary method that reduces problem complexity and provides convergence analysis for p-norm cost functions, extending applicability to general costs.
Findings
Effective reduction in problem dimension
Convergence proven for p-norm costs
Successful testing on various cost functions
Abstract
We introduce a new technique, which we call the boundary method, for solving semi-discrete optimal transport problems with a wide range of cost functions. The boundary method reduces the effective dimension of the problem, thus improving complexity. For cost functions equal to a p-norm with p in (1,infinity), we provide mathematical justification, convergence analysis, and algorithmic development. Our testing supports the boundary method with these p-norms, as well as other, more general cost functions.
| , | ||
| , | ||
| , |
| -norm | |
|---|---|
| -th power -norm |
| abs. error | |
|---|---|
| abs. error | |
|---|---|
| abs. error | |
|---|---|
| abs. error | |
|---|---|
| -norm | |||
|---|---|---|---|
| -norm | |||
| squared -norm |
| T (sec) | S (MB) | ||
|---|---|---|---|
| T (sec) | S (MB) | ||
|---|---|---|---|
| T (sec) | S (MB) | T (sec) | S (MB) | |
|---|---|---|---|---|
| 128 | 16.938 | 17.25 | 22.365 | 33.91 |
| 136 | 12.190 | 18.24 | 36.601 | 35.05 |
| 144 | 10.982 | 17.99 | 29.952 | 36.49 |
| 152 | 13.139 | 18.54 | 36.703 | 41.27 |
| 160 | 11.420 | 18.66 | 34.801 | 40.27 |
| 168 | 15.727 | 20.97 | 44.959 | 40.66 |
| 176 | 15.332 | 21.38 | 44.873 | 43.06 |
| 184 | 18.243 | 21.38 | 53.689 | 43.20 |
| 192 | 12.796 | 21.60 | 40.029 | 43.66 |
| Time | ||
| Storage |
| Time | |
|---|---|
| Storage |
| alone | Time | |
| Storage | ||
| alone | Time | |
| Storage | ||
| and | Time | |
| Storage |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
The boundary method for semi-discrete optimal transport partitions and
Wasserstein distance computation111This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1650044. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Luca Dieci
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332 U.S.A.
Tel.: +1 404-894-9209 Fax: +1 404-894-4409
J.D. Walsh III
Naval Surface Warfare Center, Panama City Division (X24), 110 Vernon Ave., Panama City, FL 32407 U.S.A.
Tel.: +1 850-234-4660 Fax: +1 850-235-5374
Abstract
We introduce a new technique, which we call the boundary method, for solving semi-discrete optimal transport problems with a wide range of cost functions. The boundary method reduces the effective dimension of the problem, thus improving complexity. For cost functions equal to a -norm with , we provide mathematical justification, convergence analysis, and algorithmic development. Our testing supports the boundary method with these -norms, as well as other, more general cost functions.
keywords:
Optimal transport , Monge-Kantorovich , semi-discrete , Wasserstein distance , boundary method
MSC:
65K10 , 35J96 , 49M25
label=(), ref=(0)-()-()
1 Introduction
In this work, we consider a new solution method for optimal transport problems. Numerical optimal transport has applications in a wide range of fields, but the scaling properties and ground cost restrictions of current numerical methods make it difficult to find solutions for many applications.
The boundary method we propose focuses on a broad class of optimal transportation problems: semi-discrete optimal transport. Many other techniques assume semi-discrete transport, either implicitly or explicitly, as semi-discrete formulations can be used to approximate solutions to fully continuous problems, and the semi-discrete optimal transport problem is of practical relevance itself.
Key challenges in numerical optimal transport are: (a) the design of numerical methods capable of handling general ground costs, (b) efficient computation of the Wasserstein metric, and (c) solutions of three (or higher) dimensional problems. The boundary method addresses these concerns by solving problems where the ground cost is a -norm, , and by doing so in a way that reduces the effective dimension of the transport problem.
1.1 Description of optimal transport: the
Monge-Kantorovich problem
The theory of optimal transport dates back to the work by Monge in 1781, [1]. In the 1940s, Kantorovich’s papers [2, 3] relaxed Monge’s requirement that no mass be split, creating we now know as the Monge-Kantorovich problem.
Definition 1.1** (Monge-Kantorovich problem)**
Let , let and be probability densities defined on and , and let be a measurable ground cost function. Define the set of transport plans
[TABLE]
where is the set of probability measures on the product space, and define the primal cost function as
[TABLE]
The Monge-Kantorovich problem is to find the optimal primal cost
[TABLE]
and an associated optimal transport plan
[TABLE]
Kantorovich also identified the problem’s dual formulation.
Definition 1.2** (Dual formulation)**
Define the set of functions
[TABLE]
Let the dual cost function, , be defined as
[TABLE]
Then, the optimal dual cost is
[TABLE]
and an optimal dual pair is given by
[TABLE]
When the ground cost is a distance function (often but not necessarily Euclidean), Monge-Kantorovich solutions are related to the Wasserstein metric, a distance between probability distributions:
[TABLE]
We have , and hence, we may refer to any of these as the Wasserstein distance, the optimal transport cost, or simply the optimal cost.222See also [4, p. 207], a definition of the Wasserstein metric with .
Remark 1
* is often written as , with and implied. Furthermore, as Equation (1.9) makes clear, also depends on the ground cost function . In the literature, the Wasserstein distance formula often assumes the ground cost to be a specific predetermined function, usually the Euclidean distance .*
Definition 1.3** (Monge problem)**
In certain cases, there exists at least one solution to the semi-discrete Monge-Kantorovich problem that does not split transported masses. In other words, there exists some such that
[TABLE]
where is a measurable map called optimal transport map.333One can also write as . Our notation is from [4, p. 3]. The alternative notation is used in [5]. When such a exists, we say the solution also solves the Monge problem.
If the Monge-Kantorovich problem has a solution which solves the Monge problem, we can assume without loss of generality that every satisfies
[TABLE]
for some measurable transport map , and that the primal cost can be written
[TABLE]
1.2 Semi-discrete problem
The semi-discrete optimal transport problem we consider is the Monge-Kantorovich problem of Definition 1.1, with restrictions on and , and .
(1)
Assume that satisfies the following:
(a)
is absolutely continuous with respect to the Lebesgue measure.
(b)
The support of is contained in the convex compact region .
(Since , it must also be the case that is simply connected.)
(2)
Assume has exactly non-zero values, located at .
(3)
Assume is a -norm with .
As we will show, each of these conditions is required for one or more of the theorems given in Section 3. Condition (1)(1)(a) ensures that the value of is bounded, which is required to show Wasserstein distance convergence in Theorem 3.25. Conditions (1)(1)(a), (1)(1)(b), (2), and (3) are all used to satisfy the conditions of Corollary 4444See Theorem 3.6, below, for a full statement of this result. of [6], which we apply to show the -a.e. uniqueness of the solution in Theorem 3.7.
1.2.1 Semi-discrete transport
and the Monge problem
Since is absolutely continuous, implies for all Borel sets in . Hence, is nonatomic. Because is continuous and is nonatomic, at least one solution to the semi-discrete Monge-Kantorovich problem also satisfies the Monge problem, described in Definition 1.3; see Theorem B in [7]. Thus, by applying Equation (1.11), we can assume without loss of generality that any transport plan partitions into sets , where is the set of points in that are transported by the map to . Using this partitioning scheme in combination with Equation (1.12) allows us to rewrite the primal cost function for the semi-discrete problem as
[TABLE]
1.3 Shift characterization for semi-discrete
optimal transport
Using this idea of sets , we are ready to describe the shift characterization of the semi-discrete optimal transport problem. The definition of the characterization, which follows, is based on one given by Rüschendorf and Uckelmann in [8, 9].
Definition 1.4** (Shift characterization)**
Let be a set of finite values, referred to as shifts. Define
[TABLE]
For , where , let
[TABLE]
Note that . The problem of determining an optimal transport plan is equivalent to determining shifts such that for all , the total mass transported from to equals .
The shift characterization is derived from the dual cost function given in Equation (1.6). For any , suppose we define
[TABLE]
Then for all .
For the semidiscrete problem, is exactly Equation (1.14), and the shifts correspond to the value of at each Dirac mass . Hence, the discrete problem is no more than a special case of the general continuous problem where is a continuous density function and an empirical measure. For a detailed derivation, see [5].
In the same way, the sets correspond to the subdifferentials . For a general cost function , the sets are referred to in analysis as Laguerre cells, and the map generated by the sets over is called a Laguerre diagram. As we will discuss further on, the boundaries between Laguerre cells are typically sections of hypersurfaces. When , the boundaries are sections of hyperplanes, and the map as called a power diagram. See [10] for a detailed evaluation of this special case. There are also cost functions where, for certain arrangements of , the boundaries between Laguerre cells have positive Lebesgue measure in . An example is shown in Figure 4LABEL:sub@f:badManhattan.
1.4 Numerical approaches to the MK
problem
Applications of optimal transport are found in many areas of research, including medicine, economics, image processing, machine learning, physics, and many others; e.g., see [11, 12, 13, 14, 15]. For that reason, many people have focused their research on numerical methods for the Monge-Kantorovich problem.
The solution to a semi-discrete problem can be approximated by treating the problem as fully discrete, and the solution to a fully continuous problem can be approximated by treating it as either semi- or fully discrete. By “treating,” we refer primarily to assumptions about continuity: in practice, nearly every approach fully discretizes the problem, and the complexity of such approaches is relative to the measure of the discretization.
The semi-discrete problem has received significant attention in its role as a discretization of the continuous problem (where continuity assumptions are employed over but not ). Substantial effort has been taken to quantify the extent to which solutions to such semi-discrete problems approximate the solution to the original continuous problem; for example, see [16]. However, the semi-discrete problem has interesting applications in its own right. Recent developments include works in economics [17, 18, 19], image processing [20], and optics [21, 22]. In addition, the power and flexibility of Laguerre cell tesselation (vs. Voronoi) drive ongoing research in physics and other fields.
When the ground cost for the semi-discrete problem is the squared -norm, , significant numerical progress has been achieved. In 1988, Oliker and Prussner introduced what came to be called the Oliker-Prussner algorithm for nonlinear Monge-Ampère-type equations in ; see [23]. Oliker and Prussner were significantly ahead of their time. A 1992 paper by Aurenhammer et al., [24], while describing a different algorithm (Newton’s method), explicitly connected the Oliker and Prussner’s approach to semi-discrete transport and its resulting “Voronoi-type diagrams.” In 1998 Aurenhammer et al. published [25], a revision that clarified important details, and incorporated an argument from [6] to guarantee that the sets partition -a.e. More recent algorithms appear in [26, 16].
When sets and share a boundary, for some , there is a monotone relationship between the volume of and the difference of shifts, . The Oliker-Prussner approach and the boundary method both exploit this relationship, though in very different ways. Whether applying the Oliker-Prussner algorithm or some variation such as Newton’s method, the Oliker-Prussner approach begins with approximated sets , and directly perturbs the approximated shift difference in order to bring closer to . This approach is extended over all the shift differences,555They refer to a set of shift differences as a weight vector. making it, in essence, a method for solving the Monge-Kantorovich dual problem with . Because the squared -norm is strictly convex, and it ensures that the boundary for each adjacent and is a hyperplane, algorithms based on the Oliker-Prussner approach are generally able to quantify convergence behavior and guarantee termination after a finite number of refinement steps.
Numerous efforts have been made to extend the approach proposed by Oliker and Prussner. An application-focused paper by Caffarelli et al. extends the Oliker-Prussner algorithm to , assuming special geometries [27]. Lévy presents a parallelized Newton’s method for three dimensions, one which scales well when consists of large numbers of Dirac masses [28]. Other works, such as [29], attempt to integrate the Oliker-Prussner approach with the Wide Stencil methods developed for continuous Monge-Ampère problems; see, e.g., [30, 31]. All of these assume .
A few authors have attempted to develop approaches for ground costs other than the squared -norm. Most of these do not employ Oliker-Prussner. In [9], Rüschendorf and Uckelmann report on numerical experiments with ground costs given by the Euclidean distance taken to the powers , , , and . They assume that is the uniform distribution, and test various weights and placements for the set . When an exact solution cannot be directly determined, they fully discretize the problem and use a linear programming solver.
In [32], Schmitzer works with cost functions for , and applies a form of adaptive scaling done by “shielding” regions: his method attempts to determine points of influence in order to solve primarily local problems. He restricts his examples to .
Solving the semi-discrete problem for the -norm is discussed in [33].666In [33], the partition of is called an “optimal coupling.” Starting with an alternative form of Equation (1.17), taken from [34], Barrett and Prigozhin develop a mixed formulation of the Monge-Kantorovich problem, which they solve using a standard finite element discretization.
Kitagawa’s 2014 paper, [35], offers a potentially broad generalization of the Oliker-Prussner algorithm, which works for ground costs other than , provided those ground costs satisfy strict conditions, including Strong Ma-Trudinger-Wang; see also [36]. His proposals, while densely theoretical, do not include numerics or an explicit iterative scheme.
As [26] states, the special case has two methods specifically designed for solving semi-discrete problems directly: the Oliker-Prussner algorithm and the damped Newton methods proposed in papers like [25]. Both rely on some variant of what we call the Oliker-Prussner approach, described above. However, approaches developed for fully discrete or continuous transport can also be applied to the semi-discrete problems, though with varying degrees of effectiveness. Rüschendorf and Uckelmann apply a discrete linear program solver in [9], and the solver Barrett and Prigozhin use in [33] was developed for continuous transport.
Discrete methods assume a fully discrete and , and solve the resulting minimization problem using network flow minimization techniques. As described in [37], there are over 20 established methods for solving such problems, and at least seven software packages capable of handling one or more of these methods.
Most approaches to the fully continuous Monge-Kantorovich problem assume specific ground costs and solve using techniques developed for elliptic partial differential equations, particularly those of the Monge-Ampère-type:
[TABLE]
If the ground cost function is strictly convex, or otherwise satisfies the Ma-Trudinger-Wang regularity conditions described in [36], such problems are well-posed. To date, the requirements of well-posedness have largely restricted the application of such continuous methods to well-behaved cost functions such as or a regularized Euclidean distance. Continuous methods currently in use apply finite difference, gradient descent, or the iterative Bregman projections (a.k.a. Sinkhorn-Knopp) algorithm, all attempting to map to a fully discretized [38, 39, 40].
As we will show, the boundary method offers a new approach to solving semi-discrete transport, distinct from all of those described above. By and large, the solution methods described above only work for a specific fixed cost, usually . The boundary method quickly solves problems with more general ground costs. When the ground cost is a -norm, with , the boundary method provides a global rate of convergence that is proportional to the volume of .
2 Boundary Method
At a high level, the idea behind the boundary method is simple: track only the boundaries between regions, without resolving the regions’ interiors. To do this in practice and obtain an efficient technique, we must account for the interplay between discretization, a mechanism for discarding interior regions, and a fast solver.
At its heart, the boundary method can be viewed as an adaptive refinement technique, one which focuses on the shared region boundaries. The method discards interior regions, but a well-chosen initial discretization prevents any corresponding loss of accuracy. The boundary method’s strategy progressively refines the boundaries between individual regions . Thus, by the method’s very nature, any initial configuration must enclose the boundary in a way that allows it to be distinguished from the region interiors. The necessary conditions for a well-chosen initial discretization are presented in Theorem 3.21 and discussed in detail in Remark 5.
2.1 Boundary identity and system of
equations
For all such that , let
[TABLE]
The boundary set is defined as
[TABLE]
and for each , let the strict interior of be defined as
[TABLE]
For all such that , define as
[TABLE]
By Corollary 3.11 below, and for each there exist , , such that . Because , we have , and because , we have . Combining and rearranging these, we get
[TABLE]
Thus, Equation (2.5) implies that is a subset of a level set of ; the value is constant, regardless of which is chosen. Using this information, for each , , such that , we can define the constant shift difference
[TABLE]
Given a sufficiently large set of linearly independent equations of the form given in Equation (2.6), one could determine most or all of the shifts . As we show in Theorem 3.13, it is possible to obtain exactly linearly independent equations of the desired form, but a set of such independent equations does not exist.
Since we know that the set of shifts allows exactly one degree of freedom, the boundary method’s approach is to obtain well-chosen values, fix one , and use linearly independent equations of the form given in Equation (2.5) to solve for the remaining shifts. The crucial observation is that for the ’s, there is no need to retain information about interior of the regions.
The Wasserstein distance can also be computed without saving region interiors. Once we have determined that for some region , the (partial) Wasserstein distance corresponding to is equal to
[TABLE]
and the total Wasserstein distance is equal to the sum of all such partial distances , computed over every .
Recognizing these facts, inherent in the shift characterization, inspired both the boundary method’s name and its guiding principles, summarized below:
Do not solve for the entire transport plan;
rather, identify region boundaries.
To illustrate how this principle is implemented, we present the following example.
Example 2.1
Let . Assume is the uniform probability density, so for all Borel sets , , and that has uniform discrete probability density, so for . Take , with the five points where has nonzero density distributed as shown in Figure 1.
Let be the squared Euclidean norm, . Suppose a discretization with width is sufficient to provide the desired accuracy and that we apply the boundary method with initial width .
Assume is the partial transport cost: the sum cost of transport over all regions so far, where is defined as in Equation (2.7). Each iteration consists of two steps. In Step (1), we discretize the remaining parts of using the given width, and we solve the discrete transport problem. In Step (2), we compute the transport cost of all boxes in the interior of each region, add those costs to , and discard the computed boxes. For the discard, remove the transported mass from , and remove the transported boxes from (so those regions can be safely ignored during any future discretized transport computations).
Figure 1 shows the state of the boundary method during the first iteration. In Figure 1LABEL:sub@f:clrA11, we have just completed Step (1): the discrete transport map has been computed, but we have not identified interior points or added anything to the partial transport cost . Figure 1LABEL:sub@f:clrA12 shows the state of the algorithm after Step (2): the interior regions have been identified (shown in gray), the partial transport cost has been computed for those regions, giving us , and those regions have been discarded.
Figure 2 shows the state of the boundary method algorithm during the second iteration. Here, the regions eliminated in Iteration 1 are shown in a darker gray, to distinguish new interiors from those previously removed. In Figure 2LABEL:sub@f:clrA21, Step (1) has just been completed. As can be seen by comparing Figure 1LABEL:sub@f:clrA12 to Figure 2LABEL:sub@f:clrA21, the boundary and interior regions are the same ones that we had at the end of the first iteration, but refining the boundary set to width allows us to compute a more refined transport map. Since the regions in gray were discarded at the end of Iteration 1 Step (2), they are not part of the discrete transport solution computed during Iteration 2. Because Step (1) does not add to the identified interior regions, the partial Wasserstein distance is also unchanged from Figure 1LABEL:sub@f:clrA12.
After Step (2) of the second iteration, shown in Figure 2LABEL:sub@f:clrA22, more of the interiors have been identified. The partial transport cost shows a corresponding increase: we now have . Because we have achieved our desired refinement, a width of , we end the iterative process.
We have not computed any transport cost for the white areas remaining in Figure 2LABEL:sub@f:clrA22. Hence, is strictly less than the actual transport cost . We may want to perform further computations on those white areas in order to approximate the remaining transport cost and calculate an error bound for our approximation.
2.2 The boundary method
We will now formalize the process described in Example 2.1. As described below, the boundary method generates a grid over the unevaluated region of , and uses it to determine the subgrid containing the boundary set . This subgrid is determined by finding an optimal transport solution from the grid to the point set .
Although not strictly necessary, we will restrict ourselves to and apply a Cartesian grid over that region. At the -th refinement level of the algorithm, the grid will thus consist of a collection of boxes with width in each dimension of our discretization. By a slight abuse of notation, we use to refer to such a box, centered at the point . Thus, refers to the -measure of the box of width centered at .
Neighboring boxes are those with center points that differ by no more than one unit in any discretization index. The set of neighbors of is denoted (defined in Equation (3.11), below). Because regions of -measure zero need not be transported to any particular , boxes of positive weight that are adjacent to such regions are always retained. We refer to such a box as an edge box. Thus, the set of edge boxes is
[TABLE]
Because contains the support of , every box of positive mass that is adjacent to the boundary of is an edge box.
A box whose neighbors and itself all have positive measure is referred to as an internal box. The set of internal boxes is
[TABLE]
Boxes of -measure zero are not part of or and they are discarded when the optimal transport problem is solved. We need not be concerned about losing a region due to this discard process, since this would imply (and hence , which contradicts the conditions in Section 1.2).
Region interiors are identified by comparing the destination of each to the destinations of its neighbors. Edge boxes are never considered part of a region interior, so they are passed directly to .
In order to remove identified region interiors, we also maintain a running total of the untransported mass, given by partial measure . To preserve the balance of the transport problem, each time a region is transported from to , the remaining amount that can be transported to , , must be reduced by .
We can approximate the Wasserstein distance by generating a running total over region interiors: . This is an increasing function of , and for all , . The Wasserstein distance over any remaining boundary region is evaluated at completion.
Remark 2
Further approximations may be required for a truly general algorithm. Depending on , it may be necessary to approximate the mass of each box, . Depending on and , the Wasserstein distance over each box, given by , may also require approximation. However, in this work we assume that the integrals can be computed exactly. In practice, this is not a significant limitation. Most numerical applications focus on the exactly-computable cases where is uniform and is the Euclidean or squared-Euclidean distance. Furthermore, as we show in Section 4.1, the set of exactly-computable options is quite large.
2.2.1 Step (1): solving the discrete optimal transport
problem
The proofs in Section 3 assume the discrete solver is exact, but in practice we achieve good results using solvers whose error satisfies reasonable bounds. Thus, the ideal discrete algorithm should be fast, have controlled error, and possess reasonable scaling properties. To satisfy these requirements, and to bypass the shortcomings of standard discrete approaches, we have turned to the distributed relaxation methods known as auction algorithms; see [41] and [42]. (As it turns out, there are natural connections between auction algorithms and the Oliker-Prussner algorithm for semi-discrete transport; see [43] for details).
We chose to apply a new auction algorithm, the general auction, which we developed and presented in [44]. The general auction is so named because it is based directly on the (more general) real-valued transport problem, rather than the integer-valued assignment problem which forms the foundation of other auction algorithms. As described in [44], it offers significant performance advantages over other auction algorithms. Public domain C**++** software implementing the general auction can be found on the Internet at [45].
2.2.2 Step (4): computing the shifts
Once we have reached a desired level of refinement for the boundary, we can use the set to identify shift differences . Finding the shift differences is not necessary once we have the boundary (which is why Step (4) is optional), but the shift differences allow one to reconstruct the entire transport map.
By completing Step (4), one can reduce the transport map in to a set of real numbers , greatly reducing storage requirements. Also, building the reconstructed transport map, and comparing the value of each to its corresponding , effectively evaluates the actual (vs. worst case) error associated with the boundary method’s solution.
It is also worth considering that the exact shifts correspond to a transport map giving the exact optimal solution of our semi-discrete problem. The approximated shifts , unless generating the same shift differences, correspond to a transport map giving the exact optimal solution to a different semi-discrete problem, one whose measure at each , , corresponds to the value of . Hence, is the error in measure when approximating by .
2.2.3 Step (5): approximating the Wasserstein
distance
Because some applications focus on determining the transport map, rather than the Wasserstein distance, Step (5) is optional. One could also skip the computation of in Step (2), since the Wasserstein distance can be computed in full using only the transport map defined by the boundary set. However, we find it convenient to compute as much of the distance as possible within the boundary method algorithm, establishing one box at a time during Step (2). By the time we reach Step (5), the partial Wasserstein distance includes the exact cost of all the identified interior regions, and all that remains is to determine the cost of the regions associated with .
3 Mathematical support
In this section, we provide mathematical support for the boundary method, assuming that all computations are solved exactly: both the discrete optimal transport problems handled by the general auction and the determinations of mass and Wasserstein distance for individual boxes (see Remark 2). We present three types of results: on the shift characterization, on our system of equations, and, finally, on the boundary method itself.
3.1 Semi-discrete optimal
transport and the shift characterization
Here we examine the features of the shift characterization, defined in Section 1.3, and consider what they can tell us about the semi-discrete optimal transport problem itself. While many of these results can be found in other works (e.g., [5]), detailing them fixes notation and sets the stage for the original theorems developed in the following sections.
First, in Lemmas 3.1 and 3.2, we develop theoretical support for the boundary method.
Lemma 3.1
Let and be defined as in Definition 1.4. Fix . If and , , then the following hold:
[TABLE]
where is defined in Equation (2.5) and in Equation (2.1).
Proof 1
Let us show Equation (3.1). By the definitions of and ,
[TABLE]
Rearranging terms gives
[TABLE]
To show Equation (3.2), first note that Section 2.1 already explains how implies . Consider the converse: Assume that . Rewriting, we find that , with defined in Equation (1.14). This implies , and since , therefore . Equation (3.3) is a consequence of Equations (3.1) and (3.2).
Lemma 3.2
Let and be defined as in Definition 1.4 and as in Equation (2.1). Assume satisfies the triangle inequality. For all , ,
- (a)
If , then .
- (b)
If , then .
Proof 2
For Part (a), because satisfies the triangle inequality, for all ,
[TABLE]
Suppose . Then , by Equation (1.14). Because is defined as the maximum such difference, this implies , and so . Further, since is an element of and , . Therefore, .
To show (b), note that (3.4) now gives . Hence, for all , . Therefore, .
Lemma 3.3
Let be defined by Equation (1.14). If the ground cost function is continuous on , then is a continuous function of .
Proof 3
Assume is defined as a continuous function in . Thus, for all , is a continuous function of . Since is the maximum of a finite set of continuous functions, is itself a continuous function of .
Definition 3.4** ( induces a -partition of )**
Let be as defined in Equation (1.14), and the sets as defined in Equation (1.15) for . Then one says induces a -partition of the set if
, 2. 2.
for all , , (for as defined in Equation (2.1)), 3. 3.
, and 4. 4.
for all , .
Lemma 3.5
Suppose one has a semi-discrete transport problem, as described in Section 1.2. Let be as defined in Equation (1.14), the sets as defined in Equation (1.15) for , and as defined in Equation (2.2). Then induces a -partition of if and only if .
Proof 4
If induces a -partition of , by Definition 3.4, . For the converse, assume and the sets are defined as given, and let be defined by Equation (2.1). Because is a probability density function, . Because is a non-negative measure, implies that, for all , , .
For any -measurable set , ,
[TABLE]
and since ,
[TABLE]
Proceeding inductively, it follows that
[TABLE]
Thus,
[TABLE]
For all , , , and therefore .
Remark 3
Instances of appear quite often, though (as we will show) for the -norm cost functions we have assumed. For an example of in the literature, see Figure 37 of [46]. We include a nearly identical example as Figure 4LABEL:sub@f:badManhattan of our paper, along with a discussion of this behavior.
Given our definition of the semi-discrete problem in Section 1.2, Corollary 4 of [6] provides a sufficient condition for the existence of a Monge solution that is unique -a.e. For convenience, we restate their conclusion here, as the following:
Theorem 3.6
Given the definition of in Equation (2.5), suppose that the support of is finite, is continuous, is tight, and
[TABLE]
Then there exists an optimal transport map, , that solves the Monge problem, and is -a.e. unique.
This condition leads directly to the following theorem.
Theorem 3.7
A semi-discrete transport problem, as described in Section 1.2, has an associated transport map , a function , as described in Equation (1.14), and sets , as described in Equation (1.15), such that for all ,
[TABLE]
where is the strict interior of , as defined in Equation (2.3). In other words, induces a -partition of and agrees with on . Furthermore, is unique -a.e.
Proof 5
Let be defined as given in Equation (2.1), as given in Equation (2.2), and as in Equation (2.5). Consider the requirements given in Section 1.2. Condition (2) ensures that is finite, and Condition (3) implies is continuous. We know that , so is a Polish space, and Condition (1)(1)(b) assures us that is compact. Because every probability measure on a compact Polish space is tight777see e.g. Theorem 3.2 of [47, p. 29], must be tight. Because Condition (3) requires that the ground cost is equal to a -norm with ,
[TABLE]
By Condition (1)(1)(a), is absolutely continuous, and so
[TABLE]
as required by Equation (3.5) (see [48] for another argument). Therefore, the conditions of Theorem 3.6 are satisfied.
Let the function and sets be as described in Definition 1.4. For any , , for some fixed . Hence, it follows that , and thus, by Lemma 3.5, induces a -partition of . Therefore, we can construct a transport plan that satisfies the semi-discrete problem and agrees with on . Furthermore, by Theorem 3.6, is unique -a.e.
Remark 4
A close reading of the text of Theorem 3.7 reveals that the guarantee of derives directly from the fact that ; see Equation (3.7). Absolute convergence does the rest. In practice, this means that the boundary method forces a unique transport map on all of , even regions where vanishes and any other map would achieve the same Wasserstein measure. For an example of this, see Figure 6. This behavior stems from a natural (unstated) corollary to Theorem 3.7: the boundaries identified by our method are a.e. unique with respect to the Lebesgue measure. The convexity of is required to guarantee the existence of the requisite boundaries. Otherwise, the network of regions might not form a connected graph.
3.2 Existence of linearly independent boundary
equations
To prove the existence of linearly independent equations of the form shown in Equation (2.6), we will investigate the structure of the boundary set using a connected graph.888For a different approach, where the cost is the squared-Euclidean distance, see [10].
Definition 3.8
Assume the definition of given in Equation (2.1). Let be a graph with vertices . The edge is contained in the edge set of if and only if is non-empty. We refer to as the adjacency graph of our transport problem.
Lemma 3.9
Let be defined as given in Definition 3.8. If the set is convex and compact, then is a connected graph.
Proof 6
Assume to the contrary that is not a connected graph. Then we can write as the union of two disjoint nonempty subgraphs, , such that no vertex in has a path connecting it to any vertex in .
Construct
[TABLE]
where each subset is defined as in Equation (1.15). Since and , and . Because and are disjoint, and no paths connect them, it follows that . Since the union of and is , .
Suppose , . Then . Because is a compact set, is a closed and bounded, and hence the definition given in Equation (1.15) implies that and must each also be closed and bounded. Thus, and are disjoint compact sets in the Hausdorff space . This implies and are separated by some positive distance . Because this is true for all and , there exists , the minimum over all such .
Let , , and for all , define
[TABLE]
Because , there exists , , such that implies . This contradicts the convexity of . Hence, is connected.
Corollary 3.10
Assume and let be defined by Equation (2.1). If , there exists , such that and .
Proof 7
Assume the contrary for some , and apply Definition 3.8. Since , includes at least two vertices, and is disconnected from the rest of , which contradicts Lemma 3.9.
Corollary 3.11
Let be defined by Equation (2.1) and by Equation (2.2). If , then the boundary set is nonempty, and for each , there exist such that and .
Proof 8
This follows from Corollary 3.10 and the definition of in Equation (2.2).
Lemma 3.12
Assume a shift characterization, as described in Definition 1.4, where and the shifts are unknown. Let be the adjacency graph of the transport problem given in Definition 3.8, and let be a subgraph of that includes all vertices. Define the system of equations
[TABLE]
where each is given by some constant. The system of equations is linearly independent with respect to the shifts if and only if contains no cycles.
Proof 9
* Suppose contains the cycle . Then contains the linear system*
[TABLE]
Because , we know is linearly dependent.
* Suppose instead that is linearly dependent. Given the form of the equations in , we can assume without loss of generality that contains the equations , , and that is also in . By the definition of , these equations imply that the edges , and are contained in . Together, these edges generate the cycle , so contains at least one cycle.*
Theorem 3.13
Assume a shift characterized problem, as described in Definition 1.4, where and the shifts are unknown. Then there exists at least one system of exactly equations of the form that is linearly independent with respect to the set of shifts , with each constant. No system of independent equations exists.
Proof 10
Let be as given in Definition 3.8. Because is a connected graph, we can always create a spanning tree that is a subgraph of . Let be the corresponding set of linear equations, defined as described in (3.8). As a spanning tree, contains edges and has no cycles, so by Lemma 3.12, we know contains exactly linearly independent equations.
Suppose a set of linearly independent equations exists, all of the form . Because there are unknowns in the set of shifts, there is exactly one solution set . Fix and for all , define . For each equation in , . Thus, is also a solution to . This contradicts the uniqueness of , and therefore no such set of linearly independent equations exists.
3.3 Discretization for the boundary
method
In the first two subsections below, we give some results on how the grid-points interact with the underlying space. In sections 3.3.3 and 3.3.4 we present error bounds. In section 3.3.5 we consider issues of volume and containment: here we ensure that one can have for all , and show that as . Finally, Section 3.3.6 puts bounds on the error for the Wasserstein distance approximation.
3.3.1 Discretization definitions
As described in Section 2.2, we discretize the region using a regular Cartesian grid, and refine the grid over multiple iterations, with the aim of refining only the grid region containing the boundary set.
Definition 3.14
Let be the set of adjacency vectors for all discretizations of . We choose to be the linear combinations of the standard unit vectors, , with coefficients . We specifically exclude the zero vector from the set, so . If , equals
[TABLE]
Let be the current discretization level, and be the width of the discretization at level . Let be the -th point set, the set of points included in the -th discretization of . Since we discard boxes of -measure zero during the transport step, assume without loss of generality that for all .
For each iteration , let
[TABLE]
for all . For all , the points in that are adjacent to constitute a subset of the neighbors of ,
[TABLE]
Lemma 3.15
Let be the set of points included in the -th discretization of , and assume the definition of given in Equation (3.11). For all , if , then .
Proof 11
This follows from Equation (3.11) and the adjacency vectors established in Definition 3.14: for all , .
We now formalize our idea of the -th interior and boundary point sets used in our discretization. For all , define the -th iteration interior point set associated with as
[TABLE]
Define the -th boundary point set as
[TABLE]
and let
[TABLE]
for all . The -th evaluation region, the subset of enclosed by the discretization , is defined as
[TABLE]
and the -th boundary region, the subset of enclosed by the boundary point set , is given by
[TABLE]
3.3.2 Distance bounds
Though the discretization is fully defined, it still needs to be related back to the sets and the boundary set . To do this, we first bound the distance separating and .
Lemma 3.16
Let be the set of points included in the -th discretization of , the width at that discretization, and the adjacency vector set satisfying Definition 3.14. Assume is defined by Equation (2.1), by Equation (2.8), and by Equation (3.14). Suppose is convex, is a -norm on , and . For each , either or there exists a point , with , such that for some . Thus, if , the distance from to the set , as measured with respect to the ground cost , is bounded above by .
Proof 12
Recall the definition of in Equation (3.10). Assume . By the definition of given in Equation (3.13), there exists for some , where is the set of neighbors of as defined in Equation (3.11). By Lemma 3.15, , and since , we have . Thus, and , and because is convex, this implies
[TABLE]
Because is continuous on , Lemma 3.3 applies. Hence, is continuous on . Therefore, because and , there exists such that . Then , so by applying the ground cost we have
[TABLE]
Therefore, .
Because we can bound the ground cost between the points in and the set in terms of the ground cost between neighboring points, it is worth identifying a bound on that ground cost between neighbors.
Lemma 3.17
Suppose , and assume a shift characterized problem in . Let be defined as given by Equation (3.11) and as given by Equation (3.14). For the -th iteration of the boundary method, given width , there exists a maximum such that, for all and , where and , if , then .
Proof 13
Let and be defined as above. By applying the definition given in Equation (3.13), for some . For our Cartesian grid , achieves its maximum when , so
[TABLE]
Therefore, there exists maximum such that, for all and , .
3.3.3 Error bounds for shift
differences
In order to bound the error on the Wasserstein distance, we merely require a finite bound on the errors for the individual shift differences, . However, accurately computing the shift differences themselves is also important, and for that reason, we also present theorems that more finely bound the error on for important ground cost functions. Because estimates are generated using one or more computations of , the magnitude of these errors is dependent on the point(s) chosen.
Lemma 3.18
Let be defined by Equation (2.1), and by Equation (2.6). Suppose the ground cost satisfies the triangle inequality. Let and such that . The error resulting from approximating at is bounded above by , where
[TABLE]
and is the point in nearest to with respect to the ground cost.
Proof 14
Assume is the closest point in to . Then
[TABLE]
For every , there exists some such that
[TABLE]
By rearrangement and substitution, we have
[TABLE]
Since satisfies the triangle inequality,
[TABLE]
Thus,
[TABLE]
and, by a similar line of reasoning, . Therefore,
[TABLE]
In addition to bounding the error for individual points , we can also establish meaningful global bounds.
Lemma 3.19
Assume a shift characterized problem in , where is a -norm, . Let be the width of the discretization during iteration , and let indicate the box of width centered at the point . Let be defined by Equation (2.2), by Equation (3.11), and by Equation (3.14). Taking as defined by Equation (3.17) and as given by Equation (3.16), let be the maximum value of over all and , such that: (1) , (2) for some , and (3) . Then and for all , .
Proof 15
Suppose . By the definition of our grid, is contained in some , where is a finite set of neighboring grid points. For each , , and hence for some , the adjacency vectors described in Definition 3.14. Since , . Because and were arbitrarily chosen, this is true of every pair of vertices of . By the definition of , can be written as a convex combination of the points in . Therefore, for any fixed , .
Recall the definition of given in Equation (3.13). Because , must be nonempty. Assume without loss of generality that for some . Because satisfies the triangle inequality, Lemma 3.18 applies. Hence, there must exist a point , a neighbor of , with , and a point such that . Applying the triangle inequality, we find that
[TABLE]
Therefore, and .
3.3.4 Error bound for ground costs
In preparation for bounding the Wasserstein distance error, we now bound the error on the ground cost with respect to individual points in .
Lemma 3.20
Given a shift characterized transport problem in , with ground cost , . Assume is the width of the discretization at the -th iteration and let be an approximated transport plan with associated transport map , obtained using the boundary method with discretization . Suppose is an optimal transport plan with associated map , and let in such that , but . Then the error in the ground cost at the point is equal to , where is defined as given in Equation (2.5). Furthermore, there exists such that, for all such with and for some ,
[TABLE]
Proof 16
Let such that , but . Then the error in the ground cost at equals
[TABLE]
As a consequence of Lemma 3.19:
[TABLE]
The result is independent of , , and , and therefore there must exist some .
3.3.5 Volume and containment for the boundary
region
As shown in Section 3.3.4, the ground cost error for individual points is finitely bounded over a wide range of admissible ground cost functions. By definition, the measure is bounded. We propose to identify the largest possible region in which the ground cost error can be non-zero, and to show that the area of that region goes to zero as goes to infinity. With this, we will show that the boundary method converges with respect to the Wasserstein distance.
In Equation (3.16), we defined a region based on the point set . For this, we need to know that we can choose an initial width such that, for all iterations , . Theorem 3.21 guarantees that such a width exists, and gives a sense of the relevant features driving the choice of . For details about the numerical considerations involved, see Remark 5.
Theorem 3.21
Assume is a -norm, , and . There exists an initial width such that, for all such that , , as defined by Equation (3.12), implies the box of width centered at , given by , satisfies , where is the strict interior of , as defined by Equation (2.3).
Proof 17
Recall the definition of given by Equation (1.15), given by Equation (2.1), given by Equation (2.2), and given by Equation (2.5). Let indicate the open ball of radius (with respect to the -norm ) centered at and indicate the -dimensional cube with side length (with respect to the Euclidean distance) centered at . Because is a -norm, for each , , and therefore there exists such that . Thus, there exist and , such that, for any , implies for all .
Let
[TABLE]
Because is a closed set minus a finite number of open sets, is closed.
If all are hyperplanes on , the claim is self-evident for all , so assume instead that at least one is not a hyperplane on . Let
[TABLE]
There exists a maximum directional magnitude with respect to the Euclidean distance,
[TABLE]
where is the magnitude of the vector projected parallel to the direction of . Because , and are well-defined. For any and any unit direction vector ,
[TABLE]
and
[TABLE]
Hence, and .
Assume the Gaussian curvature of the set at a point is given by the function , and when the radius of curvature is given by . Because is defined as a product of first and second directional derivatives of , and those derivatives are bounded, there exists a maximum absolute Gaussian curvature for on , given by
[TABLE]
Because at least one is not a hyperplane, . Because , for any , and any , the radius of curvature is bounded below: .
Let . Suppose , for some , and that for some , .
The set is the cube surrounding and its neighbors. Because , cannot be a hyperplane in , and so is well-defined on . If there exist and such that , then , and since , this implies and . This implies , which contradicts the claim that . Therefore, for all and . This implies . Hence, the intersection of the boundary with the cube must have a point with minimum radius of curvature,
[TABLE]
and since , it must be the case that .
Because , but , there must exist . Hence, within the cube , there must be a -dimensional sphere (or partial sphere) of radius , not in , whose boundary intersects (“partial” because the sphere may be cut off by one or more of the planes bounding the cube). Call this (partial) sphere .
Since , it must be the case that , where is the set of neighbors defined by Equation (3.11). Because , and the maximum distance between grid points in is , this requires . Hence, there exists such that
[TABLE]
This contradicts . Thus, it must be the case that for all , implies , and therefore .
Setting completes the proof.
Remark 5
By the boundary method’s very nature, any initial configuration must enclose the boundary in a way that allows it to be distinguished from the region interiors. This is the meaning behind the width considered in Theorem 3.21. In principle, may need to be quite small. In practice, the potential problems associated with an overly-large rarely occur, and they are obvious when they do. We did occasionally observe an issue when the initial was so large that a region could contain an entire (in other words, when was significantly larger than the described in Theorem 3.21). In those cases, the affected region’s was such that for some , and the resulting transport plan had . Hence, the set and reconstructed regions directly revealed when such an error had occurred.
Also, because of the nature of the iterative method, a poor choice of quickly becomes obvious in the boundary region itself. Simply put, the loss of any portion of the boundary set destabilizes the method. Losing part of creates a visible gap in the “wall” between two regions, and the gap increases in size with each successive iteration. This behavior seems to occur whenever some part of is lost, no matter what the cause. For example, in our tests we observed that discarding an edge box that intersects results in the same progressive damage to the boundary set. Not surprisingly, this also “stalls” the convergence of the Wasserstein distance in ways that are obvious during computation.
In our numerical tests, we used and obtained consistently reliable results.
Next, we show that a well-chosen initial width and grid arrangement can guarantee that, for every iteration , each point in corresponds to a box in the interior of some region .
Theorem 3.22
Assume is a -norm, , and . Suppose the first iteration width is chosen as described in Theorem 3.21. Fix , let , and let be the boundary set remaining at the -th iteration. Given the definition of from Equation (2.2), from Equation (3.15), from Equation (3.16), if , then , and hence .
Proof 18
We will show the conclusions by proving that implies .
Suppose . If , then , since by assumption, . Thus, we assume instead that .
Because , we know , the box of radius centered around some . We have for some , where is defined as given in Equation (1.15), and so by the definition of from Equation (3.10), . However, implies , so from the definition of given in Equation (3.13), . Because, (see Equation (3.12)), by Theorem 3.21, . Hence, . Therefore, by Equation (2.3), .
Now that we have ensured , we aim to construct a region of controlled volume enclosing : . Then we show that, as , the volume of in goes to zero with respect to the Lebesgue measure. This will allow us to put a convenient upper bound on the volume of in terms of the width . Because exists solely in , and not on the product space, we can once again rely on the Euclidean distance in .
Lemma 3.23
Assume a shift characterized transport problem in , with , . Suppose is the width used for the -th iteration, and assume is defined as given in Equation (2.2), as given in Equation (3.16). Let the region be defined as
[TABLE]
For all , .
Proof 19
By definition, . Suppose . Because we are applying the Euclidean norm, Lemma 3.19 implies that , and since , .
Theorem 3.24
Assume is a -norm, . Let be the width of the discretization applied during the -th iteration. Given the definition of in Equation (2.2) and in Equation (3.16), if and there exists some constant such that with respect to the Lebesgue measure, then there exists some , such that with respect to the Lebesgue measure.
Proof 20
Recall the definition of given in Equation (3.19). We know . Let be the closed ball of radius centered at , and defined with respect to the Euclidean distance. Write
[TABLE]
For all fixed ,
[TABLE]
where is the volume of the -dimensional sphere of radius , defined with respect to the Euclidean distance. By using the function, this volume can be written as
[TABLE]
Because the volume is independent of the point , we therefore have
[TABLE]
where
[TABLE]
Let . By applying Lemma 3.19 with the Euclidean distance, we know that for all , , which implies . Thus, , which implies .
Remark 6
The interplay between , , , and is nontrivial. Figure 3 helps to visualize it properly. In Figure 3LABEL:sub@f:Br, we show placement of some boundary set . It is crucial that the subgrid created by completely surrounds , because that is the only way to ensure that . One can see in this image how a (very degenerate) choice of , coupled with the right arrangement of ’s, might allow a small and sharply curved boundary set to slip unnoticed between points.
As Figure 3LABEL:sub@f:barBr illustrates, each point in appears as the center of its corresponding box, and the boxes completely cover the boundary set.
The region is deliberately constructed to entirely cover all the boxes in . As Figure 3LABEL:sub@f:barBrPLUS shows, its volume can be significantly larger than that of the boxes it contains. However, the worst-case “thickness” given to ensures that it will always enclose both and .
3.3.6 The Wasserstein distance error
Theorem 3.25
Assume is absolutely continuous and let be the Wasserstein distance. Let be the width of the -th iteration of the boundary method. Given the definition of in Equation (2.2) and in Equation (3.16), suppose , and that there exists some such that with respect to the -dimensional Lebesgue measure. If is the maximum error of the ground cost in the set , and is the Wasserstein distance approximation obtained with the boundary method, then the value of on is bounded by some and
[TABLE]
where the bound equals the maximum possible volume of multiplied by the maximum value of and the maximum error of the ground cost.
Proof 21
If , then has been identified as being in the interior of for some . Thus, the cost error associated with the points outside is zero.
Suppose instead that . By definition, the absolute value of the difference between the correct and approximated ground costs at is less than or equal to . Condition (1)(1)(a) requires to be absolutely continuous, so there exists such that, for all , .
Therefore, the error on the Wasserstein distance is bounded above by
[TABLE]
Remark 7
The bounds in Theorems 3.24 and 3.25 indicate that the volume of the boundary set and the error of the computed Wasserstein distance decrease according to the dimension of the space. Thus, we should expect our numerical tests to show a quadratic (in ) or cubic (in ) decrease of the Wasserstein distance error. These decreases are clearly observed in practice, see Section 4.
4 Numerical results
4.1 Test conditions
As mentioned in Remark 2, some choices of may make it necessary to approximate or . However, the majority of numerical studies we have seen restrict to simple choices of (most often uniform). For this reason, we restricted our examples to cases where the cost and mass integrals can be written in a closed form.
4.1.1 The closed-form mass
The integral of over some box can be written as:
[TABLE]
Since is a probability density function, we must have . For convenience, let denote an un-normalized version of , and similarly for .
Using the linearity of the integral, one can use linear combination of simple functions for which exact solutions are known. We can also construct more complex measures by partitioning into disjoint subsets. In this case, however, we add an additional restriction in order to be sure that exact solutions can always be found: We -partition into subsets , such that the boundaries of each fall on the initial set of grid lines. Assume that for each set , there exists a density function that is exactly solvable on . From these, we consider (and ) to be the piecewise functions defined on each as (and , respectively).
Most of our computations were performed in two-dimensions. For such problems, given iteration and , can be written as
[TABLE]
The closed-form choices used in our numerical tests are shown in Table 1. As described above, we used the table entries as building blocks in the construction of more complex measures.
4.1.2 The closed-form Wasserstein distance over
We performed many tests where could be computed exactly but the Wasserstein distance could not; see Section 4 for details. In such cases, we made no attempt to approximate , choosing instead to focus on the accuracy of the -partition generated by the approximate shift set .
However, there were a number of cases in two dimensions where the choice of and allowed for closed-form computations. In those cases, because the combination of and gives us an exact solution, there exists such that
[TABLE]
As in Section 4.1.1, we write when working with .
Now consider , , and . When , the Wasserstein distance on is also zero. For those boxes where , we can take advantage of the uniformity to define the function in terms of a single variable: the component-wise distance between points given by , where , . When the Wasserstein distance over can be computed and is non-zero, it takes the form
[TABLE]
where is an explicit function.
Table 4.1.2 gives Wasserstein distance functions for the -norm and the -th power of some -norm (). By leveraging the linearity of the integral and subdividing into disjoint sets, we can build combinations of ground costs and measures with closed form . We used this to perform tests in , with being either uniform or zero in relevant boxes.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] G. Monge, Mémoire sur la théorie des déblais et des remblais, in: Histoire de l’Académie Royale des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année, Académie des sciences (France)., 1781, pp. 666–704, in French.
- 2[2] L. V. Kantorovich, On the translocation of masses, C.R. (Doklady) Acad. Sci. URSS (N.S.) 37 (1942) 199–201.
- 3[3] L. V. Kantorovich, On a problem of Monge, Uspekhi Mat. Nauk 3 (1948) 225–226.
- 4[4] C. Villani, Topics in Optimal Transportation, Vol. 58 of Graduate Studies in Mathematics, American Mathematical Society, Providence, R.I., 2003.
- 5[5] W. Gangbo, R. J. Mc Cann, The geometry of optimal transportation, Acta Mathematica 177 (2) (1996) 113–161.
- 6[6] J. A. Cuesta-Albertos, A. Tuero-Díaz, A characterization for the solution of the Monge-Kantorovich mass transference problem, Statistics and Probability Letters 16 (2) (1993) 147–152.
- 7[7] A. Pratelli, On the equality between Monge’s infimum and Kantorovich’s minimum in optimal mass transportation, Annales de l’Institut Henri Poincare (B): Probability and Statistics 43 (1) (2007) 1–13.
- 8[8] L. Rüschendorf, Monge-Kantorovich transportation problem and optimal couplings, Jahresbericht der Deutschen Mathematiker-Vereinigung 109 (3) (2007) 113–137.
