Entropic regularization of continuous optimal transport problems
Christian Clason, Dirk A. Lorenz, Hinrich Mahler, Benedikt Wirth

TL;DR
This paper provides a comprehensive analysis of entropic regularization in continuous optimal transport, establishing duality, existence of solutions, and convergence results, especially addressing cases with marginals lacking finite entropy.
Contribution
It introduces a novel analysis framework using Orlicz spaces for entropic regularization, proving existence and duality results, and demonstrating Gamma-convergence for non-finite entropy marginals.
Findings
Strong duality for the regularized problem in continuous functions.
Existence of minimizers in Orlicz space when marginals have finite entropy.
Gamma-convergence of regularized solutions to the original problem.
arXiv:1906.01333v3
Christian Clason, Faculty of Mathematics, University of Duisburg-Essen, 45117 Essen, Germany (ORCID: 0000-0002-9948-8426)
Dirk A. Lorenz, Institute of Analysis and Algebra, TU Braunschweig, 38092 Braunschweig, Germany (ORCID: 0000-0002-7419-769X)
Hinrich Mahler, Institute of Analysis and Algebra, TU Braunschweig, 38092 Braunschweig, Germany (ORCID: 0000-0001-9108-549X)
Benedikt Wirth, Applied Mathematics Münster, University of Münster, Einsteinstraße 62, 48149 Münster, Germany (ORCID: 0000-0003-0393-1938)
(2020-06-15)
Abstract
We analyze continuous optimal transport problems in the so-called Kantorovich form, where we seek a transport plan between two marginals that are probability measures on compact subsets of Euclidean space. We consider the case of regularization with the negative entropy with respect to the Lebesgue measure, which has attracted attention because it can be solved by the very simple Sinkhorn algorithm. We first analyze the regularized problem in the context of classical Fenchel duality and derive a strong duality result for a predual problem in the space of continuous functions. However, this problem may not admit a minimizer, which prevents obtaining primal-dual optimality conditions. We then show that the primal problem is naturally analyzed in the Orlicz space of functions with finite entropy in the sense that the entropically regularized problem admits a minimizer if and only if the marginals have finite entropy. We then derive a dual problem in the corresponding dual space, for which existence can be shown by purely variational arguments and primal-dual optimality conditions can be derived. For marginals that do not have finite entropy, we finally show Gamma-convergence of the regularized problem with smoothed marginals to the original Kantorovich problem.
1 Introduction
The Kantorovich formulation of optimal transport is the problem of finding a transport plan that describes how to move some measure onto another measure of the same mass such that a certain cost functional is minimal [22]. Specifically, let $\Omega_1$ and $\Omega_2$ be two compact subsets of $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$, respectively. For given probability measures $\mu_1$ on $\Omega_1$ and $\mu_2$ on $\Omega_2$ and a continuous cost function $c$ on $\Omega_1\times\Omega_2$, the goal is to find a measure $\pi$ on $\Omega_1\times\Omega_2$ such that the cost $\int_{\Omega_1\times\Omega_2} c\,\mathrm{d}\pi$ is minimal among all $\pi$ that have $\mu_1$ and $\mu_2$ as marginals. This problem has been well studied, and we refer to the recent books [34, 32] for an overview. For example, it is known that the problem has a solution $\bar\pi$ and that the support of $\bar\pi$ is contained in the so-called $c$-superdifferential of a $c$-concave function on $\Omega_1$, see [2, Thm. 1.13]. (This is sometimes called the fundamental theorem of optimal transport.) In the case where $\Omega_1$ and $\Omega_2$ are both subsets of $\mathbb{R}^n$ and where $c$ is the squared Euclidean distance, this implies that optimal plans are singular with respect to the Lebesgue measure. Hence, the optimal plan is not a measurable function, and so standard approximation techniques from numerical analysis (e.g., by piecewise constant or piecewise linear functions) are not applicable. This motivates regularizing the continuous problem to obtain approximate solutions that are functions instead of measures, which in turn can be treated by classical discretization techniques in order to solve the regularized problem.
In this work we focus on entropic regularization by adding a multiple of the negative entropy of $\pi$ (with respect to the Lebesgue measure) to the objective function. This forces the optimal plan to be a measure that has a density with respect to the Lebesgue measure. Furthermore, in the discrete setting, this allows solving the problem numerically by the very simple Sinkhorn algorithm [23, 17, 4].
Notation and problem statement.
To fully state the regularized optimal transport problem, we introduce some notation. By $\mathcal{M}(\Omega)$ and $\mathcal{P}(\Omega)$ we denote the set of Radon and probability measures on $\Omega$, respectively. The Lebesgue measure will be denoted by $\mathcal{L}$ (the set on which it is defined being clear from the context), and integrals with respect to the Lebesgue measure are simply denoted by $\int\cdot\,\mathrm{d}x$ with the appropriate integration variable $x$. We write $L^p(\Omega,\mu)$ for the space of $p$-integrable functions with respect to the measure $\mu$ but omit the set $\Omega$ if it is clear from the context. If no measure is given, $L^p(\Omega)$ always refers to the space with respect to the Lebesgue measure. In the case where the measure $\mu$ has a density with respect to the Lebesgue measure, we will also use $\mu$ for that density. For $\mu\in\mathcal{M}(\Omega_1)$ and $f\colon\Omega_1\to\Omega_2$, we denote by $f_\#\mu$ the pushforward of $\mu$ by $f$, i.e., the measure on $\Omega_2$ defined by $(f_\#\mu)(A)=\mu(f^{-1}(A))$ for all measurable sets $A\subset\Omega_2$. In particular, we will use the coordinate projections $P_i(x_1,x_2):=x_i$, $i=1,2$, and the fact that $(P_i)_\#\pi$ is the $i$th marginal of $\pi$. The entropically regularized Kantorovich problem of optimal mass transport between $\mu_1$ and $\mu_2$ is then given by
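$$\inf_{\substack{\pi\in\mathcal{P}(\Omega_1\times\Omega_2)\\ (P_i)_\#\pi=\mu_i,\ i=1,2}}\ \int_{\Omega_1\times\Omega_2} c\,\mathrm{d}\pi+\gamma\int_{\Omega_1\times\Omega_2}\pi(\log\pi-1)\,\mathrm{d}(x_1,x_2). \tag{P}$$
(Note that we used the negative entropy of $\pi$ with respect to the Lebesgue measure for regularization. One could also consider regularization with the negative entropy with respect to some other measure, e.g., the product measure $\mu_1\otimes\mu_2$ [18], but we will not pursue this further.) A purely formal application of convex duality then yields the predual problem
$$\sup_{\alpha_1\in C(\Omega_1),\ \alpha_2\in C(\Omega_2)}\ \int_{\Omega_1}\alpha_1\,\mathrm{d}\mu_1+\int_{\Omega_2}\alpha_2\,\mathrm{d}\mu_2-\gamma\int_{\Omega_1\times\Omega_2}\exp\Big(\frac{-c(x_1,x_2)+\alpha_1(x_1)+\alpha_2(x_2)}{\gamma}\Big)\,\mathrm{d}(x_1,x_2). \tag{D}$$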
Having a primal and a dual problem, it is now possible to write down the system of Fenchel–Rockafellar extremality conditions and derive and analyze algorithms to solve this system; in fact, this is one of the possible ways of deriving the Sinkhorn algorithm in the discrete case. However, the existence of solutions to (D) – which is necessary to rigorously obtain extremality conditions – is not obvious in the continuous case. As it turns out, neither (P) nor (D) may admit a solution in the considered spaces. As we will show, a primal solution exists if and only if the marginals lie in the Banach space $L\log L$ of functions of finite entropy; correspondingly, a reformulation of the predual problem in the dual space $L_{\exp}$ allows showing existence of a maximizer by purely variational methods. For marginals that are not in $L\log L$, we show $\Gamma$-convergence of minimizers of regularized problems with suitably smoothed marginals.
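Weak duality between (P) and (D) can be checked numerically on a grid. The following is a minimal sketch rather than the authors' method: it assumes that the entropy term in (P) has the form $\gamma\int\pi(\log\pi-1)$ and that the penalty term in (D) has the form $\gamma\int\exp((\alpha_1+\alpha_2-c)/\gamma)$, takes $\Omega_1=\Omega_2=[0,1]$ split into cells of width `h`, and uses hypothetical helper names throughout.

```python
import math

def primal_value(pi, c, gamma, h):
    """Discrete (P) objective for a plan density pi[i][j] on cells of area h*h."""
    total = 0.0
    for row_pi, row_c in zip(pi, c):
        for p, cij in zip(row_pi, row_c):
            ent = p * (math.log(p) - 1.0) if p > 0 else 0.0  # convention 0 log 0 = 0
            total += (cij * p + gamma * ent) * h * h
    return total

def dual_value(a1, a2, mu1, mu2, c, gamma, h):
    """Discrete (D) objective for dual variables a1, a2 (assumed form, see lead-in)."""
    lin = h * sum(a * m for a, m in zip(a1, mu1)) + h * sum(a * m for a, m in zip(a2, mu2))
    pen = h * h * sum(math.exp((a1[i] + a2[j] - c[i][j]) / gamma)
                      for i in range(len(a1)) for j in range(len(a2)))
    return lin - gamma * pen

n = 20
h = 1.0 / n
x = [(i + 0.5) * h for i in range(n)]          # cell midpoints
mu1 = [2.0 * t for t in x]                     # density 2x, unit mass on the grid
mu2 = [1.0] * n                                # uniform density
c = [[(s - t) ** 2 for t in x] for s in x]     # squared distance cost
gamma = 0.1

pi = [[m1 * m2 for m2 in mu2] for m1 in mu1]   # product plan: feasible for (P)
a1 = [math.sin(3.0 * s) for s in x]            # arbitrary dual variables
a2 = [0.2 * t for t in x]

primal = primal_value(pi, c, gamma, h)
dual = dual_value(a1, a2, mu1, mu2, c, gamma, h)
```

Under these assumptions, `dual <= primal` holds for every choice of `a1` and `a2` by the pointwise inequality $\gamma\,p(\log p-1)+\gamma e^{a/\gamma}\ge pa$. Replacing `a1` by `a1 + k` and `a2` by `a2 - k` for a constant `k` leaves `dual` unchanged because both marginals have unit mass.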
Related work.
The continuous optimal transport problem has been analyzed in the survey paper [24] where the relation to the so-called dynamic Schrödinger problem is made. Another survey [20] presents an existence proof for a reparameterized optimality system based on the convergence analysis for a continuous variant of Sinkhorn’s algorithm (and attributes the proof and the algorithm to Fortet [21]). A detailed overview of the connections between optimal transport, the Schrödinger problem, and the Sinkhorn algorithm from a stochastic control viewpoint is given in the even more recent survey [14]. In [11], primal existence has been shown in the subset of the space of measures which have a density of finite entropy with respect to the Lebesgue measure. Furthermore, [15] analyzes the problem (for unbalanced transport, i.e., for marginals with different mass) in and derives a dual formulation in . However, the question of existence of a solution of the respective dual problem is not answered. In [9], this gap was closed through a contraction argument using the Hilbert metric. More precisely, [9, Thm. 3.1] guarantees the existence of dual solutions in provided that the feasible set of the dual problem is not empty. Moreover, if a certain constraint qualification holds, the dual optimizers and can be shown to satisfy , where denotes the optimal primal solution. Here , , and correspond to , , of (P) and (D) via , , and . A similar result is also stated in the more recent work [13, Thm. 6], which shows the existence of dual optimizers if the marginals are absolutely continuous probability measures. (The relation of and in [13] to our notation is and .) Another approach to prove the existence of unique solutions (even in the multi-marginal case) is presented in [12, Thm 4.3]. The authors show that a certain map is a bijection, which yields existence of dual solutions and in if the marginals are functions in as well. 
Moreover, in [6] a compactness argument is used to show the existence of a fixed point of the Sinkhorn iteration; in contrast to our work, the entropy penalization there is considered with respect to the product measure of the marginals.
Previous works [6, 9, 12, 13] tackle the problem of existence of dual solutions under various conditions in standard Lebesgue spaces. For marginals of finite entropy, [16, Cor. 3.2] already states that dual solutions exist and satisfy and (in our notation; the notation there uses , , , , , and ). Note that while the primal solution is in , the analysis takes place in . Moreover, as the authors of [9] note, [16] fails to elaborate a crucial step of the argumentation. This gap was closed only later in [8]. None of the mentioned works considered necessary conditions for existence. Finally, [25] analyzes regularization with the norm of and derives existence of solutions of the dual problem.
The notion of Orlicz spaces in the context of convex integral functionals has previously been used in [26], where existence of both primal and dual optimizers is covered in a more general setting. More precisely, the spaces used in [26], which are also known as Musielak–Orlicz spaces [28], are a generalization of the Orlicz spaces used here. The setting considered here can be recovered in two different ways: In Section 7.3 (a), the above-referenced results of [16] are recovered as a special case (where again our case corresponds to choosing ). Moreover, choosing in the second example (titled "a variant of the Boltzmann entropy") in Section 7.1 gives a problem very similar to the one considered here. The difference lies in the fact that the cost function is part of the definition of the relevant Musielak–Orlicz spaces in this case, and hence the analysis takes place in different spaces. As the aim of [26] is to weaken the necessary assumptions as much as possible, the overall setting is more abstract, and the proofs rely heavily on the authors' previous work [27]. Here we aim for a self-contained, more elementary treatment of (P).
Regarding $\Gamma$-convergence, the limit $\gamma\to0$ for fixed marginals with densities of finite entropy was considered recently in [11].
Organization.
Section 2 recalls statements about functions of finite entropy and the duality of the respective Orlicz space $L\log L$. In Section 3, we collect and prove (for the sake of completeness) results on the regularized optimal transport problem (P) in the context of duality of continuous functions and measures. In particular, Theorem 3.4 shows that primal solutions exist if and only if the marginals are in the space $L\log L$. Hence, we analyze the problem in Section 4 in the context of $L\log L$ and $L_{\exp}$. We show existence and uniqueness of solutions of the primal problem in $L\log L$, derive the dual problem, and show existence of solutions for the dual problem in $L_{\exp}$. We finally show a result on $\Gamma$-convergence for the combined regularization and smoothing of marginals that do not have finite entropy in Section 5.
2 Review of functions of finite entropy and the space $L\log L$
Entropic regularization deals with positive integrable functions of finite entropy. These functions are closely connected to the space $L\log L$, a special case of (Birnbaum–)Orlicz spaces, and hence we collect some facts about this space which are mainly taken from [30, 5, 1]; see also [33]. We consider a compact domain $\Omega$ and denote the neg-entropy of a measurable function $u$ by
[TABLE]
where we set as usual. Note that since for every , the neg-entropy always lies in the interval . We say that has finite entropy if . Following [5], we define
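$$L\log L(\Omega):=\Big\{u\colon\Omega\to\mathbb{R}\ \text{measurable}\ :\ \int_\Omega|u|\log_+|u|\,\mathrm{d}x<\infty\Big\},$$
where $\log_+t:=\max\{\log t,0\}$.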
Proposition 2.1 ([29, Thm. 1.2]).
A nonnegative measurable function $u$ on a set $\Omega$ with finite measure has finite entropy if and only if $u\in L\log L(\Omega)$.
It turns out that $L\log L$ can be normed such that it becomes a Banach space and that its dual has a natural characterization. In the following, we recall the central constructions and main results based on so-called Young functions.
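Definition 2.2 (Young functions).
Let $\phi\colon[0,\infty)\to[0,\infty]$ be increasing and lower semi-continuous with $\phi(0)=0$. Suppose that $\phi$ is neither identically zero nor identically infinite on $(0,\infty)$. Then the function $\Phi\colon[0,\infty)\to[0,\infty]$, defined by
$$\Phi(t):=\int_0^t\phi(s)\,\mathrm{d}s,$$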
is said to be a Young function. Moreover, the function defined by
[TABLE]
is called the complementary Young function of .
Any Young function is continuous and convex on its domain, and the complementary Young function is again a Young function. The notion of Young functions gives rise to a generalization of $L^p$ spaces through the definition of the so-called Luxemburg norm.
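Definition 2.3 (Luxemburg norm and Orlicz spaces).
Let $\Phi$ be a Young function. The Luxemburg norm of a measurable function $u\colon\Omega\to\mathbb{R}$ is defined as
$$\|u\|_{\Phi}:=\inf\Big\{\gamma>0\ :\ \int_{\Omega}\Phi\Big(\frac{|u|}{\gamma}\Big)\,\mathrm{d}x\le1\Big\}. \tag{1}$$
The space of all measurable functions with finite Luxemburg norm is called Orlicz space and denoted by $L_\Phi(\Omega)$.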
Remark 2.4.
General Orlicz norms do not scale in a simple way with the size of the set $\Omega$. Writing $\mathbb{1}_A$ for the characteristic function of the set $A$, i.e., $\mathbb{1}_A(x)=1$ if $x\in A$ and $0$ else, the $L^p$-norm (corresponding to the Young function $\Phi(t)=t^p$) of $\mathbb{1}_\Omega$ equals $\mathcal{L}(\Omega)^{1/p}$. For a strictly increasing Young function $\Phi$, we obtain the more complicated result $\|\mathbb{1}_\Omega\|_\Phi=1/\Phi^{-1}(1/\mathcal{L}(\Omega))$. As a consequence, some results in the following depend on the size of the domain. One could get rid of this dependence by adapting the definition of the norm to, e.g.,
[TABLE]
However, since this definition would be nonstandard, we refrain from doing so.
Moreover, note that
[TABLE]
is always true, but equality may fail to hold. For a counterexample, see, e.g., [33, Example 2.8].
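The Luxemburg norm of Definition 2.3 and the scaling behavior described in Remark 2.4 can be explored numerically. The following is a minimal sketch, assuming the standard definition $\|u\|_\Phi=\inf\{\gamma>0:\int_\Omega\Phi(|u|/\gamma)\,\mathrm{d}x\le1\}$ and a piecewise constant function; the function name `luxemburg_norm` and the choice $\Phi(t)=e^t-1$ in the second example are illustrative assumptions, not taken from the paper.

```python
import math

def luxemburg_norm(values, weights, Phi, rel_tol=1e-12):
    """Luxemburg norm of a piecewise constant function (bisection sketch).

    values[k] is the function value on a cell of Lebesgue measure weights[k];
    Phi is a Young function that is finite near 0 and unbounded.
    """
    if all(v == 0 for v in values):
        return 0.0

    def modular(g):
        # integral of Phi(|u| / g) over the domain
        return sum(w * Phi(abs(v) / g) for v, w in zip(values, weights))

    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:          # enlarge hi until it is admissible
        hi *= 2.0
    while hi - lo > rel_tol * hi:     # shrink [lo, hi] around the infimum
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# Phi(t) = t^2 reproduces the L^2 norm: sqrt(3^2 * 1 + 4^2 * 1) = 5.
l2 = luxemburg_norm([3.0, 4.0], [1.0, 1.0], lambda t: t * t)

# Indicator of a set of measure A with Phi(t) = exp(t) - 1:
# A * Phi(1/g) <= 1  <=>  g >= 1 / log(1 + 1/A).
A = 2.0
ind = luxemburg_norm([1.0], [A], lambda t: math.exp(t) - 1.0)
closed_form = 1.0 / math.log(1.0 + 1.0 / A)
```

The second example shows the nonlinear dependence on the measure of the set: the norm of the indicator is $1/\Phi^{-1}(1/A)$, which is not a power of $A$ unless $\Phi$ is a power function. For Young functions that grow exponentially, very small trial values of `g` may overflow in `math.exp`; the sketch does not guard against this.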
Theorem 2.5 ([1, Thm. 8.10]).
*$L_\Phi(\Omega)$ is a Banach space with respect to the Luxemburg norm.*
We will also need the following estimate.
Lemma 2.6.
Let denote the Orlicz space with convex Young function and with . Then .
Proof 2.7.
For any , it holds that $\int_{\Omega}\Phi\big(\frac{|u|}{\gamma}\big)\,\mathrm{d}x>1$. It then follows from the convexity of $\Phi$ and that
[TABLE]
Letting , the claim follows.
Note that by Remark 2.4, Lemma 2.6 does not hold for .
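Using $\Phi_{\log}(t):=t\log_+t$ (with $\log_+t=\max\{\log t,0\}$) as Young function now immediately yields $L_{\Phi_{\log}}(\Omega)=L\log L(\Omega)$. The complementary Young function
$$\Phi_{\exp}(t)=\begin{cases}t & \text{if } 0\le t\le1,\\ e^{t-1} & \text{if } t>1,\end{cases} \tag{2}$$
of $\Phi_{\log}$ now provides a natural way to define the Orlicz space $L_{\exp}(\Omega):=L_{\Phi_{\exp}}(\Omega)$. In fact, $L_{\exp}$ is the dual space of $L\log L$.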
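The relation between a Young function and its complementary function can be checked numerically. The following is a minimal sketch under two assumptions that are not confirmed by the text as extracted: that the Young function generating $L\log L$ is $t\mapsto t\log_+t$, and that the complementary function is the convex conjugate $s\mapsto\sup_{t\ge0}(st-\Phi(t))$. All function names are hypothetical.

```python
import math

def phi_log(t):
    """Assumed Young function t * log_+(t), with log_+ = max(log, 0)."""
    return t * max(math.log(t), 0.0) if t > 0 else 0.0

def phi_exp(s):
    """Candidate closed form of the complementary Young function."""
    return s if s <= 1.0 else math.exp(s - 1.0)

def conjugate(Phi, s, t_max=50.0, n=100000):
    """Crude grid approximation of the sup over 0 <= t <= t_max of s*t - Phi(t)."""
    step = t_max / n
    return max(s * (k * step) - Phi(k * step) for k in range(n + 1))

samples = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
errors = [abs(conjugate(phi_log, s) - phi_exp(s)) for s in samples]
```

For $s\le1$ the supremum is attained at $t=1$, and for $s>1$ at $t=e^{s-1}$, which gives the linear and the exponential branch, respectively. The grid search only covers $t\le50$, so the comparison is meaningful only for moderate values of `s`.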
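Proposition 2.8 ([5, Thm. IV.6.5]).
If $\Omega$ has finite Lebesgue measure, then $(L\log L(\Omega))^*=L_{\exp}(\Omega)$ (up to equivalence of norms). Moreover, for all $p\in(1,\infty)$, the following embeddings hold:
$$L^\infty(\Omega)\hookrightarrow L_{\exp}(\Omega)\hookrightarrow L^p(\Omega)\hookrightarrow L\log L(\Omega)\hookrightarrow L^1(\Omega).$$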
The Luxemburg norms (1) on are equivalent to the norms defined in [5, Def. IV.6.3] (in [5, Def. IV.6.3], the norms for and are dual to each other). The constants in this norm equivalence will in the following generically be denoted by . Note that [5, Thm. IV.6.5] is stated for domains with unit Lebesgue measure, but the case of general finite measure follows by a simple rescaling.
We also have the following properties, which follow from Theorem 8.21 b and Theorem 8.19 in [1], respectively, by observing that $\Phi_{\log}$ is so-called $\Delta$-regular (cf. [1, Def. 8.7]) but $\Phi_{\exp}$ is not.
Lemma 2.9.
(i) The space $L\log L$ is separable.
(ii) The spaces $L\log L$ and $L_{\exp}$ are not reflexive.
The following example shows that the desired optimality conditions cannot be derived by simply setting the Gâteaux derivative to zero.
Example 2.10.
* is not Gâteaux-differentiable on . Indeed, consider . Then it holds that (since is bounded) and hence that , but note that the formal Gâteaux derivative is not in . To see this, note that is not in and thus by Proposition 2.8 is not in .*
We next derive a few facts that will be useful for the analysis of the primal and dual regularized optimal transport problems. For the first lemma, we use the elementary fact that for all we have .
Lemma 2.11.
If , , and (i.e., ), then .
Proof 2.12.
We simply estimate
[TABLE]
and use that all terms on the right-hand side are finite since .
Next, we consider a function and its pushforwards under the coordinate projections
[TABLE]
The following result states that these marginals are also in $L\log L$.
Lemma 2.13.
If , then for with
[TABLE]
Proof 2.14.
Using the convexity of and Jensen’s inequality, we obtain
[TABLE]
where we used for and otherwise. Thus we obtain
[TABLE]
The claim for follows similarly.
As a corollary, we obtain a characterization of on tensor product spaces.
Corollary 2.15.
It holds that and if and only if , where
[TABLE]
Proof 2.16.
The mapping is the adjoint of , and hence one implication follows from the fact that .
For the other implication, we use the Luxemburg norm and Jensen’s inequality with to observe that
[TABLE]
This shows that plus a constant is in and hence that itself is in . Arguing similarly for , we obtain the claim.
3 Fenchel duality in $C(\Omega)$ and $\mathcal{M}(\Omega)$
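In this section, we study the primal and dual problems for entropically regularized mass transport, i.e.,
$$\inf_{\substack{\pi\in\mathcal{P}(\Omega_1\times\Omega_2)\\ (P_i)_\#\pi=\mu_i,\ i=1,2}}\ \int_{\Omega_1\times\Omega_2} c\,\mathrm{d}\pi+\gamma\int_{\Omega_1\times\Omega_2}\pi(\log\pi-1)\,\mathrm{d}(x_1,x_2) \tag{P}$$
and
$$\sup_{\alpha_1\in C(\Omega_1),\ \alpha_2\in C(\Omega_2)}\ \int_{\Omega_1}\alpha_1\,\mathrm{d}\mu_1+\int_{\Omega_2}\alpha_2\,\mathrm{d}\mu_2-\gamma\int_{\Omega_1\times\Omega_2}\exp\Big(\frac{-c(x_1,x_2)+\alpha_1(x_1)+\alpha_2(x_2)}{\gamma}\Big)\,\mathrm{d}(x_1,x_2), \tag{D}$$
using Fenchel duality in the canonical spaces $\mathcal{M}(\Omega_1\times\Omega_2)$ and $C(\Omega_1)\times C(\Omega_2)$. Most of the results in this section are classical [16, 9], but we include the results with proofs for the sake of completeness.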
We use the general framework as outlined in, e.g., [19, Sec. III.4] or [3, Chap. 9]. All throughout the following, we assume that , , , , and that and are compact.
We begin with a strong duality result for (P) and (D). A similar result in instead of is [15, Thm. 3.2], but we state the theorem and its proof because we use a slightly different setting.
Proposition 3.1 (strong duality).
The predual problem to (P) is (D), and strong duality holds. Furthermore, if the supremum in (D) is finite, (P) admits a minimizer.
Proof 3.2.
First, by the Riesz–Markov representation theorem, $\mathcal{M}(\Omega)$ is the dual space of $C(\Omega)$ for compact $\Omega$. Furthermore, Slater’s condition is fulfilled with so that strong duality holds and – assuming a finite supremum – the primal problem (P) possesses a minimizer. In addition, the integrand of the last integral in (D) is normal so that it can be conjugated pointwise [31]. Carrying out the conjugation, we obtain
[TABLE]
which is (P).
Remark 3.3.
Note that Proposition 3.1 does not claim that the supremum is attained, i.e., that the predual problem (D) admits a solution. The proposition should also be compared to [15, Thm. 3.2], which similarly characterizes solutions under the condition that the dual problem attains a maximizer.
In addition, solutions to (D) cannot be unique since we can add and subtract constants to $\alpha_1$ and $\alpha_2$, respectively, without changing the functional value. On the other hand, up to such a constant, the functional in (D) is strictly concave, and therefore any solution is uniquely determined by this constant.
We can use this duality argument in combination with the results of Section 2 to address the question of existence of a solution to (P). (Naturally, existence under the stated condition can also be shown using Tonelli’s direct method; here we give a proof based on the already shown convex duality for the sake of conciseness.)
Theorem 3.4.
Problem (P) admits a minimizer if and only if $\mu_1\in L\log L(\Omega_1)$ and $\mu_2\in L\log L(\Omega_2)$. In this case, the minimizer is unique and lies in $L\log L(\Omega_1\times\Omega_2)$.
Proof 3.5.
By Proposition 2.1, the energy of a feasible plan $\pi$ is finite if and only if $\pi\in L\log L(\Omega_1\times\Omega_2)$. However, by Lemma 2.13, this is the case only if $\mu_1\in L\log L(\Omega_1)$ and similarly for $\mu_2$. This shows that the conditions are necessary to have a finite energy. For sufficiency, we first note that for $\mu_1\in L\log L(\Omega_1)$ and $\mu_2\in L\log L(\Omega_2)$, the tensor product $\mu_1\otimes\mu_2$ is a feasible candidate with finite energy by Lemma 2.11. Thus, the infimum in (P) is finite, and weak duality – which always holds due to the properties of supremum and infimum – shows that the supremum in (D) is finite as well. Existence of a solution for (P) now follows from Proposition 3.1.
Uniqueness and regularity of the minimizer then are a direct consequence of the strict convexity of the entropy and Proposition 2.1.
In case a minimizer exists, we can characterize its support. Here and throughout the rest of the paper, we use the usual shorthand for the set . We also recall from Remark 2.4 that refers to the characteristic function of the set . The following result can also be found in [9, Thm. 2.7], but the proof there needs a constraint qualification for the primal problem which we do not need in this formulation. We present a full proof for the sake of completeness.
Proposition 3.6.
A minimizer $\bar\pi$ of (P) satisfies $\operatorname{supp}\bar\pi=\operatorname{supp}\mu_1\times\operatorname{supp}\mu_2$.
Proof 3.7.
The fact that follows from the marginal constraints and the nonnegativity of . It remains to show that . For a contradiction, assume there is some , then there exists a radius such that on each ball with , but and . In particular, there exist and such that and for , but . We may choose small enough and small enough such that there are and with nonzero Lebesgue measure and with on .
Let now for , , and
[TABLE]
for . Then is feasible. We will now argue that for small enough we have
[TABLE]
where . Note that on .
First, consider . Since is continuous and finite, is finite and hence
[TABLE]
for some constant . Now, consider the entropy of . Since on , we have
[TABLE]
Using the inequality for convex and differentiable , we can estimate
[TABLE]
and similarly for . Again using the above inequality we have
[TABLE]
We obtain
[TABLE]
The right-hand side is of the form with differentiable at [math]. We can therefore estimate
[TABLE]
for some big enough and small .
Combining the estimates for cost and entropy yields
[TABLE]
for small enough. However, the last term will be negative for small enough, which shows that is not optimal in contradiction to the assumption.
Theorem 3.4 shows that the natural setting for the entropically regularized problem (P) is in fact $L\log L$ rather than $\mathcal{M}$. In the next section, we will prove existence of solutions for a suitably modified dual problem of (P) and justify a pointwise almost everywhere optimality system that can be used as a basis for deriving the Sinkhorn algorithm.
4 Duality in $L\log L$ and $L_{\exp}$
In this section, we consider (P) in the space $L\log L$. To derive a dual problem in $L_{\exp}$, we shall perform the variable substitution
[TABLE]
see Fig. 1.
Note that is convex and concave and that the function coincides with the Young function from (2), which is associated with .
We now substitute and , i.e.,
[TABLE]
which conversely implies that and . Using this substitution, we obtain that
[TABLE]
Instead of the predual problem (D), we thus consider the reformulated problem
[TABLE]
This substitution renders the problem nonconvex but, as we will see, allows proving existence of solutions.
In the following, we assume that – as required for existence for the primal problem – and that . We also recall that the Luxemburg norms and are equivalent norms on and , respectively. Our aim is to apply Tonelli’s direct method to (3) by showing that the functional
[TABLE]
is radially unbounded and lower semi-continuous in the right topology. We first need the following lemma.
Lemma 4.1.
If , then
[TABLE]
Proof 4.2.
Set for some such that still . Then it holds that $\int_{\Omega}\Phi\big(\tfrac{v\mathbb{1}_{\{v>0\}}}{\gamma_{\epsilon}}\big)\,\mathrm{d}x>\|v\mathbb{1}_{\{v>0\}}\|_{\Phi_{\exp}}>\max(1,\mathcal{L}(\Omega))$.
By Jensen’s inequality we have
[TABLE]
Taking logarithms, we deduce that , and letting yields the claim.
We next capture the invariance inherited from (D) as described in Remark 3.3.
Lemma 4.3.
Let , , with . If for an arbitrary we set and , then . In particular, by choosing appropriately, we can always achieve .
Proof 4.4.
Note that -a.e. and -a.e. as . By construction, the same holds for and . The first statement is now a direct consequence of the invariance of the cost functional in (D) under the mapping . For the second statement, first note that is continuous in . Moreover,
[TABLE]
so that the assertion follows by the intermediate value theorem.
Remark 4.5.
While implies and implies , in general we cannot achieve both equalities simultaneously due to Remark 2.4.
Modulo this invariance we now obtain coercivity.
Lemma 4.6.
Let , , be a sequence in such that for all . Then for implies as .
Proof 4.7.
Without loss of generality we may assume the to be nonnegative, since replacing with its absolute value decreases without changing . Due to we have and thus
[TABLE]
where denotes the generic equivalence constant for the duality from Proposition 2.8. Analogously we obtain
[TABLE]
Hence for , we have
[TABLE]
Since and is nonnegative, we also have as and therefore
[TABLE]
by Lemma 4.1. Now Lemma 2.6 implies that for and therefore that
[TABLE]
which yields the desired contradiction.
Lemma 4.8.
* is sequentially weakly-$*$ lower semi-continuous on .*
Proof 4.9.
Let in . Then we have in particular in for any . Since is a lower semi-continuous and convex integrand, it thus follows, e.g., by [3, Thm. 13.1.1] that
[TABLE]
and hence that these functionals are weak-$*$ sequentially lower semi-continuous on .
It remains to show weak-$*$ lower semi-continuity of . For fixed , decompose and into a finite number of subsets with . We further assume that the decompositions and for are obtained from the decompositions for by refinement. Defining , we then have
[TABLE]
Similarly as above, it follows from the lower semi-continuity and convexity of that and are sequentially weakly-$*$ lower semi-continuous on and , respectively. Hence
[TABLE]
by the monotone convergence theorem, since monotonically.
Theorem 4.10 (dual existence).
Problem (3) possesses a maximizer .
Proof 4.11.
We show that possesses a minimizer. The energy is finite at, e.g., . We thus may consider a minimizing sequence in , where by Lemma 4.3 we may assume without loss of generality. Lemma 4.6 now implies boundedness of so that by the Banach–Alaoglu theorem we may extract a weakly-$*$ convergent subsequence from (recalling that $L\log L$ is separable by Lemma 2.9). The claim now follows from the lower semi-continuity of along that subsequence by Lemma 4.8.
From dual solutions and , we obtain by backsubstitution and as a candidate for a solution of the original predual problem (D). However, these are in general not admissible since and does not imply the needed regularity of and : The positive parts of and (which equal the positive parts of and , respectively) are in , but the negative parts need not even be functions as they could be everywhere.
Nevertheless, from (3) one sees that -almost everywhere and -almost everywhere, and hence and are at least - and -measurable, respectively. We will derive more information on and from the necessary optimality conditions.
First, we have again a strong duality result relating (3) to (P).
Proposition 4.12 (strong duality).
Let , , and . Then, both (P) and (3) admit a solution, and their optimal values coincide.
Proof 4.13.
Existence for both problems follows from Theorems 3.4 and 4.10. To show their equality, by Proposition 3.1 it suffices to show that the value of (D) equals that of (3). First, let and be arbitrary and set and . By substitution, we see that
[TABLE]
and taking the supremum over all yields that the value of (D) is at most that of (3).
It thus remains to show that the value of (3) can be achieved by (D). Let be optimal. By the monotone convergence theorem, and also
[TABLE]
Hence can be arbitrarily well approximated by with and . Now let and with in and in . Here we may assume to be uniformly bounded so that (upon restricting to a subsequence) we additionally have and in . Now
[TABLE]
due to the weak-$*$ convergence. Finally, as converge in , converges a.e. (after passing to a subsequence). Using uniform boundedness of , the dominated convergence theorem yields
[TABLE]
Having established primal and dual existence, we can now show how the solution of the dual problem can be used to solve the primal problem.
Theorem 4.14 (optimality conditions).
Let , , and . Then solutions of (3) satisfy
[TABLE]
for -almost every and -almost every . Furthermore, defined by
[TABLE]
is the solution of (P).
Proof 4.15.
Let be solutions of the dual problem. We start with deriving the necessary conditions (5). First, note that and (up to a Lebesgue-negligible set) since otherwise . Let now be arbitrary and consider any with on . We next argue that the dual functional given in (4) is directionally differentiable in with respect to its first argument in direction . Since both and are differentiable at , so are the integrands pointwise almost everywhere on . It therefore suffices to show that the pointwise directional derivatives are integrable in order to differentiate under the integral. For the first term in , we have almost everywhere on that
[TABLE]
which is integrable on since and are feasible for (3). An integrable lower bound is obtained similarly using .
For the second term in , the chain rule and differentiability of yields almost everywhere on that
[TABLE]
where the right-hand side is integrable with respect to .
From the dominated convergence theorem, it thus follows that the partial directional derivative of in the first direction is given by
[TABLE]
where we have again used the integrability of the integrand to apply Fubini’s Theorem in order to iterate the double integrals and used and .
By the specific choice of , we have for all sufficiently small. The optimality of thus implies that
[TABLE]
and since was arbitrary on and , we must therefore have that
[TABLE]
Furthermore, since was arbitrary and whenever , this equation even holds for -almost all , which yields (5a). Equation (5b) is derived analogously.
Now we show that defined by (6) is a solution of the primal problem. First note that by construction, is feasible (i.e., is non-negative and has the correct marginals). Since strong duality holds by Proposition 4.12, it thus suffices to show that the primal objective functional evaluated in is equal to the dual optimal objective value (3). To that end, we insert (6) into the objective functional in (P) and obtain (using again the convention that )
[TABLE]
Since and hence , we have that . Furthermore, we have assumed and can thus shift the integrand to allow applying Tonelli’s Theorem in the second and third integral. Inserting (5), the right-hand side now coincides with . Hence strong duality holds for and , and thus the latter is a solution to (P).
Remark 4.16.
The optimality system (5) can be used to derive the Sinkhorn algorithm. First, note that one only needs to find and that solve (5a) and (5b); an optimal plan is then obtained from (6). The Sinkhorn method now solves the nonlinear system (5) by alternatingly solving the equations: Given , compute by solving (5a), i.e., setting
[TABLE]
and then solve (5b) with to obtain
[TABLE]
Formulating this iteration directly in and , we obtain the original Sinkhorn algorithm, cf. [20, Sec. 5.3.1].
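The alternating scheme of Remark 4.16 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that (5a) and (5b) take the standard form $\mu(x) = \int \exp((\alpha(x)+\beta(y)-c(x,y))/\gamma)\,dy$ and $\nu(y) = \int \exp((\alpha(x)+\beta(y)-c(x,y))/\gamma)\,dx$, that the plan in (6) has density $\exp((\alpha\oplus\beta-c)/\gamma)$, and that the marginal densities are strictly positive on the grid. The integrals are replaced by quadrature sums with user-supplied weights, and the updates are carried out in the log domain for numerical stability. The names `logsumexp`, `sinkhorn_log`, `wx`, and `wy` are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    s = np.sum(np.exp(a - m), axis=axis, keepdims=True)
    return np.squeeze(m + np.log(s), axis=axis)

def sinkhorn_log(mu, nu, cost, wx, wy, gamma, iters=500):
    """Alternately solve the (assumed) discretized equations (5a) and (5b).

    mu, nu : strictly positive density values on the x- and y-grid
    cost   : matrix cost[i, j] = c(x_i, y_j)
    wx, wy : positive quadrature weights for the two grids
    gamma  : regularization parameter
    Returns the dual variables and the density of the plan on the grid.
    """
    alpha = np.zeros_like(mu)
    beta = np.zeros_like(nu)
    log_wx, log_wy = np.log(wx), np.log(wy)
    for _ in range(iters):
        # Solve (5a) for alpha with beta fixed.
        alpha = gamma * (np.log(mu) - logsumexp(
            (beta[None, :] - cost) / gamma + log_wy[None, :], axis=1))
        # Solve (5b) for beta with the new alpha fixed.
        beta = gamma * (np.log(nu) - logsumexp(
            (alpha[:, None] - cost) / gamma + log_wx[:, None], axis=0))
    plan = np.exp((alpha[:, None] + beta[None, :] - cost) / gamma)
    return alpha, beta, plan

# Usage on a midpoint grid of [0, 1] with quadratic cost (illustrative data).
n = 50
x = (np.arange(n) + 0.5) / n
w = np.full(n, 1.0 / n)
mu = 1.0 + 0.5 * np.sin(2 * np.pi * x)
mu /= np.sum(mu * w)
nu = 1.0 + x
nu /= np.sum(nu * w)
cost = (x[:, None] - x[None, :]) ** 2
alpha, beta, plan = sinkhorn_log(mu, nu, cost, w, w, gamma=0.1)
```

Because the last update solves the second equation exactly, the second marginal of `plan` matches `nu` up to rounding, while the first marginal matches `mu` only up to the remaining iteration error.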
Remark 4.17.
The optimality system (5) also corresponds to the so-called Schrödinger system [14, Eq. (4.12)–(4.13) or (4.14)], i.e., the system of equations which characterizes the solution to the so-called Schrödinger bridge problem (essentially, the most likely transition path of a hot gas between the initial and final gas distribution and ). Existence of solutions to that system was typically shown based on iterative approximation schemes (analogous to but predating the Sinkhorn algorithm; see the discussion in [14]). There are also alternative proofs exploiting the variational nature of the problem; however, these are not as straightforward as identifying (5) as the optimality conditions to an optimization problem which has a solution. In [7], for example, a minimizing sequence for the dual problem (D) is used to construct a sequence of measures of the type (6) that is then shown to converge to a solution of the Schrödinger bridge problem.
Finally, the optimality conditions (5a) and (5b) allow us to conclude which problem is solved by .
Corollary 4.18.
Let , , and . Let be a solution of (3). Then and are solutions of
[TABLE]
and the values of (D) and (3) coincide.
Proof 4.19.
First, note that the mapping is continuous and thus attains a minimum and a maximum on the (assumed to be) compact set . From the optimality condition (5a), we thus obtain that
[TABLE]
This implies that for some . We thus have
[TABLE]
Since , we deduce that the right-hand side is finite and hence that is integrable with respect to , i.e., . The result for follows analogously. Finally, it follows from a density argument that (D) cannot exceed (3). Indeed, assume there are and with an objective functional value strictly larger than (3). By invoking the monotone convergence theorem as in the proof of Proposition 4.12, we may assume without loss of generality that and are bounded. Defining now and shows that (3) is no smaller than , the desired contradiction.
Remark 4.20.
As for (D) and as formalized in Lemma 4.3, solutions to (3) are not unique.
5 Γ-limit
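We now turn to Γ-convergence of the regularized problem. Recall from, e.g., [10], that a sequence $(F_n)_{n\in\mathbb{N}}$ of functionals on a metric space $X$ is said to Γ-converge to a functional $F$, written $F = \Gamma\text{-}\lim_{n\to\infty} F_n$, if
- (i) for every sequence $(x_n)_{n\in\mathbb{N}} \subset X$ with $x_n \to x$,
$$F(x) \le \liminf_{n\to\infty} F_n(x_n),$$
- (ii) for every $x \in X$, there is a sequence $(x_n)_{n\in\mathbb{N}} \subset X$ with $x_n \to x$ and
$$F(x) \ge \limsup_{n\to\infty} F_n(x_n).$$
It is a straightforward consequence of this definition that if $F_n$ Γ-converges to $F$ and $x_n$ is a minimizer of $F_n$ for every $n$, then every cluster point of the sequence $(x_n)_{n\in\mathbb{N}}$ is a minimizer of $F$. Furthermore, Γ-convergence is stable under perturbations by continuous functionals.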
Here we aim to approximate optimal transport plans of the unregularized problem for marginals and which are not required to be in , i.e., we allow arbitrary measures as marginals. In this case we cannot use these marginals for the regularized problems as well, since these will admit no solutions by Theorem 3.4. We therefore consider smoothed marginals and in converging to and , respectively, and show that the regularized problem with these marginals Γ-converges to the unregularized problem with the original marginals. The conceptually different case of Γ-convergence for fixed, non-mollified marginals (which then, however, need to be of finite entropy) has been treated in [11, Thm. 2.7]. Our setting with smoothed marginals allows simpler constructions in the limsup-inequality since a given transport plan is merely approximated via mollification. A further difference to [11, Thm. 2.7] is that we work on a compact set instead of and need to couple the smoothing parameter to the regularization parameter to obtain Γ-convergence.
Let be a smooth, compactly supported, nonnegative kernel with unit integral, and for and set
[TABLE]
Since we will smooth the marginals and the transport plans by convolutions, we will need to slightly extend the domains and to avoid boundary effects. Hence, let and be compact supersets of and , respectively, such that
[TABLE]
and which are large enough to contain the supports of and for . (Here and in the following, we assume that the width of the convolution kernels will be small enough.) For a function or measure on , we denote by the extension of to by zero (and analogously for functions and measures on and ). Let be a continuous extension of onto and set
[TABLE]
Using smoothed marginals and coupling and in an appropriate way, we can then show Γ-convergence of to as .
Theorem 5.1.
Let , , and be such that
[TABLE]
which is denoted in the following by . Define and . Then it holds that
[TABLE]
with respect to weak-* convergence in .
On the other hand, if are chosen such that or , then does not have a finite Γ-limit. More precisely, even for a family of feasible (i.e., with marginals and ) it holds that
[TABLE]
Proof 5.2.
For the first statement, we verify the two conditions in the definition of Γ-convergence.
(i): Let , then since is continuous and bounded. Since , we also have that
[TABLE]
and thus that
[TABLE]
Finally, the condition on the marginals is continuous with respect to weak-* convergence of , , and (note that ).
(ii): It suffices to consider a recovery sequence for , because the marginal conditions for and can never be satisfied for . If , then the condition holds trivially. Let therefore be finite. We set . Then as well as , . Since by Young’s convolution inequality for some constant and and we have
[TABLE]
we conclude that
[TABLE]
The right-hand side vanishes for by the assumption on the (coupled) convergences of and . Hence,
[TABLE]
For the second statement, recall from Lemma 2.13 that
[TABLE]
so that . By Lemma 2.6, this immediately yields , which implies
[TABLE]
and thus so that the assertion follows.
The conditions on and are in particular satisfied for for some .
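The need to couple the smoothing and regularization parameters can be made plausible in one dimension. For a mollified Dirac measure with density $\varphi_\delta(x) = \delta^{-1}\varphi(x/\delta)$, a change of variables gives $\int \varphi_\delta \log\varphi_\delta\,dx = \int \varphi\log\varphi\,dx - \log\delta$, so the entropy grows like $\log(1/\delta)$ as $\delta\to 0$ and has to be compensated by the regularization parameter. The following sketch checks this identity numerically. It is only an illustration: the particular bump kernel, the Riemann-sum quadrature, and the names `bump` and `mollified_dirac_entropy` are assumptions and are not taken from the paper, and the precise coupling condition is the one stated in Theorem 5.1.

```python
import numpy as np

def bump(x):
    """Unnormalized smooth bump function supported on (-1, 1)."""
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def mollified_dirac_entropy(delta, n=200001):
    """Riemann-sum approximation of the integral of phi_delta * log(phi_delta),
    where phi_delta(x) = phi(x / delta) / delta is a mollified Dirac measure."""
    x = np.linspace(-delta, delta, n)
    h = x[1] - x[0]
    phi = bump(x / delta)
    phi /= np.sum(phi) * h                 # normalize to unit integral
    integrand = np.zeros_like(phi)
    pos = phi > 0                          # convention 0 * log 0 = 0
    integrand[pos] = phi[pos] * np.log(phi[pos])
    return np.sum(integrand) * h

# The entropy differs from the one at delta = 1 by -log(delta).
reference = mollified_dirac_entropy(1.0)
for delta in (1e-1, 1e-2, 1e-3):
    shift = mollified_dirac_entropy(delta) - reference
    print(delta, shift, -np.log(delta))
```

Multiplying the entropy by a regularization parameter that decays faster than $1/\log(1/\delta)$ makes the product vanish, whereas a slower decay lets it blow up, which is the qualitative dichotomy described in Theorem 5.1.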
6 Conclusion
In contrast to the original Kantorovich formulation of optimal transport problems, their entropic regularization is well-posed only for marginals with finite entropy. Restricting the regularized problem to such functions and applying Fenchel duality in the space allows deriving primal-dual optimality conditions that can be interpreted pointwise almost everywhere and used to derive a continuous version of the popular Sinkhorn algorithm. For marginals that do not have finite entropy, a combined regularization and smoothing approach leads to a family of well-posed approximations that Γ-converge to the original Kantorovich formulation if the regularization and smoothing parameters are coupled in an appropriate way.
This work can be extended in several directions. For example, we have considered the usual setting where the entropic penalty is taken with respect to the Lebesgue measure. More general penalties have been considered in a different framework in [26], and other choices (such as the product measure of the marginals) are possible in the approach considered here as well and may lead to well-posedness and duality for a larger class of marginals. Naturally, a challenging but worthwhile issue would be a convergence analysis of the Sinkhorn algorithm in the considered Orlicz spaces and .
Acknowledgments
Dirk Lorenz, Hinrich Mahler, and Benedikt Wirth acknowledge support by the German Research Foundation (DFG) within the priority program “Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization” (SPP 1962) under grant numbers LO 1436/9-1 and WI 4654/1-1. Benedikt Wirth was further supported by the Alfried Krupp Prize for Young University Teachers awarded by the Alfried Krupp von Bohlen und Halbach-Stiftung and by the DFG via Germany’s Excellence Strategy through the Cluster of Excellence “Mathematics Münster: Dynamics – Geometry – Structure” (EXC 2044) at the University of Münster.
The authors would also like to thank the anonymous reviewers for a number of useful comments and suggestions regarding the presentation.
- [1] R. A. Adams, Sobolev Spaces, volume 140 of Pure and Applied Mathematics, Academic Press, Inc., Boston, MA, second edition, 2003.
- [2] L. Ambrosio and N. Gigli, A user’s guide to optimal transport, in Modelling and Optimisation of Flows on Networks, Springer, 2013, 1–155, doi:10.1007/978-3-642-32160-3_1.
- [3] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in Sobolev and BV Spaces, volume 6 of MPS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), 2006, doi:10.1137/1.9781611973488.
- [4] J. D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, Iterative Bregman projections for regularized transportation problems, SIAM Journal on Scientific Computing 37 (2015), A1111–A1138, doi:10.1137/141000439.
- [5] C. Bennett and R. Sharpley, Interpolation of Operators, volume 129 of Pure and Applied Mathematics, Academic Press, Inc., Boston, MA, 1988, doi:10.1016/s0079-8169(13)62909-8.
- [6] R. J. Berman, The Sinkhorn algorithm, parabolic optimal transport and geometric Monge–Ampère equations, 2017.
- [7] A. Beurling, An automorphism of product measures, Annals of Mathematics 72 (1960), 189–200, doi:10.2307/1970151.
- [8] J. M. Borwein and A. S. Lewis, Decomposition of multivariate functions, Canadian Journal of Mathematics 44 (1992), 463–482, doi:10.4153/cjm-1992-030-9.
