Dynamic Programming in Probability Spaces via Optimal Transport

Antonio Terpin; Nicolas Lanzetti; Florian Dörfler

arXiv:2302.13550·math.OC·April 9, 2024

TL;DR

This paper develops a framework for solving discrete-time optimal control problems in probability spaces by combining dynamic programming on the ground space with optimal transport, enabling decoupled multi-agent control strategies.

Contribution

It introduces a novel approach that links dynamic programming in probability spaces with optimal transport, providing a separation principle for multi-agent control.

Findings

01 Solution of dynamic programming in probability spaces via the ground space and optimal transport

02 Decoupling of low-level agent control and fleet-level control

03 Applicable to multi-agent systems with probabilistic states

Abstract

We study discrete-time finite-horizon optimal control problems in probability spaces, whereby the state of the system is a probability measure. We show that, in many instances, the solution of dynamic programming in probability spaces results from two ingredients: (i) the solution of dynamic programming in the "ground space" (i.e., the space on which the probability measures live) and (ii) the solution of an optimal transport problem. From a multi-agent control perspective, a separation principle holds: The "low-level control of the agents of the fleet" (how does one reach the destination?) and "fleet-level control" (who goes where?) are decoupled.

Tables

Table 1: Parallelism between objects in the ground space and in the measure space.

	ground space	measure space
state	$x_{k}\in X_{k}$	$\mu_{k}\in\mathcal{P}(X_{k})$
reference	$r_{k}\in R_{k}$	$\rho_{k}\in\mathcal{P}(R_{k})$
state-input distribution	$x_{k}\mapsto u_{k}(x_{k})\in U_{k}$	$\lambda_{k}\in\mathcal{P}(X_{k}\times U_{k})$ s.t. ${(\operatorname{proj}^{X_{k}\times U_{k}}_{X_{k}})}_{\#}{\lambda_{k}}=\mu_{k}$
dynamics	$x_{k+1}=f_{k}(x_{k},u_{k}(x_{k}))$	$\mu_{k+1}={f_{k}}_{\#}{\lambda_{k}}$
cost-to-go	$j_{k}$	$J_{k}$
stage and terminal costs	$g_{k}$ and $g_{N}$	$\mathcal{K}[g_{k}]$ and $\mathcal{K}[g_{N}]$

Equations

$\inf_{u_{k}:X_{k}\to U_{k}}\; g_{N}(x_{N},r_{N})+\sum_{k=0}^{N-1}g_{k}(x_{k},u_{k}(x_{k}),r_{N}),$

$\inf_{u_{k}:X_{k}\to U_{k}}\; \underbrace{\int_{X_{N}}\int_{X_{N}}g_{N}(x_{N},r_{N})\,\mathrm{d}\mu_{N}(x_{N})\,\mathrm{d}\rho_{N}(r_{N})}_{G_{N}(\mu_{N},\rho_{N})}+\sum_{k=0}^{N-1}\underbrace{\int_{X_{N}}\int_{X_{k}}g_{k}(x_{k},u_{k}(x_{k}),r_{N})\,\mathrm{d}\mu_{k}(x_{k})\,\mathrm{d}\rho_{N}(r_{N})}_{G_{k}(\mu_{k},u_{k},\rho_{N})},$

$\inf_{u_{k}:X_{k}\to U_{k}}\; \sum_{k=0}^{N}\mathcal{C}_{k}(\mu_{k},\rho_{N})+\int_{X_{k}}\|u_{k}(x_{k})\|^{2}\,\mathrm{d}\mu_{k}(x_{k}),$

$\inf_{u_{k}:X_{k}\to U_{k}}\; \sum_{i=1}^{M}g_{N}(x_{N}^{(i)})+\sum_{k=0}^{N-1}\mathcal{C}_{k}(\{x_{k}^{(j)}\}_{j=1}^{M},\rho_{N})+g_{k}(x_{k}^{(i)},u_{k}(x_{k}^{(i)})),$

$\inf_{u_{k}:X_{k}\to U_{k}}\; \int_{X_{N}}g_{N}(x_{N})\,\mathrm{d}\mu_{N}(x_{N})+\sum_{k=0}^{N-1}\mathcal{C}_{k}(\mu_{k},\rho_{N})+\int_{X_{k}}g_{k}(x_{k},u_{k}(x_{k}))\,\mathrm{d}\mu_{k}(x_{k}).$

$\inf_{u_{k}:X_{k}\to U_{k}}\; G_{N}(\mu_{N},\rho_{N})+\sum_{k=0}^{N-1}G_{k}(\mu_{k},u_{k},\rho_{k}),$

$\mathcal{K}[c](\mu,\nu)\coloneqq\inf_{\bm{\gamma}\in\Gamma(\mu,\nu)}\int_{X\times Y}c(x,y)\,\mathrm{d}\bm{\gamma}(x,y),$

$\mathcal{K}[c](\mu_{1},\ldots,\mu_{k})\coloneqq\inf_{\bm{\gamma}\in\Gamma(\mu_{1},\ldots,\mu_{k})}\int_{X}c(x_{1},\ldots,x_{k})\,\mathrm{d}\bm{\gamma}(x_{1},\ldots,x_{k}),$

$\int_{X}c(x_{1},\ldots,x_{k})\,\mathrm{d}\bm{\gamma}^{\varepsilon}(x_{1},\ldots,x_{k})\leq\mathcal{K}[c](\mu_{1},\ldots,\mu_{k})+\varepsilon.$

$J(\mu,\rho)=\inf_{\mu_{k}\in\mathcal{P}(X_{k}),\,\lambda_{k}\in\mathcal{P}(X_{k}\times U_{k})}\; \mathcal{K}[g_{N}](\mu_{N},\rho_{N})+\sum_{k=0}^{N-1}\mathcal{K}[g_{k}](\lambda_{k},\rho_{k})\quad\text{s.t.}\quad\mu_{k+1}={f_{k}}_{\#}{\lambda_{k}},\;\mu_{0}=\mu,\;{(\operatorname{proj}^{X_{k}\times U_{k}}_{X_{k}})}_{\#}{\lambda_{k}}=\mu_{k}.$

$J_{k}(\mu_{k},\rho_{k},\ldots,\rho_{N})\coloneqq\inf_{\lambda_{k}\in\mathcal{P}(X_{k}\times U_{k})}\; \mathcal{K}[g_{k}](\lambda_{k},\rho_{k})+J_{k+1}({f_{k}}_{\#}{\lambda_{k}},\rho_{k+1},\ldots,\rho_{N})\quad\text{s.t.}\quad{(\operatorname{proj}^{X_{k}\times U_{k}}_{X_{k}})}_{\#}{\lambda_{k}}=\mu_{k}.$

$j(x,r_{0},\ldots,r_{N})=\inf_{x_{k}\in X_{k},\,u_{k}\in U_{k}}\; g_{N}(x_{N},r_{N})+\sum_{k=0}^{N-1}g_{k}(x_{k},u_{k},r_{k})\quad\text{s.t.}\quad x_{k+1}=f_{k}(x_{k},u_{k}),\;x_{0}=x.$

$j_{N}(x_{N},r_{N})\coloneqq g_{N}(x_{N},r_{N}),\qquad j_{k}(x_{k},r_{k},\ldots,r_{N})\coloneqq\inf_{u_{k}\in U_{k}}\; g_{k}(x_{k},u_{k},r_{k})+j_{k+1}(f_{k}(x_{k},u_{k}),r_{k+1},\ldots,r_{N}).$

$g_{k}(x_{k},u_{k},r_{k})+j_{k+1}(f_{k}(x_{k},u_{k}),r_{k+1},\ldots,r_{N})\leq j_{k}(x_{k},r_{k},\ldots,r_{N})+\varepsilon.$

$J_{k}(\mu_{k},\rho_{k},\ldots,\rho_{N})=\mathcal{K}[j_{k}](\mu_{k},\rho_{k},\ldots,\rho_{N})=\inf_{\bm{\gamma}\in\Gamma(\mu_{k},\rho_{k},\ldots,\rho_{N})}\int_{X_{k}\times R_{k}\times\ldots\times R_{N}}j_{k}(x_{k},r_{k},\ldots,r_{N})\,\mathrm{d}\bm{\gamma}(x_{k},r_{k},\ldots,r_{N}),$

$\lambda_{k}^{\varepsilon}={(\operatorname{proj}^{X_{k}\times R_{k}\times\ldots\times R_{N}}_{X_{k}},u_{k}^{\varepsilon/2})}_{\#}{\bm{\gamma}_{k}^{\varepsilon/2}}$

$j_{N}(x_{N},r_{N})$

$j_{k}(x_{k},r_{N})$

$J_{k}(\mu_{k},\rho_{N})=\mathcal{K}[j_{k}](\mu_{k},\rho_{N})=\inf_{\bm{\gamma}\in\Gamma(\mu_{k},\rho_{N})}\int_{X_{k}\times R_{N}}j_{k}(x_{k},r_{N})\,\mathrm{d}\bm{\gamma}(x_{k},r_{N}),$

$\lambda_{k}^{\varepsilon}={(\operatorname{proj}^{X_{k}\times R_{N}}_{X_{k}},u_{k}^{\varepsilon/2})}_{\#}{\bm{\gamma}_{k}^{\varepsilon/2}}$

$J_{k}(\mu_{k},\rho_{N})=\min_{\bm{\gamma}\in\Gamma(\mu_{k},\rho_{N})}\int_{\mathbb{R}^{n}\times\mathbb{R}^{n}}\frac{N-k}{N^{2}}\|r_{N}-x_{k}\|^{2}\,\mathrm{d}\bm{\gamma}(x_{k},r_{N})=\frac{N-k}{N^{2}}W_{2}(\mu_{k},\rho_{N})^{2},$

$J_{0}(\mu_{0},\rho,\rho,\rho)=\mathcal{K}[j_{0}](\mu_{0},\rho,\rho,\rho)\leq 2\cdot\frac{1}{2}\,j_{0}(x_{0}=\pm 1,r_{0}=\pm 1,r_{1}=\mp 1,r_{2}=\pm 1)=0,$

$J[c](\mu_{1},\ldots,\mu_{k})$

$J[c\circ(l\times\operatorname{id}_{Y})](\mu)=\inf_{{(\operatorname{proj}^{Z\times Y}_{Z})}_{\#}{\mu^{\prime}}\in\Gamma({l}_{\#}{\mu})}\int_{Z\times Y}c\,\mathrm{d}\mu^{\prime}=J[c]({l}_{\#}{\mu}).$

$\int_{Z\times Y}\phi(z_{i})\,\mathrm{d}\mu^{\prime}(z,y)=\int_{X\times Y}\phi(l_{i}(x_{i}))\,\mathrm{d}\mu(x_{1},\ldots,x_{k},y)=\int_{X_{i}}\phi(l_{i}(x_{i}))\,\mathrm{d}({(\operatorname{proj}^{X\times Y}_{X_{i}})}_{\#}{\mu})(x_{i})$

Full text

∗ Equal contribution. This research was supported by the Swiss National Science Foundation under the NCCR Automation, grant agreement 51NF40_180545.

1 Introduction

Many optimal control problems of stochastic or large-scale dynamical systems can be framed in the probability space, whereby the state is a probability measure. We provide three examples, starting with a pedagogical case:

Example 1.1 (Deterministic optimal control).

Consider a discrete-time dynamical system with state space $X_{k}$, input space $U_{k}$, and dynamics $f_{k}:X_{k}\times U_{k}\to X_{k+1}$. The problem of steering the system from an initial state $x_{0}\in X_{0}$ towards a target state $r_{N}\in X_{N}$ in $N\in\mathbb{N}$ time-steps while minimizing the terminal cost $g_{N}:X_{N}\times X_{N}\to\bar{\mathbb{R}}_{\geq 0}$ and the stage costs $g_{k}:X_{k}\times U_{k}\times X_{N}\to\bar{\mathbb{R}}_{\geq 0}$ reads

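$$\inf_{u_{k}:X_{k}\to U_{k}}\; g_{N}(x_{N},r_{N})+\sum_{k=0}^{N-1}g_{k}(x_{k},u_{k}(x_{k}),r_{N}), \tag{1}$$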

subject to the dynamics. The costs $g_{k}$ and $g_{N}$ measure the “closeness” between the state $x_{k}$ and the reference $r_{N}$, as well as the input effort. For instance, when all the spaces are $\mathbb{R}^{n}$, they may be defined as $g_{k}(x_{k},u_{k},r_{N})=\left\|x_{k}-r_{N}\right\|^{2}+\left\|u_{k}\right\|^{2}$ and $g_{N}(x_{N},r_{N})=\left\|x_{N}-r_{N}\right\|^{2}$. It is instructive to capture this setting via probability measures. At each time-step $k$, consider the Dirac delta probability measure $\mu_{k}=\delta_{x_{k}}$, and let $\rho_{N}=\delta_{r_{N}}$. The relation between $\mu_{k+1}$ and $\mu_{k}$ is the “pushforward” operation $\mu_{k+1}=\delta_{x_{k+1}}=\delta_{f_{k}(x_{k},u_{k}(x_{k}))}={f_{k}(\cdot,u_{k}(\cdot))}_{\#}{\mu_{k}}$, which defines dynamics in the probability space. Then, an equivalent formulation of (1) is

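$$\inf_{u_{k}:X_{k}\to U_{k}}\; \underbrace{\int_{X_{N}}\int_{X_{N}}g_{N}(x_{N},r_{N})\,\mathrm{d}\mu_{N}(x_{N})\,\mathrm{d}\rho_{N}(r_{N})}_{G_{N}(\mu_{N},\rho_{N})}+\sum_{k=0}^{N-1}\underbrace{\int_{X_{N}}\int_{X_{k}}g_{k}(x_{k},u_{k}(x_{k}),r_{N})\,\mathrm{d}\mu_{k}(x_{k})\,\mathrm{d}\rho_{N}(r_{N})}_{G_{k}(\mu_{k},u_{k},\rho_{N})}, \tag{2}$$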

where $G_{N}$ and $G_{k}$ have the same meaning as the lower-case counterparts in (1) but are defined in the probability space.
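As a quick consistency check (a sketch, assuming that $G_{k}$ in (2) is the double integral of $g_{k}$ against $\mu_{k}$ and $\rho_{N}$), the Dirac measures $\mu_{k}=\delta_{x_{k}}$ and $\rho_{N}=\delta_{r_{N}}$ collapse each integral to a point evaluation, since $\int\phi\,\mathrm{d}\delta_{x}=\phi(x)$:

$$G_{k}(\delta_{x_{k}},u_{k},\delta_{r_{N}})=\int_{X_{N}}\int_{X_{k}}g_{k}(x,u_{k}(x),r)\,\mathrm{d}\delta_{x_{k}}(x)\,\mathrm{d}\delta_{r_{N}}(r)=g_{k}(x_{k},u_{k}(x_{k}),r_{N}).$$

Under the same assumption, the terminal term reduces to $g_{N}(x_{N},r_{N})$, so the objective of (2) coincides term by term with that of (1).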

Example 1.2 (Distribution steering).

Assume now that the initial condition $x_{0}$ in Example 1.1 is unknown, but its realization is distributed according to $\mu_{0}\in\mathcal{P}(X_{0})$, with $\mathcal{P}(X_{0})$ being the space of probability measures over $X_{0}$. The input to apply to each “particle” having state $x_{k}\in X_{k}$ is given by the (deterministic) feedback map $u_{k}:X_{k}\to U_{k}$, and the dynamical evolution is $x_{k+1}=f_{k}(x_{k},u_{k}(x_{k}))$. As in Example 1.1, the dynamics in the probability space are then $\mu_{k+1}={f_{k}(\cdot,u_{k}(\cdot))}_{\#}{\mu_{k}}$. The same formalism as in Example 1.1 can then be used to ensure that the terminal state $x_{N}$ is distributed closely to a desired probability measure $\rho_{N}\in\mathcal{P}(X_{N})$:

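$$\inf_{u_{k}:X_{k}\to U_{k}}\; \sum_{k=0}^{N}\mathcal{C}_{k}(\mu_{k},\rho_{N})+\int_{X_{k}}\|u_{k}(x_{k})\|^{2}\,\mathrm{d}\mu_{k}(x_{k}), \tag{3}$$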

where $\mathcal{C}_{k}$ measures closeness between $\mu_{k}$ and $\rho_{N}$, akin to $G_{k}$ and $G_{N}$ in (2).

Example 1.3 (Large-scale multi-agent systems).

The optimal steering of a fleet of $M$ identical agents, with dynamics $x_{k+1}^{(i)}=f_{k}(x_{k}^{(i)},u_{k}(x_{k}^{(i)}))$, from an initial configuration $\{x_{0}^{(i)}\}_{i=1}^{M}$ to a desired one $\rho_{N}$ can be cast as

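$$\inf_{u_{k}:X_{k}\to U_{k}}\; \sum_{i=1}^{M}g_{N}(x_{N}^{(i)})+\sum_{k=0}^{N-1}\mathcal{C}_{k}(\{x_{k}^{(j)}\}_{j=1}^{M},\rho_{N})+g_{k}(x_{k}^{(i)},u_{k}(x_{k}^{(i)})), \tag{4}$$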

where $\mathcal{C}_{k}$ is a fleet-specific cost (e.g., a cohesion or formation cost), and $g_{N}$ and $g_{k}$ are agent-specific costs (e.g., input effort). Often, the interest lies in the macroscopic behavior of the fleet. Hence, it is customary to capture the state of the fleet by a probability measure $\mu_{k}\in\mathcal{P}(X_{k})$ and the input by a map $u_{k}:X_{k}\to U_{k}$ [1, 2]. The optimization problem in (4) can then be written as an optimal control problem, with state $\mu_{k}$, input $u_{k}$, and dynamics $\mu_{k+1}={f_{k}(\cdot,u_{k}(\cdot))}_{\#}{\mu_{k}}$. Overall,

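$$\inf_{u_{k}:X_{k}\to U_{k}}\; \int_{X_{N}}g_{N}(x_{N})\,\mathrm{d}\mu_{N}(x_{N})+\sum_{k=0}^{N-1}\mathcal{C}_{k}(\mu_{k},\rho_{N})+\int_{X_{k}}g_{k}(x_{k},u_{k}(x_{k}))\,\mathrm{d}\mu_{k}(x_{k}). \tag{5}$$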

Such a modeling approach suits robotics [3], mobility [4], and social networks [5, 6].

Formally, (2), (3), and (5) are instances of discrete-time finite-horizon optimal control problems in probability spaces:

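$$\inf_{u_{k}:X_{k}\to U_{k}}\; G_{N}(\mu_{N},\rho_{N})+\sum_{k=0}^{N-1}G_{k}(\mu_{k},u_{k},\rho_{k}), \tag{6}$$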

subject to the measure dynamics $\mu_{k+1}={f_{k}(\cdot,u_{k}(\cdot))}_{\#}{\mu_{k}}$, where $\rho_{k}$ are (possibly time-dependent) reference probability measures. In this paper, we consider $G_{k},G_{N}$ as optimal transport discrepancies: An optimal transport discrepancy measures the effort to transport one probability measure onto another when moving a unit of mass from $x$ to $y$ costs $c(x,y)$; see Section 2. To solve (6), one possibility is the Dynamic Programming Algorithm (DPA) [7]. However, its deployment poses several analytical and computational challenges. For example, it is unclear which easy-to-verify assumptions ensure the existence of solutions. Moreover, even if a minimizer exists, its computation suffers from the infinite dimensionality of the probability space and from the burden of repeatedly computing optimal transport discrepancies; see Section 3.

This inevitable complexity prompts us to adopt a different perspective: At least formally, (6) resembles a single optimal transport problem [8, 9], whereby one seeks to transport one probability measure $\mu_{0}$ to a final one $\rho_{N}$ while minimizing some transportation cost. If this formal similarity is made rigorous, we can tackle (6) with the tools of optimal transport theory, which has reached significant maturity in the last years, both theoretically [8, 10, 9] and numerically [11, 12, 13].

1.1 Contributions

We study optimal control and dynamic programming in probability spaces through the lens of optimal transport theory. Specifically, we show that many optimal control problems in probability spaces can be reformulated and studied as optimal transport problems. Our results reveal a separation principle: The “low-level control of the agents of the fleet” (how to reach the destination?) and “fleet-level control” (who goes where?) are decoupled. We complement our theoretical analysis with various examples and counterexamples, which demonstrate that our conditions cannot be relaxed and expose the pitfalls of heuristic approaches. The proofs of our results rely on novel stability results for the (multi-marginal) optimal transport problem, which are of independent interest.

1.2 Previous work

Most of the literature focuses on continuous time, and it is founded on [14], which relates the optimal transport problem and fluid mechanics. Through the optimal control lens, this formulation corresponds to an optimal control problem with integrator dynamics: The resulting flow is a time-dependent feedback law [15]. An attempt to introduce generic dynamical constraints can be found in [16], where the set of possible flows is constrained in a set of admissible ones, induced by the dynamics. Constructive results can be found in the specific setting of linear systems and Gaussian probability measures. In this case and when the control laws are affine, the space of probability measures is implicitly constrained to the space of Gaussian distributions, and closed-form solutions exist [17, 18, 19, 20]. All of these works build on traditional optimal control tools. In [21, 22, 23], instead, the authors develop a Pontryagin Maximum Principle for optimal control problems in the Wasserstein space (i.e., probability space endowed with the Wasserstein distance). Their analysis combines classical tools in optimal control theory with the “differential structure” of the Wasserstein space [8, 24]. In [25], the authors study optimal transport when the transportation cost results from the cost-to-go of a Linear Quadratic Regulator (LQR). This methodology implicitly assumes that, to steer a fleet of identical particles, one can compute the cost-to-go for the single particle and then “lift” the solution to the probability space via an optimal transport problem. While attractive, this approach generally yields sub-optimal solutions; see Section 5.

The discrete-time setting has, instead, received less attention. Towards this direction, [26, 27, 28, 29, 30] explore the covariance control problem for discrete-time linear systems, possibly subject to constraints. In [2], the authors study the optimal steering of multiple agents from an initial configuration to a final one in a distributed fashion. In [1], the authors follow an approach similar to [25], albeit in discrete time. Finally, in [31], the authors study the problem of mass transportation over a graph, embedding constraints such as the maximum flow on the edges. To do so, they exploit the structure of the ground space; in this case, the transportation graph. In all these approaches, the distribution/fleet steering problem is a-priori formalized as an optimal transport problem and not as an optimal control problem in the probability space. As we shall see in Section 4, our results bridge these two perspectives and allow us to back up and recover many of the approaches in the literature.

1.3 Organization

The paper unfolds as follows. In Sections 2 and 3, we review the space of probability measures and introduce our problem setting. We present and discuss our main result, Theorem 4.2, in Section 4. In Section 5, we provide examples to ignite an intuition on our results and expose potential pitfalls. All proofs are in Section 6. Finally, Section 7 summarizes our findings and the future directions.

1.4 Notation

We denote by $C_{b}(X)$ the space of continuous and bounded functions $X\to\mathbb{R}$, by $\operatorname{lsc}(X,Y)$ the space of lower semi-continuous functions $X\to Y$, and by $\bar{\mathbb{R}}_{\geq 0}=[0,+\infty]$ the set of non-negative extended real numbers. The identity map on $X$ is denoted by $\operatorname{id}_{X}$, and the projection map from $X\times Y$ onto $X$ is denoted by $\operatorname{proj}^{X\times Y}_{X}$. Given a set of maps $\{h_{k}:X\to X_{k}\}_{k=i}^{j}$, we denote by $(h_{i},\ldots,h_{j}):X\to X_{i}\times\ldots\times X_{j}$ the map $x\mapsto(h_{i},\ldots,h_{j})(x)\coloneqq(h_{i}(x),\ldots,h_{j}(x)).$

2 The Space of Probability Measures

We start with notation and preliminaries in Section 2.1. Then, in Section 2.2, we review optimal transport.

2.1 Preliminaries

We assume all spaces to be Polish spaces, and all probability measures and maps to be Borel. We denote by $\mathcal{P}(X)$ the space of Borel probability measures on $X$, and by $\delta_{x}$ the Dirac delta at $x\in X$; i.e., the probability measure defined for all Borel sets $B\subseteq X$ as $\delta_{x}(B)=1$ if $x\in B$ and $\delta_{x}(B)=0$ otherwise. We denote by $\operatorname{supp}(\mu)$ the support of a probability measure $\mu\in\mathcal{P}(X)$. The pushforward of a probability measure $\mu\in\mathcal{P}(X)$ through $T:X\to Y$, denoted by ${T}_{\#}{\mu}\in\mathcal{P}(Y)$, is defined by $({T}_{\#}{\mu})(A)=\mu(T^{-1}(A))$ for all Borel sets $A\subseteq Y$. For any ${T}_{\#}{\mu}$-integrable $\phi:Y\to\mathbb{R}$, it holds that $\int_{Y}\phi\,\mathrm{d}({T}_{\#}{\mu})=\int_{X}\phi\circ T\,\mathrm{d}\mu.$ Given $\nu\in\mathcal{P}(Y)$, $T$ is a transport map from $\mu$ to $\nu$ if ${T}_{\#}{\mu}=\nu$; to this end, it suffices that $\int_{Y}\phi\,\mathrm{d}\nu=\int_{Y}\phi\,\mathrm{d}({T}_{\#}{\mu})$ for all $\phi\in C_{b}(Y)$. We say that $T:X\to Y\times Z$ is a transport map from $\mu\in\mathcal{P}(X)$ to $(\nu_{1},\nu_{2})\in\mathcal{P}(Y)\times\mathcal{P}(Z)$ if ${(\operatorname{proj}^{Y\times Z}_{Y})}_{\#}{({T}_{\#}{\mu})}=\nu_{1}$ and ${(\operatorname{proj}^{Y\times Z}_{Z})}_{\#}{({T}_{\#}{\mu})}=\nu_{2}$.
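The pushforward and the change-of-variables identity can be checked numerically on a finitely supported measure. The following is a minimal sketch and not part of the paper: the helper names `pushforward` and `integrate`, the measure `mu`, and the maps `T` and `phi` are illustrative assumptions, and measures are represented as dictionaries from points to masses.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, T):
    """Return T#mu for a finitely supported measure mu = {point: mass}."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[T(x)] += mass  # the mass sitting at x is moved to T(x)
    return dict(nu)


def integrate(phi, mu):
    """Integral of phi against the finitely supported measure mu."""
    return sum(phi(x) * mass for x, mass in mu.items())


# Hypothetical example data (not from the paper).
mu = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}
T = lambda x: x * x
phi = lambda y: 3 * y + 1

nu = pushforward(mu, T)
lhs = integrate(phi, nu)                    # integral of phi d(T#mu)
rhs = integrate(lambda x: phi(T(x)), mu)    # integral of (phi o T) d mu
```

Here the two atoms at $-1$ and $+1$ are merged by $T$, and `lhs` equals `rhs`, as the identity above predicts.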

2.2 Optimal transport

Given a non-negative transportation cost $c:X\times Y\to\bar{\mathbb{R}}_{\geq 0}$ , the optimal transport discrepancy $\mathcal{K}[c]:\mathcal{P}(X)\times\mathcal{P}(Y)\to\bar{\mathbb{R}}_{\geq 0}$ between two probability measures $\mu\in\mathcal{P}(X)$ and $\nu\in\mathcal{P}(Y)$ is

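$$\mathcal{K}[c](\mu,\nu)\coloneqq\inf_{\bm{\gamma}\in\Gamma(\mu,\nu)}\int_{X\times Y}c(x,y)\,\mathrm{d}\bm{\gamma}(x,y), \tag{7}$$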

where $\Gamma(\mu,\nu)\coloneqq\left\{\bm{\gamma}\in\mathcal{P}(X\times Y)\,|\,{(\operatorname{proj}^{X\times Y}_{X})}_{\#}{\bm{\gamma}}=\mu,{(\operatorname{proj}^{X\times Y}_{Y})}_{\#}{\bm{\gamma}}=\nu\right\}$ is the set of couplings. A prominent example of an optimal transport discrepancy is, for some $p\geq 1$, the ${p}^{\mathrm{th}}$ power of the $p$-Wasserstein distance, obtained when $X=Y$ and the transportation cost is $c=d^{p}$ for a metric $d$ that induces the topology on $X$ [8, §7].

Remark 2.1.

When the transportation cost does not depend on one of the two variables (e.g., there exists $\tilde{c}:X\to\bar{\mathbb{R}}_{\geq 0}$ such that $c(x,y)=\tilde{c}(x)$), the optimal transport discrepancy reduces to an expected value; i.e., $\mathcal{K}[c](\mu,\nu)=\mathbb{E}^{\mu}\left[\tilde{c}\right]$.
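For two uniform empirical measures with the same number of atoms, the infimum in (7) is attained at a permutation of the atoms (a consequence of the Birkhoff-von Neumann theorem), so tiny instances can be solved by enumeration. The sketch below relies on that assumption; the function name `ot_discrepancy` and the sample points are hypothetical, and the brute-force search is only meant for illustration, not for realistic problem sizes. It also checks Remark 2.1 on the same data.

```python
from fractions import Fraction
from itertools import permutations


def ot_discrepancy(xs, ys, c):
    """K[c](mu, nu) for mu, nu uniform over the equally long lists xs, ys.

    Assumption: uniform weights and len(xs) == len(ys), so that an optimal
    coupling can be searched among permutations only.
    """
    m = len(xs)
    assert len(ys) == m
    return min(
        Fraction(sum(c(xs[i], ys[p[i]]) for i in range(m)), m)
        for p in permutations(range(m))
    )


# Hypothetical atoms on the integers.
xs = [0, 1, 3]
ys = [1, 2, 5]

# Squared distance: the value is the squared 2-Wasserstein distance.
w2_squared = ot_discrepancy(xs, ys, lambda x, y: (x - y) ** 2)

# Remark 2.1: a cost that ignores y yields the expected value of c~ under mu.
expected_value = ot_discrepancy(xs, ys, lambda x, y: x * x)
```

In the second call every permutation has the same cost, which is why the discrepancy collapses to the mean of $\tilde{c}(x)=x^{2}$ over `xs`.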

We will repeatedly work with a generalization of the optimal transport problem to $k$ marginals. Let $X\coloneqq X_{1}\times\ldots\times X_{k}$ , and $c:X\to\bar{\mathbb{R}}_{\geq 0}$ . The multi-marginal optimal transport problem between $k$ probability measures $\{\mu_{i}\in\mathcal{P}(X_{i})\}_{i=1}^{k}$ reads

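$$\mathcal{K}[c](\mu_{1},\ldots,\mu_{k})\coloneqq\inf_{\bm{\gamma}\in\Gamma(\mu_{1},\ldots,\mu_{k})}\int_{X}c(x_{1},\ldots,x_{k})\,\mathrm{d}\bm{\gamma}(x_{1},\ldots,x_{k}), \tag{8}$$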

where $\Gamma(\mu_{1},\ldots,\mu_{k})\coloneqq\left\{\bm{\gamma}\in\mathcal{P}(X)\,|\,{(\operatorname{proj}^{X}_{X_{i}})}_{\#}{\bm{\gamma}}=\mu_{i},\,i\in\{1,\ldots,k\}\right\}.$ In general, the infima in (7) and (8) are not attained unless mild conditions on the transportation cost hold; for instance, $c\in\operatorname{lsc}(X\times Y,\bar{\mathbb{R}}_{\geq 0})$ suffices in (7); see [9, §4]. A transport plan $\bm{\gamma}^{\varepsilon}\in\Gamma(\mu_{1},\ldots,\mu_{k})$ is $\varepsilon$-optimal when

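$$\int_{X}c(x_{1},\ldots,x_{k})\,\mathrm{d}\bm{\gamma}^{\varepsilon}(x_{1},\ldots,x_{k})\leq\mathcal{K}[c](\mu_{1},\ldots,\mu_{k})+\varepsilon.$$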

The formulation in (7) and (8) is the Kantorovich formulation of the optimal transport problem, whereby one optimizes over transport plans $\bm{\gamma}$. The (stricter) Monge formulation¹ considers only transport plans $\bm{\gamma}={(\operatorname{id}_{X_{1}},T)}_{\#}{\mu_{1}}\in\Gamma(\mu_{1},\ldots,\mu_{k})$ induced by a transport map $T:X_{1}\to X_{2}\times\ldots\times X_{k}$ from $\mu_{1}$ to $(\mu_{2},\ldots,\mu_{k})$.

¹ Historically, the Monge formulation comes first. For a thorough review of the history of optimal transport and its founding fathers, see [9, §1].

3 Problem Statement

Let $X_{k}$ , $U_{k}$ , and $R_{k}$ be Polish spaces, representing the state space, the input space, and the space of references in the ground space, respectively (often, $R_{k}=X_{k}$ ). We consider dynamical systems whose state is a probability measure over $X_{k}$ . This approach encompasses continuous approximations of multi-agent systems and systems with uncertain initial conditions (usually captured by absolutely continuous probability measures), as well as finite settings (captured by empirical probability measures). For instance,

Example 3.1 (Robots in a grid).

Consider $M$ robots in a grid of three cells; i.e., $X_{k}=\{\pm 1,0\}$ . Suppose that the ${i}^{\mathrm{th}}$ robot is located at $x_{k}^{(i)}\in X_{k}$ (i.e., has state $x_{k}^{(i)}$ ). Then, the state of the system is $\mu_{k}=\frac{1}{M}\sum_{i=1}^{M}\delta_{x_{k}^{(i)}}$ . The same modeling approach applies to $M$ robots in the two-dimensional plane, simply setting $X_{k}=\mathbb{R}^{2}$ .

In this setting, we focus on the following optimal control problem:

Problem 3.2 (Discrete-time optimal control in probability spaces).

Let $N\in\mathbb{N}_{\geq 1}$. For dynamics $f_{k}:X_{k}\times U_{k}\to X_{k+1}$, costs $g_{k}:X_{k}\times U_{k}\times R_{k}\to\bar{\mathbb{R}}_{\geq 0}$ and $g_{N}:X_{N}\times R_{N}\to\bar{\mathbb{R}}_{\geq 0}$, initial condition $\mu\in\mathcal{P}(X_{0})$, and reference trajectory $\rho=(\rho_{0},\ldots,\rho_{N})\in\mathcal{P}(R_{0})\times\ldots\times\mathcal{P}(R_{N})$, find the joint state-input distributions $\lambda_{k}\in\mathcal{P}(X_{k}\times U_{k})$ which solve

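$$J(\mu,\rho)=\inf_{\substack{\mu_{k}\in\mathcal{P}(X_{k})\\ \lambda_{k}\in\mathcal{P}(X_{k}\times U_{k})}}\; \mathcal{K}[g_{N}](\mu_{N},\rho_{N})+\sum_{k=0}^{N-1}\mathcal{K}[g_{k}](\lambda_{k},\rho_{k})\quad\text{s.t.}\quad\mu_{k+1}={f_{k}}_{\#}{\lambda_{k}},\;\mu_{0}=\mu,\;{(\operatorname{proj}^{X_{k}\times U_{k}}_{X_{k}})}_{\#}{\lambda_{k}}=\mu_{k}.$$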

Before presenting our results, we detail our setting. The notation in the ground space is juxtaposed with the one in the measure space in Table 1.

3.1 State-input distribution

The state-input distribution $\lambda_{k}\in\mathcal{P}(X_{k}\times U_{k})$ is a probability measure on $X_{k}\times U_{k}$ whose first marginal is $\mu_{k}$. The semantics are as follows: The probability mass assigned by $\lambda_{k}$ to the pair $(x_{k},u_{k})$ indicates the probability that one particle has state $x_{k}\in X_{k}$ and applies the input $u_{k}\in U_{k}$ or, equivalently, the share of agents which have state $x_{k}\in X_{k}$ and apply the input $u_{k}\in U_{k}$. When $\lambda_{k}={(\operatorname{id}_{X_{k}},u_{k})}_{\#}{\mu_{k}}$ for some $u_{k}:X_{k}\to U_{k}$, the input is “deterministic”: All particles that have state $x_{k}\in X_{k}$ apply the input $u_{k}(x_{k})\in U_{k}$.

Example 3.3 (Robots in a grid, continued).

Consider again $M$ identical robots on $X_{k}=\{\pm 1,0\}$, where at each time step each robot can either move to the origin ($u_{k}=0$) and stay there forever, or change position ($u_{k}=-1$), so that $U_{k}=\{0,-1\}$ and $f_{k}(x_{k},u_{k})=x_{k}u_{k}$. Consider the following state-input distributions $\lambda^{(1)}_{k}$ and $\lambda^{(2)}_{k}$.

[TABLE]

In the first case (i.e., $\lambda^{(1)}_{k}$), $20\%$ of the robots are located at $x_{k}=-1$ and go to the origin ($u_{k}=0$), $30\%$ of the robots are located at $x_{k}=-1$ and switch position ($u_{k}=-1$), and $50\%$ of the robots are located at $x_{k}=0$ and remain there ($u_{k}=0$, although the input is irrelevant for the dynamics at the origin). The input is not deterministic, since not all robots located at $x_{k}=-1$ apply the same input. From $\lambda^{(1)}_{k}$ we can also infer the distribution of the robots: $50\%$ of them are located at $x_{k}=-1$, and the other $50\%$ at $x_{k}=0$. In the second case (i.e., $\lambda^{(2)}_{k}$), the input is deterministic: All robots located at $x_{k}=-1$ switch position, and all the robots located at $x_{k}=0$ stay there.

Remark 3.4.

Two comments on our modeling choice. First, since the first marginal of $\lambda_{k}$ is $\mu_{k}$, the costs $\mathcal{K}[g_{k}](\lambda_{k},\rho_{k})$ are implicitly a function of the state, the input, and the reference trajectory. Second, in many instances (including multi-agent settings with finitely many agents), optimal inputs turn out to be deterministic (in the sense outlined above). Yet, the more general joint state-input distribution considerably simplifies the analysis, in the same way that the Kantorovich formulation is more tractable than the Monge formulation in optimal transport theory.

3.2 Dynamics

We consider measure dynamics resulting from the pushforward via a function $f_{k}:X_{k}\times U_{k}\to X_{k+1}$ (typically, the dynamics of the single particles); i.e., $\mu_{k+1}={f_{k}}_{\#}{\lambda_{k}}$. In the special case of deterministic inputs (i.e., $\lambda_{k}={(\operatorname{id}_{X_{k}},u_{k})}_{\#}{\mu_{k}}$ for some function $u_{k}:X_{k}\to U_{k}$), the dynamics simplify to $\mu_{k+1}={f_{k}(\cdot,u_{k}(\cdot))}_{\#}{\mu_{k}}$.

Example 3.5 (Robots in a grid, continued).

Consider the setting of Example 3.3, where $f_{k}(x_{k},u_{k})=x_{k}u_{k}$. The measure dynamics are $\mu_{k+1}={f_{k}}_{\#}{\lambda_{k}}$, and the two state-input distributions of Example 3.3 yield $\mu_{k+1}^{(1)}=0.7\delta_{0}+0.3\delta_{1}$ and $\mu_{k+1}^{(2)}=0.5\delta_{0}+0.5\delta_{1}$.
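The two pushforwards in this example can be reproduced with a few lines of code. This is a minimal sketch: the masses of $\lambda^{(1)}_{k}$ are taken from the description in Example 3.3, whereas for $\lambda^{(2)}_{k}$ we assume an even split between $x_{k}=-1$ and $x_{k}=0$ and the input $u_{k}=0$ at the origin (the latter does not affect the result, since $f_{k}(0,u_{k})=0$ for every input). The names `step` and `first_marginal` are hypothetical helpers.

```python
from collections import defaultdict
from fractions import Fraction


def f(x, u):
    """Particle dynamics of Example 3.3: f_k(x_k, u_k) = x_k * u_k."""
    return x * u


def step(lam):
    """Measure dynamics mu_{k+1} = f#lambda_k for lam = {(x, u): mass}."""
    mu_next = defaultdict(Fraction)
    for (x, u), mass in lam.items():
        mu_next[f(x, u)] += mass
    return dict(mu_next)


def first_marginal(lam):
    """State distribution mu_k, i.e., the first marginal of lambda_k."""
    mu = defaultdict(Fraction)
    for (x, _u), mass in lam.items():
        mu[x] += mass
    return dict(mu)


# lambda^(1): non-deterministic input at x = -1 (masses from Example 3.3).
lam1 = {(-1, 0): Fraction(1, 5), (-1, -1): Fraction(3, 10), (0, 0): Fraction(1, 2)}
# lambda^(2): deterministic input (the input at x = 0 is an assumption).
lam2 = {(-1, -1): Fraction(1, 2), (0, 0): Fraction(1, 2)}

mu1_next = step(lam1)
mu2_next = step(lam2)
mu_now = first_marginal(lam1)
```

Both state-input distributions share the same first marginal, yet they lead to different successor measures because the inputs differ.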

3.3 Cost

We consider optimal transport discrepancies with, as transportation costs, $g_{k}:X_{k}\times U_{k}\times R_{k}\to\bar{\mathbb{R}}_{\geq 0}$ (stage cost) and $g_{N}:X_{N}\times R_{N}\to\bar{\mathbb{R}}_{\geq 0}$ (terminal cost). By Remark 2.1, this modeling assumption includes expected values but not functionals such as the variance of the probability measure or the Kullback-Leibler divergence from the references $\rho_{k}$, $\rho_{N}$. Our formulation encompasses the terminal constraint $\mu_{N}=\rho_{N}$: It suffices to set $g_{N}(x_{N},r_{N})=+\infty$ if $x_{N}\neq r_{N}$. Similarly, state-dependent input constraints $U_{k}(x_{k})$ can be encoded by setting $g_{k}(x_{k},u_{k},r_{k})=+\infty$ when $u_{k}\notin U_{k}(x_{k})$. In view of Example 1.1, the transportation costs $g_{k}$ and $g_{N}$ may be interpreted as the cost incurred by a single agent.

Example 3.6 (Robots in a grid, continued).

Suppose that the goal is to steer $\frac{M}{2}$ robots to $x_{N}=-1$ and $\frac{M}{2}$ to $x_{N}=+1$, while minimizing the input. Then, $\rho_{N}=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{+1}$ and, for some weight $\alpha>0$, possible costs are $g_{N}(x_{N},r_{N})=|x_{N}-r_{N}|$ and $g_{k}(x_{k},u_{k},r_{k})=\alpha|u_{k}|$. This way, the aim is to minimize the (type 1) Wasserstein distance from the reference $\rho_{N}$ at the end of the horizon (i.e., $\mathcal{K}[g_{N}](\mu_{N},\rho_{N})$) and the (weighted) input effort throughout the horizon (i.e., $\mathcal{K}[g_{k}](\lambda_{k},\rho_{k})=\alpha\mathbb{E}^{\lambda_{k}}\left[|u_{k}|\right]$). The weight $\alpha>0$ arbitrates between these two objectives. The references $\rho_{k}$ for $k\in\{0,\ldots,N-1\}$ do not enter the cost and are therefore irrelevant.

3.4 DPA for Problem 3.2

Problem 3.2 is an instance of the classic discrete-time finite-horizon optimal control problem in abstract spaces [7, 32]. It is therefore natural to tackle it via the DPA:

Definition 3.7 (DPA).

Initialization: Let $J_{N}(\mu_{N},\rho_{N})\coloneqq\mathcal{K}[g_{N}](\mu_{N},\rho_{N})$.

Recursion: For all $k\in\{N-1,N-2,\ldots,1,0\}$, compute the cost-to-go $J_{k}$:

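$$J_{k}(\mu_{k},\rho_{k},\ldots,\rho_{N})\coloneqq\inf_{\lambda_{k}\in\mathcal{P}(X_{k}\times U_{k})}\; \mathcal{K}[g_{k}](\lambda_{k},\rho_{k})+J_{k+1}({f_{k}}_{\#}{\lambda_{k}},\rho_{k+1},\ldots,\rho_{N})\quad\text{s.t.}\quad{(\operatorname{proj}^{X_{k}\times U_{k}}_{X_{k}})}_{\#}{\lambda_{k}}=\mu_{k}. \tag{10}$$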

Unfortunately, the DPA in probability spaces poses several analytic and computational challenges; we mention two. First, it is unclear under which easy-to-verify assumptions minimizers exist. Second, even if they do, their computation remains challenging, if not prohibitive. Already when all sets are finite, and the (generally infinite-dimensional) probability space reduces to the finite-dimensional probability simplex, (10) is excruciating. For instance, when $G_{N}$ is an optimal transport discrepancy, the mere evaluation of the cost-to-go $J_{N}$ involves solving an optimal transport problem, with all the related computational difficulties [11, 33, 12, 34]. Thus, the optimization of $J_{N}$ , needed to compute $J_{N-1}$ , will inevitably be very demanding.

In the following, we show that the solution of Problem 3.2 can be constructed from the solution of the DPA in the ground space (i.e., $X_{0},X_{1},\ldots$) and a single (possibly multi-marginal) optimal transport problem. In other words, a separation principle holds: The optimal control law results from the combination of optimal low-level control laws (found via DPA in the ground space) and a fleet-level control law (found via an optimal transport problem). This way, we bypass the cumbersome application of DPA in probability spaces as well as the repeated evaluation of optimal transport discrepancies. At least formally, our result generalizes two well-known extreme cases. On the one hand, when considering Dirac delta probability measures, the DPA in the probability space trivially reduces to the DPA in the ground space (see Example 1.1); on the other hand, when considering trivial dynamics (i.e., $N=1$ and $f_{0}(x_{0},u_{0})=x_{0}$) and an optimal transport discrepancy as a terminal cost, Problem 3.2 reduces to an optimal transport problem. Thus, DPA in probability spaces should be at least “as difficult as” solving both the DPA in the ground space and an optimal transport problem. As we shall see below, it is not “more difficult” than that.
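The second extreme case can be verified directly. The following is a sketch under the assumptions $N=1$, $f_{0}(x_{0},u_{0})=x_{0}$, and zero stage cost $g_{0}\equiv 0$ (the last assumption is one reading of “an optimal transport discrepancy as a terminal cost”). Since $f_{0}$ coincides with the projection onto $X_{0}$, every admissible $\lambda_{0}$ gives

$$\mu_{1}={f_{0}}_{\#}{\lambda_{0}}={(\operatorname{proj}^{X_{0}\times U_{0}}_{X_{0}})}_{\#}{\lambda_{0}}=\mu_{0}=\mu,\qquad\text{hence}\qquad J(\mu,\rho)=\mathcal{K}[g_{1}](\mu,\rho_{1}),$$

which is an optimal transport problem between $\mu$ and $\rho_{1}$ with transportation cost $g_{1}$.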

3.5 Auxiliary problem: DPA in the ground space

Before presenting our main results, we introduce an auxiliary optimal control problem in the ground space:

[TABLE]

Similarly to (10), the DPA provides the cost-to-go $j_{k}:X_{k}\times R_{k}\times\ldots\times R_{N}\to\bar{\mathbb{R}}_{\geq 0}$ :

[TABLE]

Specifically, we use lower-case $j_{k}$ for the cost-to-go in the ground space and upper-case $J_{k}$ for its probability-space twin. By (11), ( $\varepsilon$ -)optimal inputs will be feedback laws $u_{k}:X_{k}\times R_{k}\times\ldots\times R_{N}\to U_{k}$ . In particular, an input $u_{k}\in U_{k}$ (or, with slight abuse of notation, a feedback law $u_{k}:X_{k}\times R_{k}\times\ldots\times R_{N}\to U_{k}$ ) is $\varepsilon$ -optimal in (11) if

[TABLE]

4 Main Result

In this section, we present our main result. We first provide an informal statement in Section 4.1. The rigorous version is in Section 4.2.

4.1 A separation principle in the probability space

Our main result predicates a separation principle:

Theorem 4.1 (DPA in probability spaces via optimal transport, informal).

Consider the setting of 3.2. At every stage $k$ :

(i)

The cost-to-go $J_{k}$ is a multi-marginal optimal transport problem between the current state $\mu_{k}$ and the future references $\rho_{k},\ldots,\rho_{N}$ , with transportation cost being the cost-to-go in the ground space $j_{k}$ .

(ii)

The optimal state-input distribution $\lambda^{\ast}_{k}$ results from the following strategy:

(1)

Find the optimal input $u_{k}^{\ast}$ in the ground space;

(2)

Find the optimal transport plan $\bm{\gamma}_{k}^{\ast}$ for the cost-to-go $J_{k}$ ;

(3)

Dispatch the particles as prescribed by $\bm{\gamma}_{k}^{\ast}$ and apply $u_{k}^{\ast}$ to steer them to their allocated trajectory.

In words, to solve DPA in probability spaces, we first solve for the cost-to-go $j_{k}$ in the ground space and then construct a multi-marginal optimal transport problem with transportation cost $j_{k}$ . Moreover, the optimal input for a fleet of identical agents results from the composition of the optimal control strategy for each individual agent (how to optimally follow the trajectory $r_{k},\ldots,r_{N}$ for an agent with state $x_{k}$ ?) and the solution of a multi-marginal optimal transport problem (who has state $x_{k}$ and follows the trajectory $r_{k},\ldots,r_{N}$ ?). Importantly, our result reveals a separation principle: It is optimal to first devise low-level controllers for individual agents (i.e., $u_{k}^{\ast}$ ) and then solve an assignment problem to allocate agents to their destinations (i.e., $\bm{\gamma}_{k}^{\ast}$ ).
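As an illustration of this recipe, the following minimal sketch works through a hypothetical toy instance that is not taken from the paper: agents move on the line graph $\{0,\ldots,4\}$, the stage cost is the squared input, and the terminal cost forces each agent onto its assigned reference. The names `ground_dp`, `fleet_assignment`, and `simulate` are ours. The brute-force search over permutations stands in for a proper optimal transport solver; it is adequate here only because both measures are uniform over the same, very small, number of particles.

```python
import itertools
import math

# Hypothetical toy instance (an assumption for illustration only).
STATES = range(5)
INPUTS = (-1, 0, 1)
HORIZON = 3


def step(x, u):
    """Deterministic dynamics f(x, u): move by u, saturating at the boundary."""
    return min(max(x + u, 0), 4)


def ground_dp():
    """Backward recursion in the ground space.

    Returns j[k][(x, r)] (cost-to-go) and policy[k][(x, r)] (an optimal input).
    """
    j = [dict() for _ in range(HORIZON + 1)]
    policy = [dict() for _ in range(HORIZON)]
    for x in STATES:
        for r in STATES:
            j[HORIZON][(x, r)] = 0.0 if x == r else math.inf
    for k in reversed(range(HORIZON)):
        for x in STATES:
            for r in STATES:
                best_u = min(INPUTS, key=lambda u: u**2 + j[k + 1][(step(x, u), r)])
                policy[k][(x, r)] = best_u
                j[k][(x, r)] = best_u**2 + j[k + 1][(step(x, best_u), r)]
    return j, policy


def fleet_assignment(j0, particles, targets):
    """Fleet-level step: assignment with transportation cost j0 (brute force)."""
    def total(perm):
        return sum(j0[(x, targets[p])] for x, p in zip(particles, perm))

    best = min(itertools.permutations(range(len(targets))), key=total)
    return [targets[p] for p in best], total(best) / len(particles)


def simulate(policy, particles, assigned):
    """Low-level step: each agent follows its feedback law towards its reference."""
    total, finals = 0.0, []
    for x, r in zip(particles, assigned):
        for k in range(HORIZON):
            u = policy[k][(x, r)]
            total += u**2
            x = step(x, u)
        finals.append(x)
    return total / len(particles), finals


j, policy = ground_dp()
particles, targets = [0, 0, 4], [1, 2, 3]
assigned, fleet_cost = fleet_assignment(j[0], particles, targets)
closed_loop_cost, finals = simulate(policy, particles, assigned)
print(assigned, fleet_cost, closed_loop_cost, finals)
```

In this sketch, the cost accumulated by the agents in closed loop coincides with the value of the assignment problem, which is the behavior the separation principle predicts.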

4.2 A rigorous statement

Next, we rigorously formalize Theorem 4.1:

Theorem 4.2 (DPA in probability spaces via optimal transport).

Consider the setting of 3.2. At every stage $k$ :

(i)

The cost-to-go equals the multi-marginal optimal transport discrepancy

[TABLE]

where $j_{k}$ is the cost-to-go in the ground space, as in (11). Moreover, the DPA yields the optimal solution $J=J_{0}$ .

(ii)

For $\varepsilon\geq 0$ , suppose $u_{k}^{\varepsilon/2}:X_{k}\times R_{k}\times\ldots\times R_{N}\to U_{k}$ and $\bm{\gamma}^{\varepsilon/2}_{k}\in\Gamma(\mu_{k},\rho_{k},\ldots,\rho_{N})$ are $\frac{\varepsilon}{2}$ -optimal in (11) and (13), respectively. Then,

[TABLE]

is an $\varepsilon$ -optimal state-input distribution. If $\varepsilon=0$ , then $\lambda^{\ast}_{k}\coloneqq\lambda^{\varepsilon}_{k}$ is optimal.

(iii)

If $\bm{\gamma}^{\varepsilon/2}_{k}$ in (ii) is induced by a transport map $T^{\varepsilon/2}_{k}:X_{k}\to R_{k}\times\ldots\times R_{N}$ , the $\varepsilon$ -optimal control input reads $\lambda^{\varepsilon}_{k}={(\operatorname{id}_{X_{k}},u_{k}^{\varepsilon/2}\circ(\operatorname{id}_{X_{k}},T^{\varepsilon/2}_{k}))}_{\#}{\mu_{k}}$ .

Before discussing Theorem 4.2 and its implications, we consider the special case when the stage costs $g_{k}$ do not depend on the reference; i.e., $g_{k}:X_{k}\times U_{k}\to\bar{\mathbb{R}}_{\geq 0}$ . For instance, any shortest path problem on a graph can be converted into a finite-horizon optimal control problem (e.g., see [32]), where the weights of the edges determine the stage costs $g_{k}$ ; these depend only on the pair $(x_{k},u_{k})$ . In these cases, the DPA reads

[TABLE]

Accordingly, the ground space $\frac{\varepsilon}{2}$ -optimal inputs are of the form $u_{k}^{\varepsilon/2}:X_{k}\times R_{N}\to U_{k}$ and the cost-to-go $J_{k}$ simplifies to a two-marginal optimal transport discrepancy:

Corollary 4.3 (When two marginals are all you need).

Consider the setting of Theorem 4.2, with $g_{k}:X_{k}\times U_{k}\to\bar{\mathbb{R}}_{\geq 0}$ . At every stage $k$ :

(i)

The cost-to-go equals the optimal transport discrepancy

[TABLE]

where $j_{k}$ is the cost-to-go in the ground space, as in (15). Moreover, the DPA yields the optimal solution $J=J_{0}$ .

(ii)

For $\varepsilon\geq 0$ , suppose $u_{k}^{\varepsilon/2}:X_{k}\times R_{N}\to U_{k}$ and $\bm{\gamma}^{\varepsilon/2}_{k}\in\Gamma(\mu_{k},\rho_{N})$ are $\frac{\varepsilon}{2}$ -optimal in (15) and (16), respectively. Then,

[TABLE]

is an $\varepsilon$ -optimal state-input distribution. If $\varepsilon=0$ , then $\lambda^{\ast}_{k}\coloneqq\lambda^{\varepsilon}_{k}$ is optimal.

(iii)

If $\bm{\gamma}^{\varepsilon/2}_{k}$ in (ii) is induced by a transport map $T^{\varepsilon/2}_{k}:X_{k}\to R_{N}$ , the $\varepsilon$ -optimal control input reads $\lambda^{\varepsilon}_{k}={(\operatorname{id}_{X_{k}},u_{k}^{\varepsilon/2}\circ(\operatorname{id}_{X_{k}},T^{\varepsilon/2}_{k}))}_{\#}{\mu_{k}}$ .

We defer the proofs of these results to Section 6.

Discussion

A few comments on our results are in order.

How to construct optimal state-input distributions?

We start with more details on (14) and (17). For simplicity, assume that an optimal input map $u_{k}^{\ast}$ and an optimal transport plan $\bm{\gamma}^{\ast}_{k}$ exist (else, resort to an $\varepsilon$ argument). Then, (ii) in Theorem 4.2 and in Corollary 4.3 states that an optimal state-input distribution $\lambda^{\ast}_{k}$ for 3.2 results from the DPA in the ground space (i.e., $u_{k}^{\ast}$ ) and the solution of an optimal transport problem (i.e., $\bm{\gamma}^{\ast}_{k}$ ):

(i)

Optimal particles allocation: The transport plan $\bm{\gamma}^{\ast}_{k}\in\Gamma(\mu_{k},\rho_{k},\ldots,\rho_{N})$ describes the optimal allocation of the particles throughout the horizon. In discrete instances, $\bm{\gamma}^{\ast}_{k}(x_{k},r_{k},\ldots,r_{N})$ quantifies the share of agents with state $x_{k}$ that will follow the reference trajectory $r_{k},\ldots,r_{N}$ .

(ii)

Optimal input coupling: Accordingly, we can interpret $\lambda^{\ast}_{k}$ as the share of particles at $x_{k}$ that apply the optimal input $u_{k}^{\ast}(x_{k},r_{k},\ldots,r_{N})$ . Intuitively, $\lambda^{\ast}_{k}$ assigns probability mass to $(x_{k},u_{k})$ if there is a trajectory $r_{k},r_{k+1},\ldots,r_{N}$ to which $x_{k}$ has been allocated by $\bm{\gamma}^{\ast}_{k}$ , such that $u_{k}=u_{k}^{\ast}(x_{k},r_{k},\ldots,r_{N})$ is the optimal input to minimize the cost along that trajectory.
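In discrete instances, the two steps above amount to pushing the plan forward through the map $(x_{k},r_{k},\ldots,r_{N})\mapsto(x_{k},u_{k}^{\ast}(x_{k},r_{k},\ldots,r_{N}))$. The following sketch makes this concrete. The plan, the labels, and the feedback law are made-up placeholders, and `state_input_distribution` is a hypothetical helper name.

```python
from collections import defaultdict


def state_input_distribution(plan, feedback):
    """Push a discrete plan gamma forward through (x, r...) -> (x, u(x, r...)).

    plan: dict mapping tuples (x, r_k, ..., r_N) to probability mass.
    feedback: callable u(x, refs) returning the input for that allocation.
    """
    lam = defaultdict(float)
    for (x, *refs), mass in plan.items():
        lam[(x, feedback(x, tuple(refs)))] += mass
    return dict(lam)


# Hypothetical allocation: state first, then the references (r_k, r_{k+1}).
plan = {
    ("a", "p", "q"): 0.25,
    ("a", "p", "s"): 0.25,
    ("b", "p", "q"): 0.50,
}


def feedback(x, refs):
    """Made-up feedback law that only looks at the last reference."""
    return {"q": "left", "s": "right"}[refs[-1]]


lam = state_input_distribution(plan, feedback)

# The first marginal of lambda recovers the state distribution mu_k.
mu = defaultdict(float)
for (x, _u), mass in lam.items():
    mu[x] += mass
print(lam, dict(mu))
```

Here the mass at state `a` is split between two inputs because the plan allocates it to two different reference trajectories, whereas the mass at `b` applies a single input.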

Existence of optimal solutions

In turn, our results provide sufficient conditions for the existence of an optimal solution for 3.2: existence of a solution for both the DPA in the ground space and the associated optimal transport problem.

Existence of optimal input maps

An optimal solution to (11) always exists when all spaces are finite, or when for any $K\subseteq X_{k}\times R_{k}\times\ldots\times R_{N}$ compact and $L>0$ , the sets $\{u_{k}\in U_{k}\,|\,g_{k}(x_{k},u_{k},r_{k})+j_{k+1}(f_{k}(x_{k},u_{k}),r_{k+1},\ldots,r_{N})\leq L,\forall(x_{k},r_{k},\ldots,r_{N})\in K\}$ are compact, the maps $g_{k},g_{N}$ are lower semicontinuous and $f_{k}(x_{k},\cdot)$ are continuous for all $x_{k}\in X_{k}$ ; see [7, Proposition 4.2.2] and [35, Theorem 18.19]. In general, however, optimal inputs may not exist. For this reason, we state our results using $\varepsilon$ -optimality.

Existence of optimal transport maps

If the solution of the optimal transport problem is a transport map, then (iii) in Theorem 4.2 suggests that the optimal input is deterministic. Without aiming at completeness, this is the case when:

(i)

The marginals are empirical with the same number of particles (by virtue of the Birkhoff theorem [8, Theorem 6.0.1]); or

(ii)

The cost-to-go $j_{k}$ is continuous and semi-concave, and for each $x_{k}\in X_{k}$ the map $(r_{k},\ldots,r_{N})\mapsto\frac{\partial{j_{k}}}{\partial x_{k}}(x_{k},r_{k},\ldots,r_{N})$ is injective in its domain of definition intersected with splitting sets [36, Definition 2.4], and $\mu_{k}$ is absolutely continuous [36, 37] (see [38, Theorem 1.2] for the case with two marginals).

Connections to previous work

The approach in the literature for distribution/fleet steering is fundamentally different from ours: It is a priori stipulated that the steering problem is an optimal transport problem from an initial distribution to a target one, without formulating an optimal control problem in probability spaces. This way, the complexity of DPA in probability spaces is bypassed, at the price, however, of potentially suboptimal solutions: There is no reason for this approach to be optimal for a corresponding control problem in the probability space. With Theorem 4.2 and Corollary 4.3, we show that, provided the transportation cost is judiciously chosen, this approach is optimal, and yields the same solution as DPA in probability spaces. For instance, the results in [1, §A] correspond to the optimal strategy when $g_{k}(x_{k},u_{k},r_{k})=\left\|u_{k}\right\|^{2}$ , and a terminal constraint on the final distribution (see Section 3.3). The results in [1] can thus be extended to more general terminal costs (e.g., $g_{N}(x_{N},r_{N})=\left\|x_{N}-r_{N}\right\|^{2}$ ). Instead, the results in [1, §B] are suboptimal in the sense of DPA in the probability space. By Theorem 4.2, when the stage costs are reference-dependent (e.g., $g_{k}(x_{k},u_{k},r_{k})=\left\|u_{k}\right\|^{2}+\left\|x_{k}-r_{k}\right\|^{2}$ ), the cost-to-go results from a multi-marginal optimal transport problem. As such, the strategy proposed in [1] does not minimize, at every time-step $k$ , the weighted sum of the squared Wasserstein distance from the target configuration and the input effort. Similarly, the problem formulation in [2] can be recovered with integrator dynamics $f_{k}(x_{k},u_{k})=x_{k}+u_{k}$ , cost $g_{k}(x_{k},u_{k},r_{k})=\left\|u_{k}\right\|^{2}$ , and a terminal constraint on the final distribution (see Section 3.3).
With a state augmentation (the input used along the trajectory, an independent integrator dynamics) and input constraints as suggested in Section 3.3, [30, Problem 2] is a special case of our setting, with linear dynamics (see Section 3.2), stage cost $g_{k}\equiv 0$ , and terminal cost the squared Wasserstein distance; i.e., $g_{N}(x_{N},r_{N})=\left\|x_{N}-r_{N}\right\|^{2}$ . Simple calculations reveal that the hard-constrained covariance formulation in [30, Problem 1] can be reformulated via a hard terminal constraint on the final probability measure (a Gaussian probability measure with appropriate covariance). In both cases, such specializations are possible because the authors restrict themselves to the Gaussian and linear setting. In general, covariance constraints or penalties require further study; see Section 3.3. Similarly, noisy settings do not immediately benefit from our reformulation; see Section 5.3. Analogous considerations hold for [26, 27, 28, 29].

Computational attractiveness

A rough time complexity analysis in the setting of Corollary 4.3 highlights why our result is computationally attractive. Consider the finite setting: Let $|X|$ be the number of states in the ground space, $N$ the horizon length, $|U|$ the number of available actions at each state, and restrict the attention to empirical probability measures consisting of $M$ particles, which can be written as $\mu_{k}=\frac{1}{M}\sum_{i=1}^{M}\delta_{x^{(i)}_{k}}$ , for $x^{(i)}_{k}\in X,i\in\{1,\ldots,M\}$ . The number of possible states in the probability space amounts to ${{M+|X|-1}\choose M}$ . Therefore, the time complexity of naively applying DPA in the probability space (i.e., to compute the input at the current state) is $\mathcal{O}({{M+|X|-1}\choose M}N|X||U|)$ . On the other hand, the DPA in the ground space (for all initial and terminal states) costs $\mathcal{O}(N|X||U|)$ . An optimal transport problem boils down to solving a linear program with $M^{2}$ decision variables, which has complexity $\mathcal{O}(M^{5})$ (see [39] for a sharper analysis). The total complexity of the recipe provided in Corollary 4.3 is thus $\mathcal{O}(N|X||U|+M^{5})$ , which improves over the DPA in the probability space.
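To get a feel for the gap between the two operation counts, the following sketch evaluates them with all constants dropped. The function names and the sample sizes are our own choices for illustration; the formulas are the ones stated in the paragraph above.

```python
import math


def num_fleet_states(num_states, num_particles):
    """Number of empirical measures with M particles on |X| points."""
    return math.comb(num_particles + num_states - 1, num_particles)


def naive_dpa_ops(num_states, num_inputs, horizon, num_particles):
    """Count for DPA in the probability space: C(M+|X|-1, M) * N * |X| * |U|."""
    return num_fleet_states(num_states, num_particles) * horizon * num_states * num_inputs


def separated_ops(num_states, num_inputs, horizon, num_particles):
    """Count for the recipe of Corollary 4.3: N * |X| * |U| + M^5."""
    return horizon * num_states * num_inputs + num_particles**5


# Illustrative sizes (assumed): |X| = 10, |U| = 4, N = 10, M = 5.
print(num_fleet_states(10, 5))     # number of fleet configurations
print(naive_dpa_ops(10, 4, 10, 5))
print(separated_ops(10, 4, 10, 5))
```

Even for these small sizes the naive count is more than two orders of magnitude larger, and the binomial factor grows quickly with both $M$ and $|X|$.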

Design of transportation costs

In many disciplines, the design of transportation costs is challenging; e.g., [40, 41]. For instance, in [40], the underlying Riemannian metric characterizing the trajectory of single-cell RNA is retrieved in a data-driven fashion. Theorem 4.2 and Corollary 4.3 suggest an alternative approach: First, “learn” the cost-to-go for single particles; then, use it as the transportation cost.

Measurability issues

In general, the cost-to-go $j_{k}$ is not Borel. Nonetheless, with our assumptions, it is lower semi-analytic [42, Corollary 8.2.1] and, thus, the integral in (13) is well-defined [42, §7.7]. Similarly, for any $\varepsilon>0$ , the inputs $u_{k}^{\varepsilon/2}$ may fail to be Borel measurable and are, in general, only universally measurable [42, Proposition 7.50]. However, for any Borel measure $\bm{\gamma}^{\varepsilon/2}_{k}$ there exists a Borel map $\tilde{u}:X_{k}\times R_{k}\times\ldots\times R_{N}\to U_{k}$ so that $u_{k}^{\varepsilon/2}(x_{k},r_{k},\ldots,r_{N})=\tilde{u}(x_{k},r_{k},\ldots,r_{N})$ $\bm{\gamma}^{\varepsilon/2}_{k}$ -a.e. [42, Lemma 7.27]. Thus, we can without loss of generality assume that $u_{k}^{\varepsilon/2}$ is Borel and, thus, the pushforward maps in (14) and (17) and the resulting probability measures $\lambda^{\varepsilon}_{k}$ are well-defined.

5 Examples and Pitfalls

In Section 5.1, we present two examples where two marginals are enough, in line with the existing literature [1, 2]. Then, in Section 5.2, we showcase that, in general, the multi-marginal formulation is necessary. Finally, Section 5.3 shows that our results do not readily extend to noisy dynamics.

5.1 Examples when two marginals are all you need

We start with an example to which Corollary 4.3 applies:

Example 5.1 (Integrator particle dynamics, input effort).

Suppose we aim at steering a probability measure $\mu_{0}\in\mathcal{P}\left(\mathbb{R}^{n}\right)$ to a target $\rho_{N}\in\mathcal{P}\left(\mathbb{R}^{n}\right)$ in $N$ steps; i.e., $X_{k}=R_{k}=\mathbb{R}^{n}$ . The input space is $U_{k}=\mathbb{R}^{n}$ , and the dynamics are $f_{k}(x_{k},u_{k})=x_{k}+u_{k}$ . The costs are $g_{k}(x_{k},u_{k})=\left\|u_{k}\right\|^{2}$ , and $g_{N}(x_{N},r_{N})=0$ if $x_{N}=r_{N}$ and $+\infty$ otherwise, so that the stage cost in the probability space is $\mathcal{K}[g_{k}](\lambda_{k},\rho_{k})=\mathbb{E}^{\lambda_{k}}\left[\left\|v_{k}\right\|^{2}\right]$ , and the terminal cost is $\mathcal{K}[g_{N}](\mu_{N},\rho_{N})=0$ if $\mu_{N}=\rho_{N}$ and $+\infty$ otherwise. The optimal control problem in the ground space admits the solution $u_{k}(x_{k},r_{N})=\frac{r_{N}-x_{k}}{N-k}$ , with the associated cost-to-go $j_{k}(x_{k},r_{N})=\frac{1}{N-k}\left\|r_{N}-x_{k}\right\|^{2}$ (the remaining $N-k$ steps each contribute $\left\|r_{N}-x_{k}\right\|^{2}/(N-k)^{2}$ ). By Corollary 4.3, the cost-to-go in the space of probability measures $J_{k}$ is

[TABLE]

and the optimal input reads $\lambda_{k}={(\operatorname{proj}^{X_{k}\times R_{N}}_{X_{k}},u_{k})}_{\#}{\bm{\gamma}_{k}}$ , where $\bm{\gamma}_{k}$ is the optimal transport plan for $J_{k}(\mu_{k},\rho_{N})$ . In the particular case where an optimal transport map $T_{k}:X_{k}\to R_{N}$ exists, the optimal input simplifies to $\lambda_{k}={(\operatorname{id}_{X_{k}},u_{k}(\cdot,T_{k}(\cdot)))}_{\#}{\mu_{k}}$ . That is, all particles having state $x_{k}$ apply the input $u_{k}(x_{k},T_{k}(x_{k}))=\frac{T_{k}(x_{k})-x_{k}}{N-k}$ .
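The example can be checked numerically in the scalar case $n=1$. The sketch below uses the input $u_{k}=(r_{N}-x_{k})/(N-k)$ and the effort it incurs over the remaining $N-k$ steps, $\left\|r_{N}-x_{k}\right\|^{2}/(N-k)$, and verifies the Bellman recursion by evaluating candidate inputs. For the fleet, it relies on the standard fact that, on the real line and for a convex cost of the displacement, matching sorted particles to sorted targets is an optimal transport map. All function names are ours.

```python
def optimal_input(x, r, k, horizon):
    """Ground-space input u_k(x_k, r_N) = (r_N - x_k) / (N - k)."""
    return (r - x) / (horizon - k)


def cost_to_go(x, r, k, horizon):
    """Input effort of N - k equal steps from x to r."""
    return (r - x) ** 2 / (horizon - k)


def bellman_rhs(x, r, k, horizon, u):
    """u^2 + j_{k+1}(x + u, r); meaningful for k < N - 1."""
    return u**2 + cost_to_go(x + u, r, k + 1, horizon)


def fleet_inputs(particles, targets, k, horizon):
    """Scalar fleet: sorted matching as transport map, then the low-level input."""
    order = sorted(range(len(particles)), key=lambda i: particles[i])
    assigned = [None] * len(particles)
    for i, r in zip(order, sorted(targets)):
        assigned[i] = r
    return [optimal_input(x, r, k, horizon) for x, r in zip(particles, assigned)]


# Bellman check at one (assumed) test point: x = 0, r = 3, k = 1, N = 4.
u_star = optimal_input(0.0, 3.0, 1, 4)
gap = bellman_rhs(0.0, 3.0, 1, 4, u_star) - cost_to_go(0.0, 3.0, 1, 4)
print(u_star, gap)

# Two particles steered to two targets in N = 2 steps.
print(fleet_inputs([2.0, 0.0], [1.0, 5.0], 0, 2))
```

The particle at $0$ is matched with the target at $1$ and the particle at $2$ with the target at $5$, so the two trajectories do not cross.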

Sometimes, the optimal input is probabilistic:

Example 5.2 (Sometimes it is necessary to split the mass).

Let $N=1$ , and consider $X_{k}=U_{k}=R_{k}=\mathbb{R}$ , $f_{0}(x_{0},u_{0})=u_{0}$ , $g_{0}(x_{0},u_{0})=0$ , $g_{N}(x_{1},r_{1})=\left\|x_{1}-r_{1}\right\|^{2}$ . Let $\mu_{0}=\delta_{0}$ and $\rho_{1}=\frac{1}{2}\delta_{-1}+\frac{1}{2}\delta_{+1}$ . For every pair $(x_{0},r_{1})$ the solution in the ground space is $u_{0}(x_{0},r_{1})=r_{1}$ , which yields the cost-to-go $j_{0}(x_{0},r_{1})=0$ . That is, any allocation $\bm{\gamma}\in\Gamma(\delta_{0},\rho_{N})$ is optimal; in particular, the only feasible plan $\bm{\gamma^{\ast}}\in\Gamma(\delta_{0},\rho_{N})$ displaces 50% of mass to $r_{1}=-1$ and the other 50% of the mass to $r_{1}=+1$ ; see Figure 1(a). Then, the optimal input reads $\lambda_{0}={(\operatorname{proj}^{X_{0}\times R_{1}}_{X_{0}},u_{0})}_{\#}{\bm{\gamma^{\ast}}}$ : $50\%$ of the particles apply the input $u_{0}=+1$ , and the others $u_{0}=-1$ .
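A short numerical check illustrates why no deterministic input can be optimal in this example. Since the stage cost is zero, the cost in the probability space is the squared type-2 Wasserstein distance between $\mu_{1}$ and $\rho_{1}$. The helper `w2_squared` is a hypothetical brute-force routine for uniform empirical measures with the same number of particles, and the grid of candidate inputs is an arbitrary choice.

```python
import itertools


def w2_squared(xs, ys):
    """Squared type-2 Wasserstein distance between uniform empirical measures.

    Brute force over permutations; both lists must have the same length.
    """
    n = len(xs)
    return min(
        sum((x - ys[p]) ** 2 for x, p in zip(xs, perm)) / n
        for perm in itertools.permutations(range(n))
    )


reference = [-1.0, 1.0]  # rho_1 = (delta_{-1} + delta_{+1}) / 2

# Deterministic input: all the mass at x_0 = 0 applies the same u_0, so
# mu_1 = delta_{u_0}, written here as two particles at the same location.
deterministic = {u / 10: w2_squared([u / 10, u / 10], reference) for u in range(-20, 21)}

# Split input from the example: half of the mass applies -1, the other half +1.
split = w2_squared([-1.0, 1.0], reference)

print(min(deterministic.values()), split)
```

On this grid, every deterministic input leaves a cost of at least $1$ (indeed, $W_{2}^{2}(\delta_{u},\rho_{1})=u^{2}+1$), whereas splitting the mass attains zero cost.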

5.2 Why all these marginals?

Hereby, we explore the differences between Theorem 4.2 and Corollary 4.3. Specifically, we clarify why a multi-marginal optimal transport formulation arises, even when the target probability measure remains constant throughout the horizon (i.e., $\rho_{0}=\ldots=\rho_{N}\eqqcolon\rho$ ).

Counterexample (Two marginals are not enough).

Consider, as in Example 3.5, $X_{k}=R_{k}=\{\pm 1,0\},U_{k}=\{-1,0\}$ , dynamics $f_{k}(x_{k},u_{k})=x_{k}u_{k}$ , with horizon $N=2$ , and costs $g_{k}(x_{k},u_{k},r_{k})=\left\|x_{k}-r_{k}\right\|^{2}$ and $g_{N}(x_{N},r_{N})=\left\|x_{N}-r_{N}\right\|^{2}$ , so that the stage and terminal cost in the probability space are the squared (type 2) Wasserstein distance from the fixed reference measure $\rho=\frac{1}{2}(\delta_{-1}+\delta_{+1})$ . First, we utilize Corollary 4.3, keeping the reference constant throughout the horizon. The cost-to-go is $\tilde{j_{0}}(\pm 1,\pm 1)=2$ (here and below, this notation means $\tilde{j_{0}}(+1,+1)=\tilde{j_{0}}(-1,-1)=2$ ), obtained applying at the first stage $u_{0}=0$ (and subsequently any input), and $\tilde{j_{0}}(\pm 1,\mp 1)=5$ , obtained applying $u_{0}=-1$ and then $u_{1}=0$ . The cost-to-go for the fleet is $\tilde{J_{0}}(\mu_{0},\rho)=\mathcal{K}[\tilde{j_{0}}](\mu_{0},\rho)=2$ , with the particle having state $x_{0}=\pm 1$ allocated to $r_{2}=\pm 1$ . However, from a fleet perspective, the input $u_{k}=-1$ leads to $\mu_{0}=\mu_{1}=\mu_{2}=\rho$ . By changing allocations throughout the horizon, we obtain a total cost $J_{0}(\mu_{0},\rho)=0$ . This behavior emerges naturally with Theorem 4.2. The cost-to-go in the ground space satisfies $j_{0}(x_{0}=\pm 1,r_{0}=\pm 1,r_{1}=\mp 1,r_{2}=\pm 1)=0$ , with the input $u_{k}=-1$ at all times. Then, the transport plan $\bm{\gamma}={(\operatorname{id}_{\mathbb{R}},\operatorname{id}_{\mathbb{R}},-\operatorname{id}_{\mathbb{R}},\operatorname{id}_{\mathbb{R}})}_{\#}{\mu_{0}}\in\Gamma(\mu_{0},\rho,\rho,\rho)$ yields

[TABLE]

necessarily optimal; see Figure 1(b). In particular, $J(\mu_{0},\rho,\rho,\rho)<\tilde{J}(\mu_{0},\rho)=2$ . That is, Corollary 4.3 does not apply and the optimal solution results from Theorem 4.2.
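The gap between the two formulations can be reproduced by brute force. The sketch below enumerates all input sequences to obtain the ground-space cost-to-go for a given reference trajectory, and then compares the best assignment when each particle keeps a constant reference with the alternating allocation used in the counterexample. The function names are ours. The second value is only an upper bound on the multi-marginal cost-to-go, which suffices here because it equals zero and costs are nonnegative.

```python
import itertools

INPUTS = (-1, 0)
HORIZON = 2


def ground_cost_to_go(x0, refs):
    """Brute-force j_0(x_0, r_0, r_1, r_2) over all input sequences."""
    best = float("inf")
    for inputs in itertools.product(INPUTS, repeat=HORIZON):
        x, cost = x0, 0
        for k, u in enumerate(inputs):
            cost += (x - refs[k]) ** 2  # stage cost g_k
            x = x * u                   # dynamics f_k(x, u) = x * u
        cost += (x - refs[HORIZON]) ** 2  # terminal cost g_N
        best = min(best, cost)
    return best


def fleet_cost(particles, ref_trajectories):
    """Best assignment of particles to reference trajectories (uniform weights)."""
    n = len(particles)
    return min(
        sum(ground_cost_to_go(x, ref_trajectories[p]) for x, p in zip(particles, perm)) / n
        for perm in itertools.permutations(range(n))
    )


particles = [-1, 1]  # mu_0 = rho = (delta_{-1} + delta_{+1}) / 2

# Two-marginal formulation: each particle keeps one fixed reference r.
two_marginal = fleet_cost(particles, [(r, r, r) for r in (-1, 1)])

# Multi-marginal plan of the counterexample: (r_0, r_1, r_2) = (x, -x, x).
multi_marginal = fleet_cost(particles, [(r, -r, r) for r in (-1, 1)])

print(two_marginal, multi_marginal)
```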

5.3 The effect of local noise

When the particle dynamics are noisy, it is common to minimize the expected particle cost via the stochastic DPA:

[TABLE]

where $\xi_{k}\in\mathcal{P}(W_{k})$ is the probability measure of the noise, and $W_{k}$ is the space of possible realizations. Since $j_{k}$ is of the form required for Corollary 4.3, it is tempting to extend our results. Unfortunately, the noisy drift may favor a different allocation of the particles, and the expectation annihilates such an effect:

Counterexample (Corollary 4.3 does not readily extend).

Consider a horizon $N=2$ and the setting depicted in Figure 1(c). Let $X_{k}=R_{k}=U_{k}=\{\heartsuit,\spadesuit,\diamondsuit,\clubsuit\}$ , and consider uniformly distributed noise over $W_{k}=\{\spadesuit,\clubsuit\}$ . The particle dynamics is $f_{k}(\heartsuit,u_{k},w_{k})=f_{k}(\diamondsuit,u_{k},w_{k})=w_{k}$ and $f_{k}(\spadesuit,u_{k},w_{k})=f_{k}(\clubsuit,u_{k},w_{k})=u_{k}$ . The stage cost is $\mathcal{K}[g_{k}]$ , where $g_{k}(\clubsuit,\heartsuit,w_{k})=g_{k}(\spadesuit,\diamondsuit,w_{k})=2$ and $0$ otherwise. The terminal cost enforces the configuration $\rho_{N}=\frac{1}{2}\delta_{\diamondsuit}+\frac{1}{2}\delta_{\heartsuit}$ , namely $\mathcal{K}[g_{N}]$ with $g_{N}(x_{N},r_{N})=0$ if $x_{N}=r_{N}$ and $+\infty$ otherwise. The recursion (18) yields $j_{0}(\heartsuit,\heartsuit)=1$ (any input at the first stage and $u=\heartsuit$ at the second stage), $j_{0}(\diamondsuit,\diamondsuit)=1$ , and $j_{0}(\diamondsuit,\heartsuit)=j_{0}(\heartsuit,\diamondsuit)=1$ (all with analogous transitions). Therefore, with the initial configuration and target configuration $\mu_{0}=\rho_{N}=\frac{1}{2}\left(\delta_{\heartsuit}+\delta_{\diamondsuit}\right)$ , Corollary 4.3 yields $\tilde{J_{0}}(\mu_{0},\rho_{N})=\mathcal{K}[j_{0}](\mu_{0},\rho_{N})=1$ . Instead, the DPA in the probability space gives $\mu_{1}=\frac{1}{2}\left(\delta_{\clubsuit}+\delta_{\spadesuit}\right)$ with zero cost (regardless of the input). Then, the evolution is deterministic and the cost-to-go amounts to $j_{1}(\clubsuit,\diamondsuit)=j_{1}(\spadesuit,\heartsuit)=0$ and $j_{1}(\clubsuit,\heartsuit)=j_{1}(\spadesuit,\diamondsuit)=1$ . Thus, Corollary 4.3 applies and yields $J_{1}(\mu_{1},\rho_{N})=\mathcal{K}[j_{1}](\mu_{1},\rho_{N})=\frac{1}{2}j_{1}(\clubsuit,\diamondsuit)+\frac{1}{2}j_{1}(\spadesuit,\heartsuit)=0$ . Overall, $J_{0}(\mu_{0},\rho_{N})\leq 0+J_{1}(\mu_{1},\rho_{N})=0<1=\tilde{J_{0}}(\mu_{0},\rho_{N})$ .
Thus, the naive application of Corollary 4.3 is suboptimal.

6 Proof of Theorem 4.2 and Corollary 4.3

For the proof of Theorem 4.2 and Corollary 4.3, we need a few preliminary lemmata. For ease of notation, let $X\coloneqq X_{1}\times\ldots\times X_{k}$ , $Y\coloneqq Y_{1}\times\ldots\times Y_{h}$ , and $Z\coloneqq Z_{1}\times\ldots\times Z_{k}$ . To start, we introduce a variation of (8), in which only the first $k$ marginals $\mu_{i}\in\mathcal{P}(X_{i})$ are fixed. Namely,

[TABLE]

where $c:X\times Y\to\bar{\mathbb{R}}_{\geq 0}$ is the transportation cost. When $c:X\to\bar{\mathbb{R}}_{\geq 0}$ (i.e., there are no free marginals), we conveniently write $\mathcal{J}[c](\mu_{1},\ldots,\mu_{k})=\mathcal{K}[c](\mu_{1},\ldots,\mu_{k})$ . Further, given a collection of maps $\{l_{k}:X_{k}\to Y_{k}\}_{k=i}^{j}$ , we denote by $l\coloneqq l_{i}\times\ldots\times l_{j}$ the map $X_{i}\times\ldots\times X_{j}\to Y_{i}\times\ldots\times Y_{j}$ defined point-wise as $(x_{i},\ldots,x_{j})\mapsto(l_{i}(x_{i}),\ldots,l_{j}(x_{j}))$ . Given the probability measures $\{\mu_{i}\in\mathcal{P}(X_{i})\}_{i=1}^{k}$ , $\mu\coloneqq(\mu_{1},\ldots,\mu_{k})$ , we conveniently write ${l}_{\#}{\mu}\coloneqq({l_{1}}_{\#}{\mu_{1}},\ldots,{l_{k}}_{\#}{\mu_{k}})$ . A measure-valued map $X\ni x\mapsto\mu^{x}\in\mathcal{P}(X)$ is Borel if and only if, for any Borel set $B\subseteq X$ , the map $x\mapsto\mu^{x}(B)$ is Borel.

In our setting, the cost-to-go will be an optimal transport discrepancy, and the dynamics are a pushforward. To relate the cost-to-go at the ${k}^{\mathrm{th}}$ stage to the one at the previous time step, we rigorously formalize their interplay. A similar but less general result (i.e., only with two fixed marginals) was derived in the context of uncertainty propagation via optimal transport [43].

Lemma 6.1 (Pushforward and optimal transport).

Given a transportation cost $c:Z\times Y\to\bar{\mathbb{R}}_{\geq 0}$ , $k\in\mathbb{N}_{\geq 1},h\in\mathbb{N}$ , maps $\{l_{i}:X_{i}\to Z_{i}\}_{i=1}^{k}$ , $l\coloneqq l_{1}\times\ldots\times l_{k}$ , and probability measures $\{\mu_{i}\in\mathcal{P}(X_{i})\}_{i=1}^{k}$ , $\mu\coloneqq(\mu_{1},\ldots,\mu_{k})$ , it holds:

[TABLE]

Proof 6.2.

We prove “ $\leq$ ” and “ $\geq$ ” separately. We start with “ $\geq$ ”. For any $\bm{\mu}\in\mathcal{P}(X\times Y)$ such that ${(\operatorname{proj}^{X\times Y}_{X})}_{\#}{\bm{\mu}}\in\Gamma(\mu)$ , let $\bm{\mu^{\prime}}={(l\times\operatorname{id}_{Y})}_{\#}{\bm{\mu}}.$ For $i\in\{1,\ldots,k\}$ consider $\phi\in C_{b}(Z_{i})$ . It holds:

[TABLE]

That is, ${(\operatorname{proj}^{Z\times Y}_{Z_{i}})}_{\#}{\bm{\mu^{\prime}}}={l_{i}}_{\#}{\mu_{i}}$ and, thus, ${(\operatorname{proj}^{Z\times Y}_{Z})}_{\#}{\bm{\mu^{\prime}}}\in\Gamma({l}_{\#}{\mu})$ . Similarly, for all $j\in\{1,\ldots,h\}$ we have ${(\operatorname{proj}^{Z\times Y}_{Y_{j}})}_{\#}{\bm{\mu^{\prime}}}={(\operatorname{proj}^{X\times Y}_{Y_{j}})}_{\#}{\bm{\mu}}\in\mathcal{P}(Y_{j})$ . Therefore, $\bm{\mu^{\prime}}$ provides the upper bound $\mathcal{J}[c]({l}_{\#}{\mu})\leq\int_{X\times Y}c\circ(l\times\operatorname{id}_{Y})\,\mathrm{d}\bm{\mu}.$ Since $\bm{\mu}$ is arbitrary, we obtain $\mathcal{J}[c]({l}_{\#}{\mu})\leq\mathcal{J}[c\circ(l\times\operatorname{id}_{Y})](\mu)$ .

To prove “ $\leq$ ”, fix $\bm{\mu^{\prime}}\in\mathcal{P}(Z\times Y)$ with ${(\operatorname{proj}^{Z\times Y}_{Z})}_{\#}{\bm{\mu^{\prime}}}\in\Gamma({l}_{\#}{\mu})$ . By definition, $\bm{\mu^{\prime}}\in\Gamma({l}_{\#}{\mu},{(\operatorname{proj}^{Z\times Y}_{Y})}_{\#}{\bm{\mu^{\prime}}}).$ Then, for all $i\in\{1,\ldots,k\}$ , let $\bm{\mu_{i}}={\left(\operatorname{id}_{X_{i}},l_{i}\right)}_{\#}{\mu_{i}}\in\mathcal{P}(X_{i}\times Z_{i}).$ Analogously to the previous step, we have $\bm{\mu_{i}}\in\Gamma(\mu_{i},{l_{i}}_{\#}{\mu_{i}})$ . We can “glue” $\{\bm{\mu_{i}}\}_{i=1}^{k}$ and $\bm{\mu^{\prime}}$ to obtain $\bm{\mu^{\ast}}\in\mathcal{P}(X\times Z\times Y)$ such that ${(\operatorname{proj}^{X\times Z\times Y}_{X\times Z})}_{\#}{\bm{\mu^{\ast}}}\in\Gamma(\mu,{l}_{\#}{\mu}).$ Specifically, we apply $k$ times [9, Gluing lemma] as follows. First, we glue $\bm{\mu^{\prime}}$ and $\bm{\mu_{1}}$ , since they share a marginal: ${(\operatorname{proj}^{Z\times Y}_{Z_{1}})}_{\#}{\bm{\mu^{\prime}}}={l_{1}}_{\#}{\mu_{1}}={(\operatorname{proj}^{X_{1}\times Z_{1}}_{Z_{1}})}_{\#}{\bm{\mu_{1}}}.$ Call the resulting plan $\bm{\mu^{*}_{1}}\in\Gamma(\mu_{1},{l}_{\#}{\mu},{(\operatorname{proj}^{Z\times Y}_{Y})}_{\#}{\bm{\mu^{\prime}}}).$ Next, define inductively $\bm{\mu^{*}_{i}}\in\Gamma(\mu_{1},\ldots,\mu_{i},{l}_{\#}{\mu},{(\operatorname{proj}^{Z\times Y}_{Y})}_{\#}{\bm{\mu^{\prime}}})$ as the plan obtained from gluing $\bm{\mu^{*}_{i-1}}$ and $\bm{\mu_{i}}$ , for $i\in\{2,\ldots,k\}$ . 
The definition is well-posed in view of [9, Gluing lemma], since ${(\operatorname{proj}^{X_{1}\times\ldots\times X_{i}\times Z\times Y}_{Z_{i}})}_{\#}{\bm{\mu^{\ast}_{i-1}}}={l_{i}}_{\#}{\mu_{i}}={(\operatorname{proj}^{X_{i}\times Z_{i}}_{Z_{i}})}_{\#}{\bm{\mu_{i}}}.$ Finally, we take $\bm{\mu^{\ast}}=\bm{\mu^{\ast}_{k}}$ , so that $\bm{\mu}={(\operatorname{proj}^{X\times Y\times Z}_{X\times Y})}_{\#}{\bm{\mu^{\ast}}}\in\Gamma\left({\mu}_{1}^{k},{(\operatorname{proj}^{Z\times Y}_{Y})}_{\#}{\bm{\mu^{\prime}}}\right).$ Let $\bar{X}\coloneqq X_{1}\times\ldots\times X_{k-1}$ , $\bar{Z}\coloneqq Z_{1}\times\ldots\times Z_{k-1}$ , and $\bar{l}\coloneqq l_{1}\times\ldots\times l_{k-1}$ . Then, for the ${k}^{\mathrm{th}}$ argument of $c$ ,

[TABLE]

where in $\clubsuit$ we used the disintegration theorem ([8, Theorem 5.3.1]), which provides a collection $\{\bm{\tilde{\mu}^{x_{k}z_{k}}}\}_{(x_{k},z_{k})\in X_{k}\times Z_{k}}$ to complement $\bm{\mu_{k}}$ . Then, in $\spadesuit$ , we used the definition of $\bm{\mu_{k}}$ : $z_{k}=l_{k}(x_{k})$ $\mu_{k}$ -a.e. Repeating the same steps for the other arguments of $c$ we obtain $\mathcal{J}[c\circ(l\times\operatorname{id}_{Y})](\mu)\leq\int_{X\times Y}c(l(x),y)\,\mathrm{d}\bm{\mu}(x,y)=\int_{X\times Z\times Y}c(z,y)\,\mathrm{d}\bm{\mu^{\ast}}(x,z,y)=\int_{Z\times Y}c\,\mathrm{d}\bm{\mu^{\prime}}$ . Since $\bm{\mu^{\prime}}$ is arbitrary, it follows $\mathcal{J}[c\circ(l\times\operatorname{id}_{Y})](\mu)\leq\mathcal{J}[c]({l}_{\#}{\mu})$ .

The next result expresses the sum of two optimal transport discrepancies, possibly with free marginals, as a single optimal transport discrepancy, with the same free marginals. Similar results provide multi-marginal reformulations for Wasserstein barycenters [44, 45], whose computation has recently received much interest [46, 47].

Lemma 6.3 (Sum of optimal transport discrepancies).

Given transportation costs $c_{1}:X_{1}\times Z\to\bar{\mathbb{R}}_{\geq 0}$ , $c_{2}:X\times Y\to\bar{\mathbb{R}}_{\geq 0}$ , and probability measures $\{\mu_{i}\in\mathcal{P}(X_{i})\}_{i=1}^{k}$ , $\mu\coloneqq(\mu_{1},\ldots,\mu_{k})$ , $\nu\in\mathcal{P}(Z)$ , it holds $\mathcal{K}[c_{1}](\mu_{1},\nu)+\mathcal{J}[c_{2}](\mu)=\mathcal{J}[c](\mu,\nu),$ with $c:X\times Y\times Z\to\bar{\mathbb{R}}_{\geq 0}$ defined as $c(x_{1},\ldots,x_{k},y,z)=c_{1}(x_{1},z)+c_{2}(x_{1},\ldots,x_{k},y).$

Proof 6.4.

We prove separately “ $\leq$ ” and “ $\geq$ ”. With the short-hand notation $x\coloneqq(x_{1},\ldots,x_{k})$ , “ $\leq$ ” follows minimizing separately over the shared marginal:

[TABLE]

where in $\heartsuit$ (i) we noticed that the first infimum is only over ${(\operatorname{proj}^{X\times Y\times Z}_{X_{1}\times Z})}_{\#}{\bm{\gamma}}=\bm{\gamma^{\prime}}\in\Gamma(\mu_{1},\nu)$ , and (ii) in the second infimum we used Lemma 6.1 with the pushforward map being $\operatorname{proj}^{X\times Y\times Z}_{X\times Y}$ .

We now prove “ $\geq$ ”. For all $\varepsilon>0$ , consider $\varepsilon$ -optimal $\bm{\gamma_{1}}^{\varepsilon}\in\Gamma(\mu_{1},\nu)$ and $\bm{\gamma_{2}}^{\varepsilon}\in\mathcal{P}(X\times Y)$ so that ${(\operatorname{proj}^{X\times Y}_{X})}_{\#}{\bm{\gamma_{2}}^{\varepsilon}}\in\Gamma(\mu)$ ; i.e., $\int_{X_{1}\times Z}c_{1}\,\mathrm{d}\bm{\gamma_{1}}^{\varepsilon}\leq\mathcal{K}[c_{1}](\mu_{1},\nu)+\varepsilon$ and $\int_{X\times Y}c_{2}\,\mathrm{d}\bm{\gamma_{2}}^{\varepsilon}\leq\mathcal{J}[c_{2}](\mu)+\varepsilon.$ Since ${(\operatorname{proj}^{X_{1}\times Z}_{X_{1}})}_{\#}{\bm{\gamma_{1}}^{\varepsilon}}=\mu_{1}={(\operatorname{proj}^{X\times Y}_{X_{1}})}_{\#}{\bm{\gamma_{2}}^{\varepsilon}}$ , we can glue them [9, Gluing lemma] to obtain $\bm{\gamma}^{\varepsilon}\in\Gamma(\mu,\nu,{(\operatorname{proj}^{X\times Y}_{Y})}_{\#}{\bm{\gamma_{2}}^{\varepsilon}}).$ Then, it holds

[TABLE]

and, thus, $\mathcal{J}[c](\mu,\nu)\leq\mathcal{K}[c_{1}](\mu_{1},\nu)+\mathcal{J}[c_{2}](\mu)+2\varepsilon$ . Let $\varepsilon\to 0$ to conclude.

In particular, when $\mathcal{K}[c_{1}]$ is an expected value, the composition simplifies:

Lemma 6.5 (Compositionality of optimal transport).

Given a cost $v:X\to\bar{\mathbb{R}}_{\geq 0}$ , a transportation cost $c:Y\times Z\to\bar{\mathbb{R}}_{\geq 0}$ , a map $l:X\to Y$ , and probability measures $\mu\in\mathcal{P}(X),\nu\in\mathcal{P}(Z)$ , it holds $\mathbb{E}^{\mu}\left[v\right](\mu)+\mathcal{K}[c]({l}_{\#}{\mu},\nu)=\mathcal{K}[v+c\circ(l\times\operatorname{id}_{Z})](\mu,\nu).$

Proof 6.6.

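The statement follows from Lemma 6.1, which rewrites $\mathcal{K}[c]({l}_{\#}{\mu},\nu)$ as an optimal transport discrepancy between $\mu$ and $\nu$ with cost $c$ composed with $l$ , and from Lemma 6.3, which merges it with the expected value of $v$ .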

Finally, we give a useful disintegration property of the cost term $\mathcal{J}[c]$ :

Lemma 6.7 (Disintegration of the optimizer).

Given a transportation cost $c\in\operatorname{lsc}(X\times Y\times Z,\bar{\mathbb{R}}_{\geq 0})$ and probability measures $\mu\in\mathcal{P}(X)$ , $\nu\in\mathcal{P}(Y)$ , it holds:

[TABLE]

where $\Lambda(Z)\coloneqq\{\{\xi^{xy}\}_{(x,y)\in X\times Y}\subseteq\mathcal{P}(Z)\,|\,X\times Y\ni(x,y)\mapsto\xi^{xy}\in\mathcal{P}(Z)\text{ Borel}\}.$

Proof 6.8.

We prove “ $\geq$ ” and “ $\leq$ ” separately. To prove “ $\geq$ ”, consider any $\bm{\gamma}\in\mathcal{P}(X\times Y\times Z)$ such that ${(\operatorname{proj}^{X\times Y\times Z}_{X\times Y})}_{\#}{\bm{\gamma}}\in\Gamma(\mu,\nu)$ . By [8, Theorem 5.3.1], there exists $\{\bm{\gamma}^{xy}\}\in\Lambda(Z)$ such that

[TABLE]

Then, take the infimum over $\bm{\gamma}$ .

To prove “ $\leq$ ”, we follow [8, §5.3] to construct the reverse of the disintegration. Given any $\bm{\gamma^{\prime}}\in\Gamma(\mu,\nu)$ and any $\{\xi^{xy}\}\in\Lambda(Z)$ , we can construct a Borel probability measure $\bm{\gamma}\in\mathcal{P}(X\times Y\times Z)$ defined for every Borel set $B\subseteq X\times Y\times Z$ as $\bm{\gamma}(B)=\int_{X\times Y}\int_{Z}1_{B}(x,y,z)\,\mathrm{d}\xi^{xy}(z)\,\mathrm{d}\bm{\gamma^{\prime}}(x,y).$ For $\phi\in C_{b}(X\times Y)$ , we have

[TABLE]

Thus, ${(\operatorname{proj}^{X\times Y\times Z}_{X\times Y})}_{\#}{\bm{\gamma}}=\bm{\gamma^{\prime}}\in\Gamma(\mu,\nu)$ . Therefore,

[TABLE]

and the claim follows taking the infimum over $\bm{\gamma^{\prime}}$ and $\{\xi^{xy}\}$ .
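On finite sets, the gluing construction used for the second inequality can be sketched in a few lines of Python. The sketch builds $\bm{\gamma}(x,y,z)=\bm{\gamma^{\prime}}(x,y)\,\xi^{xy}(z)$ from a coupling and a kernel, and checks that projecting onto $X\times Y$ recovers $\bm{\gamma^{\prime}}$. The function names `glue` and `project_xy`, the atom labels, and the weights are hypothetical choices made for illustration.

```python
from fractions import Fraction as F


def glue(gamma_prime, kernel):
    """Return gamma(x, y, z) = gamma'(x, y) * xi^{xy}(z) as a dictionary."""
    return {
        (x, y, z): weight * prob
        for (x, y), weight in gamma_prime.items()
        for z, prob in kernel[(x, y)].items()
    }


def project_xy(gamma):
    """Marginalize z out of a measure on X x Y x Z."""
    marginal = {}
    for (x, y, _z), weight in gamma.items():
        marginal[(x, y)] = marginal.get((x, y), F(0)) + weight
    return marginal


# A coupling gamma' on X x Y (illustrative weights summing to one).
gamma_prime = {
    ("x0", "y0"): F(1, 2),
    ("x0", "y1"): F(1, 6),
    ("x1", "y1"): F(1, 3),
}
# One probability measure on Z for every pair (x, y) charged by gamma'.
kernel = {
    ("x0", "y0"): {"z0": F(1)},
    ("x0", "y1"): {"z0": F(1, 4), "z1": F(3, 4)},
    ("x1", "y1"): {"z1": F(1)},
}

gamma = glue(gamma_prime, kernel)
print(project_xy(gamma) == gamma_prime)
```

In the finite setting every kernel is trivially Borel, so the sketch illustrates only the algebra of the construction and not the measurability requirement encoded in $\Lambda(Z)$.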

We are now ready to prove Theorem 4.2 and Corollary 4.3:

Proof 6.9 (Proof of Theorem 4.2).

We prove the statements separately. To ease the notation, we recall $R\coloneqq R_{k}\times R_{k+1}\times\ldots\times R_{N}$, and we introduce

[TABLE]

(i)

We proceed by induction. The base case is $J_{N}=\mathcal{K}[g_{N}]$ and $j_{N}=g_{N}$ . For $k<N$ , suppose $J_{k+1}=\mathcal{K}[j_{k+1}]$ . Then, the backward recursion gives

[TABLE]

where first, in $\spadesuit$ , we used the definition of $c_{k}$ (see (19)), Lemma 6.1, and Lemma 6.3. Second, in $\diamondsuit$ , we used Lemma 6.7. Third, $\clubsuit$ requires proving separately “ $\geq$ ” and “ $\leq$ ”. Let $\psi(x_{k},\xi,r)\coloneqq\int_{U_{k}}c_{k}(x_{k},u,r)\,\mathrm{d}\xi(u)$ . Then, $\Lambda(U_{k})\subseteq\{\xi^{x_{k}r}\in\mathcal{P}(U_{k})\}$ , and $\psi(x_{k},\xi,r)\geq\inf_{\xi\in\mathcal{P}(U_{k})}\psi(x_{k},\xi,r)$ reveal “ $\geq$ ”.

To prove “ $\leq$ ”, let $\Omega\coloneqq\operatorname{supp}(\mu_{k})\times\operatorname{supp}(\rho)\subseteq X_{k}\times R$ and $\bm{\gamma^{\prime}}\in\Gamma(\mu_{k},\rho)$. By definition, we can restrict the integration domain to the support of $\bm{\gamma^{\prime}}$, which satisfies $\operatorname{supp}(\bm{\gamma^{\prime}})\subseteq\operatorname{supp}(\mu_{k})\times\operatorname{supp}(\rho)$. We thus consider $\Omega$ in place of $X_{k}\times R$ as the integration domain. For all $\varepsilon>0$, consider the collection $\{u_{k}^{\varepsilon/2}(x_{k},r)\}_{(x_{k},r)\in\Omega}\subseteq U_{k}$. Without loss of generality, we assume that $u_{k}^{\varepsilon/2}$ is Borel; see the discussion in Section 4.

As a consequence, the measure-valued map $h:\Omega\to\mathcal{P}(U_{k}),h(x_{k},r)\coloneqq\delta_{u_{k}^{\varepsilon/2}(x_{k},r)}$ is also Borel. To show this, we can equivalently show that, for every Borel set $B\subseteq U_{k}$ and every $a\in\mathbb{R}$, the pre-image of the interval $[a,+\infty)$ under $(x_{k},r)\mapsto h(x_{k},r)(B)$ is Borel. Let $h_{B}:U_{k}\to\mathbb{R}_{\geq 0},u\mapsto h_{B}(u)\coloneqq\delta_{u}(B)$, so that $h(x_{k},r)(B)=(h_{B}\circ u_{k}^{\varepsilon/2})(x_{k},r)$. Then, $h_{B}^{-1}([a,+\infty))=\emptyset$ if $a>1$, $h_{B}^{-1}([a,+\infty))=B$ if $a\in(0,1]$, and $h_{B}^{-1}([a,+\infty))=U_{k}$ otherwise. In all cases, $h_{B}^{-1}([a,+\infty))$ is a Borel set and, thus, the map $h_{B}$ is Borel. Since the composition of Borel maps is a Borel map, $h_{B}\circ u_{k}^{\varepsilon/2}$ is Borel. Therefore, the measure-valued map $h$ is Borel. Then, $\{\delta_{u_{k}^{\varepsilon/2}(x_{k},r)}\}_{(x_{k},r)\in\Omega}\in\Lambda(U_{k})$, with $\Lambda(U_{k})$ as in Lemma 6.7. Thus,

[TABLE]

Take the infimum over $\bm{\gamma^{\prime}}$ on both sides and let $\varepsilon\to 0$ to prove “ $\leq$ ”.

Next, it holds $\inf_{\xi\in\mathcal{P}(U_{k})}\int_{U_{k}}c_{k}(x_{k},u,r)\,\mathrm{d}\xi(u)\geq\inf_{u\in U_{k}}c_{k}(x_{k},u,r)=j_{k}(x_{k},r).$ For “ $\leq$ ”, let $\{u_{n}\}_{n\in\mathbb{N}}\subseteq U_{k}$ yield $j_{k}(x_{k},r)=\lim_{n\to\infty}c_{k}(x_{k},u_{n},r)$, and consider $\{\delta_{u_{n}}\}_{n\in\mathbb{N}}\subseteq\mathcal{P}(U_{k})$. For all $n\in\mathbb{N}$ we have

[TABLE]

The limit $n\to\infty$ reveals “ $\leq$ ” and, thus, the equality. Thus, for every $x_{k}\in X_{k},r\in R$ , we have $\inf_{\xi\in\mathcal{P}(U_{k})}\int_{U_{k}}c_{k}(x_{k},u,r)\,\mathrm{d}\xi(u)=j_{k}(x_{k},r)$ , and so $J_{k}(\mu_{k},\rho)=\inf_{\bm{\gamma^{\prime}}\in\Gamma(\mu_{k},\rho)}\int_{X_{k}\times R}j_{k}\,\mathrm{d}\bm{\gamma^{\prime}}$ .
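The equality between the infimum over input distributions and the infimum over inputs can be illustrated numerically for a finite input set. The Python sketch below fixes a pair $(x_{k},r)$, so that $c_{k}(x_{k},\cdot,r)$ reduces to a list of numbers, and checks that no randomly drawn $\xi\in\mathcal{P}(U_{k})$ beats the best Dirac delta. The cost values, the sample size, and the helper names are assumptions made for illustration.

```python
import random


def expected_cost(costs, xi):
    """Integral of the cost against the discrete distribution xi."""
    return sum(p * c for p, c in zip(xi, costs))


def random_distribution(n, rng):
    """Draw a random probability vector with n entries."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values of c_k(x_k, u, r) for four inputs u and fixed (x_k, r).
costs = [3.0, 1.5, 4.0, 2.0]
rng = random.Random(0)

# The Dirac delta at a minimizing input attains the minimum of the cost.
best_u = min(range(len(costs)), key=costs.__getitem__)
dirac = [1.0 if i == best_u else 0.0 for i in range(len(costs))]

# Randomized inputs never achieve a lower expected cost.
samples = [random_distribution(len(costs), rng) for _ in range(1000)]
never_better = all(
    expected_cost(costs, xi) >= min(costs) - 1e-12 for xi in samples
)
print(never_better, expected_cost(costs, dirac) == min(costs))
```

When the infimum over $u$ is not attained, the proof replaces the single Dirac delta by a minimizing sequence $\{\delta_{u_{n}}\}$, which the finite example does not need.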

This proves (13). Finally, analogously to the traditional DPA [7, 32], the additivity of the cost structure yields $J=J_{0}$.

(ii)

Let $\varepsilon\geq 0$ and define $u_{k}^{\varepsilon/2}$ , and $\bm{\gamma}^{\varepsilon/2}_{k}$ as in the theorem statement. Consider the (possibly sub-optimal) plan

[TABLE]

By definition, ${(\operatorname{proj}^{X_{k}\times U_{k}\times R}_{X_{k}\times U_{k}})}_{\#}{\bm{\tilde{\gamma}}^{\varepsilon}_{k}}=\lambda^{\varepsilon}_{k}$ and ${(\operatorname{proj}^{X_{k}\times U_{k}\times R}_{R})}_{\#}{\bm{\tilde{\gamma}}^{\varepsilon}_{k}}=\rho$. Therefore, $\bm{\tilde{\gamma}}^{\varepsilon}_{k}$ is a valid choice for the infimum, and it holds:

[TABLE]

where, in $\heartsuit$, we used the definition of $c_{k}$ (see (19)), Lemma 6.1, and Lemma 6.3. Overall, $\lambda^{\varepsilon}_{k}$ is an $\varepsilon$-optimal control input at $\mu_{k}$. When $\varepsilon=0$, the infima are attained and we obtain the optimal state-input distribution $\lambda^{\ast}_{k}$.

(iii)

The statement follows from (ii), plugging in the given maps $u_{k}^{\varepsilon/2}$ and $T^{\varepsilon/2}_{k}$ .

Proof 6.10 (Proof of Corollary 4.3).

The proof is analogous to that of Theorem 4.2. To express the cost-to-go $J_{k}$ as a two-marginals optimal transport discrepancy, it suffices to replace Lemma 6.3 with Lemma 6.5. The simplified optimal control input $\lambda^{\varepsilon}_{k}$ follows.
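The two-marginal structure can be illustrated with a toy fleet. The Python sketch below is an assumption-laden example, not the paper's construction: each agent is an integrator $x^{+}=x+u$ on an integer grid with stage cost $u^{2}$ and a hard terminal constraint, the ground-space cost-to-go is computed by backward dynamic programming, and the fleet-level problem is solved by enumerating assignments. Enumeration suffices here because both marginals are uniform over the same number of atoms, so an optimal coupling can be chosen as a permutation. Grid, horizon, agent positions, and targets are hypothetical.

```python
from itertools import permutations
from math import inf


def ground_cost_to_go(states, inputs, horizon, target):
    """Backward DP for one agent that must reach `target` after `horizon` steps.

    Dynamics x+ = x + u, stage cost u**2, and terminal cost 0 at the target
    and +inf elsewhere (an illustrative modeling choice).
    """
    j = {x: (0.0 if x == target else inf) for x in states}
    for _ in range(horizon):
        j = {
            x: min((u * u + j[x + u] for u in inputs if x + u in j), default=inf)
            for x in states
        }
    return j


states = range(0, 7)     # grid positions 0, ..., 6
inputs = range(-3, 4)    # admissible moves per step
horizon = 2
agents = [0, 2, 5]       # initial positions, uniform measure mu_0
targets = [6, 1, 3]      # desired positions, uniform measure rho

# Low-level control: transportation cost from ground-space DP.
tables = [ground_cost_to_go(states, inputs, horizon, r) for r in targets]
cost = [[tables[k][x] for k in range(len(targets))] for x in agents]

# Fleet-level control: who goes where, by brute-force assignment.
n = len(agents)
best_sigma = min(
    permutations(range(n)),
    key=lambda sigma: sum(cost[i][sigma[i]] for i in range(n)),
)
best_total = sum(cost[i][best_sigma[i]] for i in range(n))
print(best_sigma, best_total / n)
```

The dynamic program never looks at the fleet, and the assignment step never looks at the dynamics beyond the table `cost`, which mirrors the decoupling between low-level and fleet-level control. For larger fleets, the enumeration would be replaced by a linear program or an assignment solver.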

7 Conclusions

We showed that many discrete-time finite-horizon optimal control problems in probability spaces are multi-marginal optimal transport problems, whose transportation cost stems from an optimal control problem in the space on which the probability measures are defined. This implies a separation principle: The optimal control strategy for a fleet of identical agents results from the optimal control strategy of each agent (how to go from $x$ to $y$?) and an optimal transport problem (who goes from $x$ to $y$?). We complemented our theoretical results with various examples. Among others, our results back up many existing approaches in the literature which a priori formalize the distribution/fleet steering problems as an optimal transport problem and not as an optimal control problem in the probability space. Our analysis is based on novel stability results for the multi-marginal optimal transport problem, whose study is of independent interest.

Future work will explore extensions to noisy dynamics and different cost functionals, the limit cases of the infinite horizon and continuous-time dynamics, and the practical impact of our theoretical results.

Bibliography

[1] Mathias Hudoba de Badyn, Erik Miehling, Dylan Janak, Behçet Açıkmeşe, Mehran Mesbahi, Tamer Başar, John Lygeros, and Roy S Smith. Discrete-time linear-quadratic regulation via optimal transport. In 60th Conference on Decision and Control, pages 3060–3065, 2021.
[2] Vishaal Krishnan and Sonia Martínez. Distributed online optimization for multi-agent optimal transport. arXiv preprint arXiv:1804.01572, 2019.
[3] Antonio Terpin, Sylvain Fricker, Michel Perez, Mathias Hudoba de Badyn, and Florian Dörfler. Distributed feedback optimisation for robotic coordination. In 2022 American Control Conference, pages 3710–3715, 2022.
[4] Gioele Zardini, Nicolas Lanzetti, Marco Pavone, and Emilio Frazzoli. Analysis and control of autonomous mobility-on-demand systems. Annual Review of Control, Robotics, and Autonomous Systems, 5(1), 2021.
[5] Giacomo Albi, Lorenzo Pareschi, and Mattia Zanella. On the optimal control of opinion dynamics on evolving networks. In IFIP Conference on System Modeling and Optimization, pages 58–67. Springer, 2015.
[6] Elizabeth Y Huang, Dario Paccagnan, Wenjun Mei, and Francesco Bullo. Assign and appraise: Achieving optimal performance in collaborative teams. IEEE Transactions on Automatic Control, 2022.
[7] Dimitri Bertsekas. Abstract dynamic programming. Athena Scientific, 2022.
[8] L. Ambrosio, N. Gigli, and G. Savaré. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Birkhäuser Basel, 1 edition, 2008.
