Dynamic Programming in Probability Spaces via Optimal Transport
Antonio Terpin, Nicolas Lanzetti, Florian Dörfler

TL;DR
This paper develops a framework for solving discrete-time optimal control problems in probability spaces by combining dynamic programming on the ground space with optimal transport, enabling decoupled multi-agent control strategies.
Contribution
It introduces a novel approach that links dynamic programming in probability spaces with optimal transport, providing a separation principle for multi-agent control.
Findings
Solution of dynamic programming in probability spaces via ground space and optimal transport
Decoupling of low-level agent control and fleet-level control
Applicable to multi-agent systems with probabilistic states
Abstract
We study discrete-time finite-horizon optimal control problems in probability spaces, whereby the state of the system is a probability measure. We show that, in many instances, the solution of dynamic programming in probability spaces results from two ingredients: (i) the solution of dynamic programming in the "ground space" (i.e., the space on which the probability measures live) and (ii) the solution of an optimal transport problem. From a multi-agent control perspective, a separation principle holds: The "low-level control of the agents of the fleet" (how does one reach the destination?) and "fleet-level control" (who goes where?) are decoupled.
Table 1: Notation in the ground space and in the measure space.

| | ground space | measure space |
|---|---|---|
| state | | |
| reference | | |
| state-input distribution | | s.t. |
| dynamics | | |
| cost-to-go | | |
| stage and terminal costs | and | and |
∗ Equal contribution. This research was supported by the Swiss National Science Foundation under the NCCR Automation, grant agreement 51NF40_180545.
1 Introduction
Many optimal control problems of stochastic or large-scale dynamical systems can be framed in the probability space, whereby the state is a probability measure. We provide three examples, starting with a pedagogical case:
**Example 1.1** (Deterministic optimal control).
Consider a discrete-time dynamical system with state space , input space , and dynamics . The problem of steering the system from an initial state towards a target state in time-steps while minimizing the terminal cost and the stage costs reads
[TABLE]
subject to the dynamics. The costs and measure the “closeness” between the state and the reference , as well as the input effort. For instance, when all the spaces are , they may be defined as and . It is instructive to capture this setting via probability measures. At each time-step , consider the Dirac’s delta probability measure , and let . The relation between and is the “pushforward” operation , a dynamics in the probability space. Then, an equivalent formulation to (1) is
[TABLE]
where and have the same meaning as the lower-case counterparts in (1) but are defined in the probability space.
**Example 1.2** (Distribution steering).
Assume now that the initial condition in Example 1.1 is unknown, but its realization is distributed according to , with being the space of probability measures over . The input to apply to each “particle” having state is given by the (deterministic) feedback map , and the dynamical evolution is . Similarly to Example 1.1, the dynamics in the probability space are then . The same formalism of Example 1.1 can then be used to ensure that the terminal state is distributed closely to a desired probability measure :
[TABLE]
where measures closeness between and , akin to and in (2).
**Example 1.3** (Large-scale multi-agent systems).
The optimal steering of a fleet of identical agents, with dynamics , from an initial configuration to a desired one can be cast as
[TABLE]
where is a fleet-specific cost (e.g., a cohesion or formation cost), and and are agent-specific costs (e.g., input effort). Often, the interest lies in the macroscopic behavior of the fleet. Hence, it is customary to capture the state of the fleet by a probability measure and the input by a map [1, 2]. The optimization problem in (4) can then be written as an optimal control problem, with state , input , and dynamics . Overall,
[TABLE]
Such a modeling approach suits robotics [3], mobility [4], and social networks [5, 6].
Formally, (2), (3), and (5) are instances of discrete-time finite-horizon optimal control problems in probability spaces:
[TABLE]
subject to the measure dynamics , where are (possibly time-dependent) reference probability measures. In this paper, we consider as optimal transport discrepancies: An optimal transport discrepancy measures the effort to transport one probability measure onto another when moving a unit of mass from to costs ; see Section 2. To solve (6), one possibility is the Dynamic Programming Algorithm (DPA) [7]. However, its deployment poses several analytical and computational challenges. For example, it is unclear which easy-to-verify assumptions ensure the existence of solutions. Moreover, even if a minimizer exists, its computation suffers from the infinite dimensionality of the probability space and from the burden of repeated computations of optimal transport discrepancies; see Section 3.
This inevitable complexity prompts us to adopt a different perspective: At least formally, (6) resembles a single optimal transport problem [8, 9], whereby one seeks to transport an initial probability measure to a final one while minimizing some transportation cost. If this formal similarity is made rigorous, we can tackle (6) with the tools of optimal transport theory, which has reached significant maturity in recent years, both theoretically [8, 10, 9] and numerically [11, 12, 13].
1.1 Contributions
We study optimal control and dynamic programming in probability spaces through the lens of optimal transport theory. Specifically, we show that many optimal control problems in probability spaces can be reformulated and studied as optimal transport problems. Our results reveal a separation principle: The “low-level control of the agents of the fleet” (how to reach the destination?) and “fleet-level control” (who goes where?) are decoupled. We complement our theoretical analysis with various examples and counterexamples, which demonstrate that our conditions cannot be relaxed and expose the pitfalls of heuristic approaches. The proofs of our results rely on novel stability results for the (multi-marginal) optimal transport problem, which are of independent interest.
1.2 Previous work
Most of the literature focuses on continuous time, and it is founded on [14], which relates the optimal transport problem and fluid mechanics. Through the optimal control lens, this formulation corresponds to an optimal control problem with integrator dynamics: The resulting flow is a time-dependent feedback law [15]. An attempt to introduce generic dynamical constraints can be found in [16], where the set of possible flows is constrained in a set of admissible ones, induced by the dynamics. Constructive results can be found in the specific setting of linear systems and Gaussian probability measures. In this case and when the control laws are affine, the space of probability measures is implicitly constrained to the space of Gaussian distributions, and closed-form solutions exist [17, 18, 19, 20]. All of these works build on traditional optimal control tools. In [21, 22, 23], instead, the authors develop a Pontryagin Maximum Principle for optimal control problems in the Wasserstein space (i.e., probability space endowed with the Wasserstein distance). Their analysis combines classical tools in optimal control theory with the “differential structure” of the Wasserstein space [8, 24]. In [25], the authors study optimal transport when the transportation cost results from the cost-to-go of a Linear Quadratic Regulator (LQR). This methodology implicitly assumes that, to steer a fleet of identical particles, one can compute the cost-to-go for the single particle and then “lift” the solution to the probability space via an optimal transport problem. While attractive, this approach generally yields sub-optimal solutions; see Section 5.
The discrete-time setting has, instead, received less attention. In this direction, [26, 27, 28, 29, 30] explore the covariance control problem for discrete-time linear systems, possibly subject to constraints. In [2], the authors study the optimal steering of multiple agents from an initial configuration to a final one in a distributed fashion. In [1], the authors follow an approach similar to [25], albeit in discrete time. Finally, in [31], the authors study the problem of mass transportation over a graph, embedding constraints such as the maximum flow on the edges. To do so, they exploit the structure of the ground space; in this case, the transportation graph. In all these approaches, the distribution/fleet steering problem is a priori formalized as an optimal transport problem and not as an optimal control problem in the probability space. As we shall see in Section 4, our results bridge these two perspectives and allow us to back up and recover many of the approaches in the literature.
1.3 Organization
The paper unfolds as follows. In Sections 2 and 3, we review the space of probability measures and introduce our problem setting. We present and discuss our main result, Theorem 4.2, in Section 4. In Section 5, we provide examples to build intuition for our results and expose potential pitfalls. All proofs are in Section 6. Finally, Section 7 summarizes our findings and outlines future directions.
1.4 Notation
We denote by the space of continuous and bounded functions , by the space of lower semi-continuous functions , and by the set of non-negative extended real numbers. The identity map on is denoted by , and the projection maps from onto are denoted by . Given the set of maps we denote by the map
2 The Space of Probability Measures
We start with notation and preliminaries in Section 2.1. Then, in Section 2.2, we review optimal transport.
2.1 Preliminaries
We assume all spaces to be Polish spaces, and all probability measures and maps to be Borel. We denote by the space of Borel probability measures on , and we denote by the Dirac’s delta at ; i.e., the probability measure defined for all Borel sets as if and otherwise. We denote by the support of a probability measure . The pushforward of a probability measure through , denoted by , is defined by for all Borel sets . For any -integrable it holds Given , is a transport map from to if ; to this extent, it suffices that for all We say that is a transport map from to if and .
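To make the pushforward concrete, the following sketch computes it for a finitely supported measure stored as a dictionary from points to masses. The representation and the names `pushforward`, `mu`, `nu`, and `phi` are illustrative assumptions, not notation from the paper; the final check is the standard change-of-variables identity for pushforwards, tested on one function.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, f):
    """Pushforward of a finitely supported measure mu = {point: mass} through f."""
    out = defaultdict(Fraction)
    for x, mass in mu.items():
        out[f(x)] += mass  # mass at x is moved to f(x); colliding images add up
    return dict(out)


# Hypothetical measure on {-1, 0, 1}.
mu = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}
nu = pushforward(mu, abs)

# Change of variables: integrating phi against nu equals integrating phi(f(.)) against mu.
phi = lambda y: y * y + 1
lhs = sum(phi(y) * m for y, m in nu.items())
rhs = sum(phi(abs(x)) * m for x, m in mu.items())
print(nu, lhs == rhs)
```

Exact fractions are used so that the identity can be checked with equality rather than with a floating-point tolerance.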
2.2 Optimal transport
Given a non-negative transportation cost , the optimal transport discrepancy between two probability measures and is
[TABLE]
where is the set of couplings. A prominent example of optimal transport discrepancy is, for some , the power of the -Wasserstein distance, obtained when and the transportation cost is a metric that induces the topology on [8, §7].
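For two empirical measures with the same number of equally weighted particles, an optimal coupling can be taken to be a permutation (the Birkhoff-type fact invoked in Section 4), so the discrepancy can be evaluated by brute force on tiny instances. The sketch below does this; the function name `ot_empirical` and the data are hypothetical, and the enumeration over permutations is meant for illustration only, not as a practical solver.

```python
from itertools import permutations


def ot_empirical(xs, ys, cost):
    """OT discrepancy between uniform empirical measures on xs and ys (same size)."""
    n = len(xs)
    assert n == len(ys), "this sketch needs the same number of particles"
    best_value, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the index of the target particle assigned to source particle i.
        value = sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n
        if value < best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm


xs = [0.0, 1.0, 3.0]
ys = [2.0, 0.0, 5.0]
w1, plan = ot_empirical(xs, ys, lambda x, y: abs(x - y))
print(w1, plan)
```

With the absolute value as transportation cost, the returned value is the type-1 Wasserstein distance between the two empirical measures.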
**Remark 2.1.**
When the transportation cost does not depend on one of the two variables (e.g., there exists such that ), the optimal transport discrepancy reduces to an expected value; i.e., .
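The reduction in Remark 2.1 follows in one line from the marginal constraint. As a sketch, write the cost as $c(x, y) = h(y)$ and let $\gamma$ be any coupling of $\mu$ and $\nu$; the symbols $c$, $h$, $\gamma$, $\mu$, $\nu$, $X$, and $Y$ are assumed here for illustration:

```latex
\int_{X \times Y} c(x, y)\, \mathrm{d}\gamma(x, y)
  = \int_{X \times Y} h(y)\, \mathrm{d}\gamma(x, y)
  = \int_{Y} h(y)\, \mathrm{d}\nu(y)
  = \mathbb{E}_{y \sim \nu}\left[ h(y) \right].
```

Since the value is the same for every coupling, the infimum over couplings equals this expected value.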
We will repeatedly work with a generalization of the optimal transport problem to marginals. Let , and . The multi-marginal optimal transport problem between probability measures reads
[TABLE]
where In general, the infima in (7) and (8) are not attained, unless mild conditions on the transportation cost hold true. For instance, in (7); see [9, §4]. A transport plan is -optimal when
[TABLE]
The formulation in (7) and (8) is the Kantorovich formulation of the optimal transport problem, whereby one optimizes over transport plans . The (stricter) Monge formulation considers only transport plans induced by a transport map . (Historically, the Monge formulation comes first; for a thorough review of the history of optimal transport and its founding fathers, see [9, §1].)
3 Problem Statement
Let , , and be Polish spaces, representing the state space, the input space, and the space of references in the ground space, respectively (often, ). We consider dynamical systems whose state is a probability measure over . This approach encompasses continuous approximations of multi-agent systems and systems with uncertain initial conditions (usually captured by absolutely continuous probability measures), as well as finite settings (captured by empirical probability measures). For instance,
**Example 3.1** (Robots in a grid).
Consider robots in a grid of three cells; i.e., . Suppose that the robot is located at (i.e., has state ). Then, the state of the system is . The same modeling approach applies to robots in the two-dimensional plane, simply setting .
In this setting, we focus on the following optimal control problem:
**Problem 3.2** (Discrete-time optimal control in probability spaces).
Let . For dynamics , costs and , initial condition , and reference trajectory , find the joint state-input distributions that solve
[TABLE]
Before presenting our results, we detail our setting. The notation in the ground space is juxtaposed with the one in the measure space in Table 1.
3.1 State-input distribution
The state-input distribution is a probability measure on whose first marginal is . The semantics is as follows: The probability mass assigned by to the pair indicates the probability that one particle has state and applies the input or, equivalently, the share of agents which have state and apply the input . When for some , the input is “deterministic”: All particles that have state apply the input .
**Example 3.3** (Robots in a grid, continued).
Consider again identical robots on , where at each time step each robot can either move to the origin () and stay there forever, or change position (), so that and . Consider the following state-input distributions and .
[TABLE]
[TABLE]
In the first case (i.e., ), of the robots are located at and go to the origin (), of the robots are located at and switch position (), and of the robots are located at and remain there (, although this is irrelevant for the dynamics). The input is not deterministic, since not all robots located at apply the same input. From we can also infer the distribution of the robots: of them are located at , and the others at . In the second case (i.e., ), the input is deterministic: All robots located at switch position, and all the robots located at stay there.
**Remark 3.4.**
Two comments on our modeling choice. First, since the first marginal of is , the costs are implicitly a function of the state, the input, and the reference trajectory. Second, in many instances (including multi-agent settings with finitely many agents), optimal inputs turn out to be deterministic (in the sense outlined above). Yet, the more general joint state-input distribution considerably simplifies the analysis, the same way the Kantorovich formulation is more tractable than the Monge formulation in optimal transport theory.
3.2 Dynamics
We consider measure dynamics resulting from the pushforward via a function (typically, the dynamics of the single particles); i.e., . In the special case of deterministic inputs (i.e., for some function ), the dynamics simplify to .
**Example 3.5** (Robots in a grid, continued).
Consider the setting of Example 3.3, where . The measure dynamics are , and the two inputs of Example 3.3 yield and to .
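The objects of Examples 3.3 and 3.5 can be sketched on a small instance. The encoding below is an assumption made for illustration: cells are labeled -1, 0, +1, the input 0 sends a robot to the origin, the input 1 switches its position via the hypothetical map `f(x, u) = -x * u`, and the mass shares in `lam_1` and `lam_2` are made up rather than taken from the example.

```python
from collections import defaultdict
from fractions import Fraction


def first_marginal(lam):
    """State distribution: sum the mass of lam = {(x, u): mass} over the inputs u."""
    mu = defaultdict(Fraction)
    for (x, _u), mass in lam.items():
        mu[x] += mass
    return dict(mu)


def is_deterministic(lam):
    """True if every state in the support applies exactly one input."""
    seen = {}
    for (x, u), mass in lam.items():
        if mass > 0 and seen.setdefault(x, u) != u:
            return False
    return True


def step(lam, f):
    """Measure dynamics: push the state-input distribution forward through f."""
    nxt = defaultdict(Fraction)
    for (x, u), mass in lam.items():
        nxt[f(x, u)] += mass
    return dict(nxt)


f = lambda x, u: -x * u  # assumed dynamics: u = 0 -> origin, u = 1 -> switch side

half, quarter = Fraction(1, 2), Fraction(1, 4)
lam_1 = {(1, 0): quarter, (1, 1): quarter, (0, 0): half}  # robots at +1 split inputs
lam_2 = {(1, 1): half, (0, 0): half}                      # one input per state

print(first_marginal(lam_1), is_deterministic(lam_1), step(lam_1, f))
print(first_marginal(lam_2), is_deterministic(lam_2), step(lam_2, f))
```

Both distributions share the same first marginal, yet they lead to different next states, which is why the input is modeled as a joint state-input distribution rather than as a function of the state alone.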
3.3 Cost
We consider optimal transport discrepancies with, as transportation costs, (stage cost) and (terminal cost). By Remark 2.1, this modeling assumption includes expected values but not functionals such as the variance of the probability measure or the Kullback-Leibler divergence from the references , . Our formulation encompasses the terminal constraint : It suffices to set if . Similarly, state-dependent input constraints can be encoded by setting when . In view of Example 1.1, the transportation costs and may be interpreted as the cost incurred by a single agent.
**Example 3.6** (Robots in a grid, continued).
Suppose that the goal is to steer robots to and to , while minimizing the input. Then, and, for some weight , possible costs are and . This way, the aim is to minimize the (type 1) Wasserstein distance from the reference at the end of the horizon (i.e., ) and the (weighted) input effort throughout the horizon (i.e., ). The weight arbitrates between these two objectives. The references for do not enter in the cost and are therefore irrelevant.
3.4 DPA for Problem 3.2
Problem 3.2 is an instance of the classic discrete-time finite-horizon optimal control problem in abstract spaces [7, 32]. It is therefore natural to tackle it via the DPA:
**Definition 3.7** (DPA).
*Initialization: Let .
Recursion: For all , compute the cost-to-go :*
[TABLE]
Unfortunately, the DPA in probability spaces poses several analytic and computational challenges; we mention two. First, it is unclear under which easy-to-verify assumptions minimizers exist. Second, even if they do, their computation remains challenging, if not prohibitive. Already when all sets are finite, and the (generally infinite-dimensional) probability space reduces to the finite-dimensional probability simplex, (10) is excruciating. For instance, when is an optimal transport discrepancy, the mere evaluation of the cost-to-go involves solving an optimal transport problem, with all the related computational difficulties [11, 33, 12, 34]. Thus, the optimization of , needed to compute , will inevitably be very demanding.
In the following, we show that the solution of Problem 3.2 can be constructed from the solution of the DPA in the ground space (i.e., ) and a single (possibly multi-marginal) optimal transport problem. In other words, a separation principle holds: The optimal control law results from the combination of optimal low-level control laws (found via DPA in the ground space) and a fleet-level control law (found via an optimal transport problem). This way, we bypass the cumbersome application of DPA in probability spaces as well as the repeated evaluation of optimal transport discrepancies. At least formally, our result generalizes two well-known extreme cases. On the one hand, when considering Dirac’s delta probability measures, the DPA in the probability space trivially reduces to the DPA in the ground space (see Example 1.1); on the other hand, when considering trivial dynamics (i.e., and ) and an optimal transport discrepancy as a terminal cost, Problem 3.2 reduces to an optimal transport problem. Thus, DPA in probability spaces should be at least “as difficult as” solving both the DPA in the ground space and an optimal transport problem. As we shall see below, it is not “more difficult” than that.
3.5 Auxiliary problem: DPA in the ground space
Before presenting our main results, we introduce an auxiliary optimal control problem in the ground space:
[TABLE]
Similarly to (10), the DPA provides the cost-to-go :
[TABLE]
Specifically, we use lower case for the cost-to-go in the ground space and upper case for its probability-space twin. By (11), (ε-)optimal inputs will be feedback laws . In particular, an input (or, with slight abuse of notation, a feedback law ) is ε-optimal in (11) if
[TABLE]
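When all sets are finite, the recursion in the ground space is a plain backward sweep. The sketch below handles a reference-independent stage cost and a single terminal reference; the names `ground_dpa`, `f`, `g`, `g_N` and the line-graph instance are assumptions made for illustration.

```python
def ground_dpa(states, inputs, f, g, g_N, N, r):
    """Backward recursion j_N(x) = g_N(x, r), j_k(x) = min_u g(x, u) + j_{k+1}(f(x, u))."""
    j = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for x in states:
        j[N][x] = g_N(x, r)
    for k in range(N - 1, -1, -1):
        for x in states:
            # Bellman step: pick the input with the smallest stage cost plus cost-to-go.
            best_u = min(inputs, key=lambda u: g(x, u) + j[k + 1][f(x, u)])
            policy[k][x] = best_u
            j[k][x] = g(x, best_u) + j[k + 1][f(x, best_u)]
    return j, policy


# Hypothetical instance: five cells on a line, move left/stay/right, walls at the ends.
states = range(5)
inputs = (-1, 0, 1)
f = lambda x, u: min(max(x + u, 0), 4)
g = lambda x, u: 0.5 * abs(u)          # input effort
g_N = lambda x, r: abs(x - r)          # distance from the reference at the final time

j, policy = ground_dpa(states, inputs, f, g, g_N, N=3, r=4)
print(j[0][0], policy[0][0])
```

The returned `policy[k]` is a feedback law: it maps each state to an input, which is the form of the optimal inputs discussed above.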
4 Main Result
In this section, we present our main result. We first provide an informal statement in Section 4.1. The rigorous version is in Section 4.2.
4.1 A separation principle in the probability space
Our main result establishes a separation principle:
**Theorem 4.1** (DPA in probability spaces via optimal transport, informal).
Consider the setting of Problem 3.2. At every stage :
- (i) The cost-to-go is a multi-marginal optimal transport problem between the current state and the future references , with transportation cost being the cost-to-go in the ground space .
- (ii) The optimal state-input distribution results from the following strategy:
  - (1) Find the optimal input in the ground space;
  - (2) Find the optimal transport plan for the cost-to-go ;
  - (3) Dispatch the particles as prescribed by and apply to steer them to their allocated trajectory.
In words, to solve DPA in probability spaces, we first solve for the cost-to-go in the ground space and then construct a multi-marginal optimal transport problem with transportation cost . Moreover, the optimal input for a fleet of identical agents results from the composition of the optimal control strategy for each individual agent (how to optimally follow the trajectory for an agent with state ?) and the solution of a multi-marginal optimal transport problem (who has state and follows the trajectory ?). Importantly, our result reveals a separation principle: It is optimal to first devise low-level controllers for individual agents (i.e., ) and then solve an assignment problem to allocate agents to their destinations (i.e., ).
4.2 A rigorous statement
Next, we rigorously formalize Theorem 4.1:
**Theorem 4.2** (DPA in probability spaces via optimal transport).
Consider the setting of Problem 3.2. At every stage :
- (i) The cost-to-go equals the multi-marginal optimal transport discrepancy
  [TABLE]
  where is the cost-to-go in the ground space, as in (11). Moreover, the DPA yields the optimal solution .
- (ii) For , suppose and are -optimal in (11) and (13), respectively. Then,
  [TABLE]
  is an -optimal state-input distribution. If , then is optimal.
- (iii) If in (ii) is induced by a transport map , the -optimal control input reads .
Before discussing Theorem 4.2 and its implications, we consider the special case when the stage costs do not depend on the reference; i.e., . For instance, any shortest path problem on a graph can be converted into a finite-horizon optimal control problem (e.g., see [32]), where the weights of the edges determine the stage costs ; these depend only on the pair . In these cases, the DPA reads
[TABLE]
Accordingly, the ε-optimal inputs in the ground space are of the form , and the cost-to-go simplifies to a two-marginal optimal transport discrepancy:
**Corollary 4.3** (When two marginals are all you need).
Consider the setting of Theorem 4.2, with . At every stage :
- (i) The cost-to-go equals the optimal transport discrepancy
  [TABLE]
  where is the cost-to-go in the ground space, as in (15). Moreover, the DPA yields the optimal solution .
- (ii) For , suppose and are -optimal in (15) and (13), respectively. Then,
  [TABLE]
  is an -optimal state-input distribution. If , then is optimal.
- (iii) If in (ii) is induced by a transport map , the -optimal control input reads .
We defer the proofs of these results to Section 6.
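For a finite fleet, the recipe of Corollary 4.3 can be sketched end to end: compute the ground-space cost-to-go toward each target, solve the assignment problem with that cost, and let every agent apply its low-level input toward its assigned target. Everything below (the line-graph dynamics, the quadratic input cost, the hard terminal constraint, the brute-force assignment, and all names) is an illustrative assumption rather than the paper's implementation.

```python
from itertools import permutations

INF = float("inf")
STATES, INPUTS, N = range(5), (-1, 0, 1), 2
f = lambda x, u: min(max(x + u, 0), 4)      # assumed particle dynamics on a line
g = lambda x, u: u * u                      # stage cost: input effort
g_N = lambda x, r: 0.0 if x == r else INF   # terminal constraint: end exactly at r


def cost_to_go(r):
    """Ground-space DPA toward target r; returns (j_0, first-stage feedback law)."""
    j = {x: g_N(x, r) for x in STATES}
    law = {}
    for _ in range(N):  # sweeps stages N-1, ..., 0; the last sweep is stage 0
        q = {x: {u: g(x, u) + j[f(x, u)] for u in INPUTS} for x in STATES}
        law = {x: min(q[x], key=q[x].get) for x in STATES}
        j = {x: q[x][law[x]] for x in STATES}
    return j, law


def fleet_control(agents, targets):
    """Low-level DPA per target, then a fleet-level assignment with cost j_0."""
    solved = [cost_to_go(r) for r in targets]
    n = len(agents)
    # Equal-weight particles: search over permutations (who goes where?).
    best = min(
        permutations(range(n)),
        key=lambda p: sum(solved[p[i]][0][agents[i]] for i in range(n)),
    )
    value = sum(solved[best[i]][0][agents[i]] for i in range(n)) / n
    first_inputs = [solved[best[i]][1][agents[i]] for i in range(n)]
    goals = [targets[best[i]] for i in range(n)]
    return value, goals, first_inputs


value, goals, first_inputs = fleet_control(agents=[0, 4], targets=[4, 2])
print(value, goals, first_inputs)
```

The two stages never interact: `cost_to_go` knows nothing about the fleet, and the assignment step only sees the resulting cost table. This is the decoupling between low-level and fleet-level control described by the separation principle.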
Discussion
A few comments on our results are in order.
How to construct optimal state-input distributions?
We start with more details on (14) and (17). For simplicity, assume that an optimal input map and an optimal transport plan exist (else, resort to an ε-argument). Then, (ii) in Theorem 4.2 and Corollary 4.3 states that an optimal state-input distribution for Problem 3.2 results from the DPA in the ground space (i.e., ) and the solution of an optimal transport problem (i.e., ):
- (i) Optimal particles allocation: The transport plan describes the optimal allocation of the particles throughout the horizon. In discrete instances, quantifies the share of agents with state that will follow the reference trajectory .
- (ii) Optimal input coupling: Accordingly, we can interpret as the number of particles at that apply the optimal input . Intuitively, assigns probability mass to if there is a trajectory to which has been allocated by , such that is the optimal input to minimize the cost along that trajectory.
Existence of optimal solutions
In turn, our results provide sufficient conditions for the existence of an optimal solution for Problem 3.2: existence of a solution for both the DPA in the ground space and the associated optimal transport problem.
Existence of optimal input maps
An optimal solution to (11) always exists when all spaces are finite, or when for any compact and , the sets are compact, the maps are lower semicontinuous and are continuous for all ; see [7, Proposition 4.2.2] and [35, Theorem 18.19]. In general, however, optimal inputs may not exist. For this reason, we state our results using ε-optimality.
Existence of optimal transport maps
If the solution of the optimal transport problem is a transport map, then (iii) in Theorem 4.2 suggests that the optimal input is deterministic. Without aiming for completeness, this is the case when:
- (i) The marginals are empirical with the same number of particles (by virtue of Birkhoff's theorem [8, Theorem 6.0.1]); or
- (ii) The cost-to-go is continuous and semi-concave, and for each the map is injective in its domain of definition intersected with splitting sets [36, Definition 2.4], and is absolutely continuous [36, 37] (see [38, Theorem 1.2] for the case with two marginals).
Connections to previous work
The approach in the literature for distribution/fleet steering is fundamentally different from ours: It is a priori stipulated that the steering problem is an optimal transport problem from an initial distribution to a target one, without formulating an optimal control problem in probability spaces. This way, the complexity of DPA in probability spaces is bypassed, at the price, however, of potentially suboptimal solutions: There is no reason for this approach to be optimal for a corresponding control problem in the probability space. With Theorem 4.2 and Corollary 4.3, we show that, provided the transportation cost is judiciously chosen, this approach is optimal, and yields the same solution as DPA in probability spaces. For instance, the results in [1, §A] correspond to the optimal strategy when , and terminal constraint on the final distribution (see Section 3.3). The results in [1] can thus be extended to more general terminal costs (e.g., ). Instead, the results in [1, §B] are suboptimal in the sense of DPA in the probability space. By Theorem 4.2, when the stage costs are reference-dependent (e.g., ), the cost-to-go results from a multi-marginal optimal transport problem. As such, the strategy proposed in [1] does not minimize, at every time-step , the weighted sum of the squared Wasserstein distance from the target configuration and the input effort. Similarly, the problem formulation in [2] can be recovered with integrator dynamics , cost , and terminal constraint on the final distribution (see Section 3.3). With a state augmentation (the input used along the trajectory, an independent integrator dynamics) and input constraints as suggested in Section 3.3, [30, Problem 2] is a special case of our setting, with linear dynamics (see Section 3.2), stage cost , and terminal cost the squared Wasserstein distance; i.e., .
Simple calculations reveal that the hard-constrained covariance formulation in [30, Problem 1] can be reformulated via a hard terminal constraint on the final probability measure (a Gaussian probability measure with appropriate covariance). In both cases, such specializations are possible because the authors restrict themselves to the Gaussian and linear setting. In general, covariance constraints or penalties require further study; see Section 3.3. Similarly, noisy settings do not immediately benefit from our reformulation; see Section 5.3. Analogous considerations hold for [26, 27, 28, 29].
Computational attractiveness
A rough time complexity analysis in the setting of Corollary 4.3 highlights why our result is computationally attractive. Consider the finite setting: Let be the number of states in the ground space, the horizon length, the number of available actions at each state, and restrict the attention to empirical probability measures consisting of particles, which can be written as , for . The number of possible states in the probability space amounts to . Therefore, the time complexity of naively applying DPA in the probability space (i.e., to compute the input at the current state) is . On the other hand, the DPA in the ground space (for all initial and terminal states) costs . An optimal transport problem boils down to solving a linear program with decision variables, which has complexity (see [39] for a sharper analysis). The total complexity of the recipe provided in Corollary 4.3 is thus , which improves over the DPA in the probability space.
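To get a feel for how quickly the probability space grows, the sketch below counts the empirical measures with a given number of particles on a finite ground space, both by enumeration and with the standard stars-and-bars formula. The formula is a textbook combinatorial identity, used here as an assumption about the kind of count the analysis refers to; the names `n_states` and `n_particles` are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb


def count_empirical_measures(n_states, n_particles):
    """Number of multisets of size n_particles drawn from n_states states."""
    return comb(n_particles + n_states - 1, n_states - 1)


# Brute-force check on a small instance: each multiset of states is one empirical measure.
n_states, n_particles = 3, 4
enumerated = sum(1 for _ in combinations_with_replacement(range(n_states), n_particles))
print(enumerated, count_empirical_measures(n_states, n_particles))

# The count grows quickly with the fleet size and the size of the ground space.
for n, m in [(10, 10), (20, 20), (50, 50)]:
    print(n, m, count_empirical_measures(n, m))
```

By contrast, the ground-space DPA only ever visits the states of the ground space, and the optimal transport step is solved once.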
Design of transportation costs
In many disciplines, the design of transportation costs is challenging; e.g., [40, 41]. For instance, in [40], the underlying Riemannian metric characterizing the trajectory of single-cell RNA is retrieved in a data-driven fashion. Theorem 4.2 and Corollary 4.3 suggest an alternative approach: First, “learn” the cost-to-go for single particles; then, use it as the transportation cost.
Measurability issues
In general, the cost-to-go is not Borel. Nonetheless, with our assumptions, it is lower semi-analytic [42, Corollary 8.2.1] and, thus, the integral in (13) is well-defined [42, §7.7]. Similarly, for any , the inputs may fail to be Borel measurable and are only universally measurable [42, Proposition 7.50]. However, for any Borel measure there exists a Borel map so that -a.e. [42, Lemma 7.27]. Thus, we can assume without loss of generality that is Borel, so that the pushforward map in (17) and the resulting probability measures are well-defined.
5 Examples and Pitfalls
In Section 5.1, we present two examples where two marginals are enough, in line with the existing literature [1, 2]. Then, in Section 5.2, we showcase that, in general, the multi-marginal formulation is necessary. Finally, Section 5.3 shows that our results do not readily extend to noisy dynamics.
5.1 Examples when two marginals are all you need
We start with an example to which Corollary 4.3 applies:
**Example 5.1** (Integrator particle dynamics, input effort).
Suppose we aim at steering a probability measure to a target in steps; i.e., . The input space is , and the dynamics are . The costs are , and if and otherwise, so that the stage cost in the probability space is , and the terminal cost is if and otherwise. The optimal control problem in the ground space admits the solution , with the associated cost-to-go . By Corollary 4.3, the cost-to-go in the space of probability measures is
[TABLE]
and the optimal input reads , where is the optimal transport plan for . In the particular case where an optimal transport map exists, the optimal input simplifies to . That is, all particles having state apply the input .
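The ground-space solution of this example can be checked numerically. The sketch assumes scalar integrator dynamics (next state equals state plus input), a squared input cost, and a hard terminal constraint, for which the feedback law `u = (r - x) / (N - k)` spreads the displacement evenly over the remaining steps and the cost-to-go is `(r - x)**2 / (N - k)`. These closed forms are stated here as assumptions consistent with the description above, and all names are illustrative.

```python
def rollout(x0, r, N, inputs=None):
    """Simulate x+ = x + u; use the even-split feedback law unless inputs are given."""
    x, cost = x0, 0.0
    for k in range(N):
        u = (r - x) / (N - k) if inputs is None else inputs[k]
        cost += u * u
        x = x + u
    return x, cost


x0, r, N = 0.0, 3.0, 4
x_final, cost = rollout(x0, r, N)
print(x_final, cost, (r - x0) ** 2 / N)

# Another input sequence that still reaches r, but with a larger effort.
x_uneven, cost_uneven = rollout(x0, r, N, inputs=[1.5, 1.5, 0.0, 0.0])
print(x_uneven, cost_uneven)
```

Under the same assumptions, using this cost-to-go as the transportation cost yields a fleet-level cost equal to the squared type-2 Wasserstein distance divided by the number of remaining steps.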
Sometimes, the optimal input is probabilistic:
**Example 5.2** (Sometimes it is necessary to split the mass).
Let , and consider , , , . Let and . For every pair the solution in the ground space is , which yields the cost-to-go . That is, any allocation is optimal; in particular, the only feasible plan displaces 50% of mass to and the other 50% of the mass to ; see Figure 1(a). Then, the optimal input reads : of the particles apply the input , and the others .
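The splitting in Example 5.2 is forced by the marginal constraints alone. As a sketch, assume the initial measure is a Dirac's delta $\delta_{x}$ on a space $X$ and the target is $\rho$ (symbols chosen here for illustration). Any coupling $\gamma$ is concentrated on $\{x\} \times R$, so the product measure is the only coupling:

```latex
\gamma \in \Gamma(\delta_{x}, \rho)
\;\Longrightarrow\;
\gamma(\{x\} \times B) = \gamma(X \times B) = \rho(B)
\quad \text{for all Borel sets } B,
\qquad \text{hence } \gamma = \delta_{x} \otimes \rho .
```

In particular, when $\rho$ puts half of its mass on each of two distinct points, half of the particles at $x$ must be sent to each of them, so no transport map exists.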
5.2 Why all these marginals?
Here, we explore the differences between Theorem 4.2 and Corollary 4.3. Specifically, we clarify why a multi-marginal optimal transport formulation arises, even when the target probability measure remains constant throughout the horizon (i.e., ).
**Counterexample (Two marginals are not enough).**
Consider, as in Example 3.5, , dynamics , with horizon , and costs and , so that the stage and terminal cost in the probability space are the squared (type 2) Wasserstein distance from the fixed reference measure . First, we utilize Corollary 4.3, keeping the reference constant throughout the horizon. The cost-to-go (here and below, this notation means ) and , both obtained applying at the first stage (and subsequently any input). The cost-to-go for the fleet is , with the particle having state allocated to . However, from a fleet perspective, the input leads to . By changing allocations throughout the horizon, we obtain a total cost . This behavior emerges naturally with Theorem 4.2. The cost-to-go in the ground space satisfies , with the input at all times. Then, the transport plan yields
[TABLE]
necessarily optimal; see Figure 1(b). In particular, . That is, Corollary 4.3 does not apply and the optimal solution results from Theorem 4.2.
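The mechanism behind this counterexample can be illustrated numerically. The sketch below does not reproduce the data of the counterexample, which rely on Example 3.5. It instead uses a hypothetical, hand-picked trajectory of two equally weighted particles and compares two ways of accumulating a squared-distance cost to a fixed reference: keeping one allocation for the whole horizon, or re-solving the allocation at every stage, which is what a per-stage squared Wasserstein distance does implicitly.

```python
from itertools import permutations


def matching_cost(points, reference, perm):
    """Average squared distance when particle i is matched to reference[perm[i]]."""
    return sum((x - reference[j]) ** 2 for x, j in zip(points, perm)) / len(points)


def wasserstein2_squared(points, reference):
    """Squared W2 between two uniform empirical measures (brute force)."""
    return min(matching_cost(points, reference, p)
               for p in permutations(range(len(reference))))


def fixed_allocation_cost(trajectory, reference):
    """Best total cost when a single allocation is kept at every stage."""
    return min(sum(matching_cost(stage, reference, p) for stage in trajectory)
               for p in permutations(range(len(reference))))


def stagewise_cost(trajectory, reference):
    """Total cost when the allocation may change from stage to stage."""
    return sum(wasserstein2_squared(stage, reference) for stage in trajectory)


# Hypothetical data: the two particles swap places over the horizon.
reference = [0.0, 1.0]
trajectory = [[0.0, 1.0], [1.0, 0.0]]
print(fixed_allocation_cost(trajectory, reference),
      stagewise_cost(trajectory, reference))
```

With these data, any fixed allocation pays at one of the two stages, whereas the stage-wise evaluation pays nothing. A two-marginal formulation can only encode one allocation, which is why the stage-wise allocations call for additional marginals.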
5.3 The effect of local noise
When the particle dynamics are noisy, it is common to minimize the expected particle cost via the stochastic DPA:
[TABLE]
where is the probability measure of the noise, and is the space of possible realizations. Since is of the form required for Corollary 4.3, it is tempting to extend our results. Unfortunately, the noisy drift may favor a different allocation of the particles, and the expectation annihilates this effect:
**Counterexample (Corollary 4.3 does not readily extend).**
Consider a horizon and the setting depicted in Figure 1(c). Let , and consider uniformly distributed noise over . The particle dynamics are and . The stage cost is , where and [math] otherwise. The terminal cost enforces the configuration , namely with if and otherwise. The recursion (18) yields (any input at the first stage and at the second stage), , and (all with analogous transitions). Therefore, with the initial configuration and target configuration , Corollary 4.3 yields . Instead, the DPA in the probability space gives with zero cost (regardless of the input). Then, the evolution is deterministic and the cost-to-go amounts to and . Thus, Corollary 4.3 applies and yields . Overall, . Thus, the naive application of Corollary 4.3 is suboptimal.
6 Proof of Theorems 4.2 and 4.3
For the proof of Theorems 4.2 and 4.3, we need a few preliminary lemmata. For ease of notation, let , , and . To start, we introduce a variation of (8), in which only the first marginals are fixed. Namely,
[TABLE]
where is the transportation cost. When (i.e., there are no free marginals), we conveniently write . Further, given a collection of maps , we denote by the map defined point-wise as . Given the probability measures , , we conveniently write . A measure-valued map is Borel if and only if, for any Borel set , the map is Borel.
In our setting, the cost-to-go will be an optimal transport discrepancy, and the dynamics are a pushforward. To relate the cost-to-go at the stage to the one at the previous time step, we rigorously formalize their interplay. A similar but less general result (i.e., only with two fixed marginals) was derived in the context of uncertainty propagation via optimal transport [43].
**Lemma 6.1 (Pushforward and optimal transport).**
Given a transportation cost , , maps , , and probability measures , , it holds:
[TABLE]
**Proof 6.2.**
We prove “” and “” separately. We start with “”. For any such that , let For consider . It holds:
[TABLE]
That is, and, thus, . Similarly, for all we have . Therefore, provides the upper bound Since is arbitrary, we obtain .
To prove “”, fix with . By definition, Then, for all , let Analogously to the previous step, we have . We can “glue” and to obtain such that Specifically, we apply times [9, Gluing lemma] as follows. First, we glue and , since they share a marginal: Call the resulting plan Next, define inductively as the plan obtained from gluing and , for . The definition is well-posed in view of [9, Gluing lemma], since Finally, we take , so that Let , , and . Then, for the argument of ,
[TABLE]
where in we used the disintegration theorem ([8, Theorem 5.3.1]), which provides us a collection to complement . Then, in , we used the definition of : - a.e.. Repeating the same steps for the other arguments of we obtain . Since is arbitrary, it follows .
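The gluing step used in the proof above can be made concrete for finitely supported measures. The following is a minimal sketch under that assumption: plans are dictionaries mapping support points to strictly positive masses (stored as `Fraction` for exact arithmetic), and two plans sharing a marginal are glued by making the outer coordinates conditionally independent given the shared one. The helper names `marginal` and `glue` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(plan, axes):
    """Marginal of a finitely supported plan on the given coordinate axes."""
    out = defaultdict(Fraction)
    for point, mass in plan.items():
        out[tuple(point[a] for a in axes)] += mass
    return dict(out)


def glue(plan_xy, plan_yz):
    """Glue two plans that share their y-marginal.

    The glued plan is gamma(x, y, z) = plan_xy(x, y) * plan_yz(y, z) / mu(y),
    so x and z are conditionally independent given y.
    Assumes all stored masses are strictly positive.
    """
    mu_y = marginal(plan_xy, (1,))
    assert mu_y == marginal(plan_yz, (0,)), "plans must share the y-marginal"
    glued = {}
    for (x, y), m_xy in plan_xy.items():
        for (y2, z), m_yz in plan_yz.items():
            if y == y2:
                glued[(x, y, z)] = m_xy * m_yz / mu_y[(y,)]
    return glued


# Hypothetical plans on three small finite spaces.
half, quarter = Fraction(1, 2), Fraction(1, 4)
plan_xy = {("a", 0): half, ("b", 1): half}
plan_yz = {(0, "p"): quarter, (0, "q"): quarter, (1, "q"): half}
glued = glue(plan_xy, plan_yz)
```

Projecting `glued` onto its first two or its last two coordinates recovers `plan_xy` and `plan_yz`, respectively, which is the property the proof relies on. Applying `glue` repeatedly, after flattening the coordinates, mirrors the inductive construction.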
The next result expresses the sum of two optimal transport discrepancies, possibly with free marginals, as a single optimal transport discrepancy, with the same free marginals. Similar results provide multi-marginal reformulations for Wasserstein barycenters [44, 45], whose computation has recently received much interest [46, 47].
**Lemma 6.3 (Sum of optimal transport discrepancies).**
Given transportation costs , , and probability measures , , , it holds with defined as
**Proof 6.4.**
We prove separately “” and “”. With the short-hand notation , “” follows minimizing separately over the shared marginal:
[TABLE]
where in (i) we noticed that the first infimum is only over , and (ii) in the second infimum we used Lemma 6.1 with the pushforward map being .
We now prove “”. For all , consider -optimal and so that ; i.e., and Since , we can glue them [9, Gluing lemma] to obtain Then, it holds
[TABLE]
and, thus, . Let to conclude.
In particular, when is an expected value, the composition simplifies:
**Lemma 6.5 (Compositionality of optimal transport).**
Given a cost , a transportation cost , a map , and probability measures , it holds
**Proof 6.6.**
The statement is a special case of Lemma 6.1.
Finally, we give a useful disintegration property of the cost term :
**Lemma 6.7 (Disintegration of the optimizer).**
Given a transportation cost and probability measures , , it holds:
[TABLE]
where
**Proof 6.8.**
We prove “” and “” separately. To prove “”, consider any such that . By [8, Theorem 5.3.1], there exists such that
[TABLE]
Then, take the infimum over .
To prove “”, we follow [8, §5.3] to construct the reverse of the disintegration. Given any , and any . Then, we can construct a Borel probability measure defined for every Borel set as For , we have
[TABLE]
Thus, . Therefore,
[TABLE]
and the claim follows taking the infimum over and .
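Both directions of the proof above have an elementary counterpart for finitely supported measures: a plan on pairs can be split into its first marginal and a family of conditional laws, and conversely a marginal together with such a family defines a plan. The sketch below assumes plans stored as dictionaries with strictly positive `Fraction` masses. The names `disintegrate` and `reassemble` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(plan):
    """Split a plan on pairs (x, y) into its x-marginal and conditional laws.

    Returns (mu, kernel) with plan[(x, y)] == mu[x] * kernel[x][y].
    Assumes all stored masses are strictly positive.
    """
    mu = defaultdict(Fraction)
    for (x, _), mass in plan.items():
        mu[x] += mass
    kernel = defaultdict(dict)
    for (x, y), mass in plan.items():
        kernel[x][y] = mass / mu[x]
    return dict(mu), dict(kernel)


def reassemble(mu, kernel):
    """Reverse construction: build the plan from a marginal and a kernel."""
    return {(x, y): mu[x] * w for x in mu for y, w in kernel[x].items()}


# Hypothetical plan on a small finite space.
plan = {
    (0, "p"): Fraction(1, 4),
    (0, "q"): Fraction(1, 4),
    (1, "q"): Fraction(1, 2),
}
mu, kernel = disintegrate(plan)
```

Here `reassemble(mu, kernel)` returns the original `plan`, and each `kernel[x]` is a probability vector. In the general setting, the measurability of the kernel is what requires the care taken in the proof.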
We are now ready to prove Theorems 4.2 and 4.3:
**Proof 6.9 (Proof of Theorem 4.2).**
We prove the statements separately. To ease the notation, we recall , and we introduce
[TABLE]
- (i)
We proceed by induction. The base case is and . For , suppose . Then, the backward recursion gives
[TABLE]
where first, in , we used the definition of (see (19)), Lemma 6.1, and Lemma 6.3. Second, in , we used Lemma 6.7. Third, requires proving separately “” and “”. Let . Then, , and reveal “”.
To prove “”, let and . By definition, we can restrict the integration domain to the support of , for which it holds . We thus consider in place of as the integration domain. For all , consider the collection . Without loss of generality, we assume that is Borel; see the discussion in Section 4. As a consequence, also the measure-valued map is Borel. To show this, we can equivalently show that, for every Borel, the pre-image of the intervals , for , of is Borel. Let . Then, if , if , and otherwise. In all cases, is a Borel set, and, thus, the map is Borel. Since the composition of Borel maps is a Borel map, is Borel. Therefore, the measure-valued map is Borel. Then, , with as in Lemma 6.7. Thus,
[TABLE]
Take the infimum over on both sides and let to prove “”.
Next, it holds For “”, let yield , and consider . For all we have
[TABLE]
The limit reveals “” and, thus, the equality. Thus, for every , we have , and so .
This proves (13). Finally, analogously to the traditional DPA [7, 32], the additivity of the cost structure yields .
- (ii)
Let and define , and as in the theorem statement. Consider the (possibly sub-optimal) plan
[TABLE]
By definition, and . Therefore, is a valid choice for the infimum, and it holds:
[TABLE]
where, in , we used the definition of (see (19)), Lemma 6.1, and Lemma 6.3. Overall, is an -optimal control input at . When , the infima are attained and we obtain the optimal state-input distribution .
- (iii)
The statement follows from (ii), plugging in the given maps and .
**Proof 6.10 (Proof of Corollary 4.3).**
The proof is analogous to Theorem 4.2. To express the cost-to-go as a two-marginals optimal transport discrepancy, it suffices to replace Lemma 6.3 with Lemma 6.5. The simplified optimal control input follows.
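The two-marginal structure of Corollary 4.3 suggests a simple computational recipe when the ground space is finite: run the backward recursion in the ground space once per reference point, then use the resulting cost-to-go as the transportation cost of an allocation problem. The sketch below is an illustration under assumptions that are not taken from the paper: a hypothetical ground space `{0, ..., 4}`, inputs `{-1, 0, 1}`, stage cost `u**2`, a terminal cost that enforces reaching the reference, and equally weighted particles allocated by brute force.

```python
import math
from itertools import permutations

STATES = range(5)      # hypothetical ground space {0, ..., 4}
INPUTS = (-1, 0, 1)    # hypothetical input space
HORIZON = 3            # hypothetical number of stages


def step(x, u):
    """Deterministic particle dynamics, saturated at the boundary."""
    return min(max(x + u, 0), 4)


def ground_cost_to_go(r):
    """Backward recursion in the ground space for a fixed reference r.

    Stage cost u**2; terminal cost 0 if x == r and +inf otherwise.
    Returns J with J[x] the cost-to-go from state x at time 0.
    """
    J = {x: 0.0 if x == r else math.inf for x in STATES}
    for _ in range(HORIZON):
        J = {x: min(u * u + J[step(x, u)] for u in INPUTS) for x in STATES}
    return J


def fleet_cost_to_go(particles, references):
    """Allocate equally weighted particles to references (brute force).

    The transportation cost of sending particle x to reference r is the
    ground-space cost-to-go computed by ground_cost_to_go(r)[x].
    """
    tables = [ground_cost_to_go(r) for r in references]
    n = len(particles)

    def total(perm):
        return sum(tables[perm[i]][particles[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best) / n


allocation, cost = fleet_cost_to_go([0, 4], [3, 2])
print(allocation, cost)
```

The ground-space recursion never looks at the fleet, and the allocation step never looks at the dynamics beyond the table of costs-to-go, which is the decoupling stated by the separation principle. For larger fleets, the brute-force search would be replaced by a linear assignment or linear programming solver.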
7 Conclusions
We showed that many discrete-time finite-horizon optimal control problems in probability spaces are multi-marginal optimal transport problems, whose transportation cost stems from an optimal control problem in the space on which the probability measures are defined. This implies a separation principle: The optimal control strategy for a fleet of identical agents results from the optimal control strategy of each agent (how to go from to ?) and an optimal transport problem (who goes from to ?). We complemented our theoretical results with various examples. Among others, our results back up many existing approaches in the literature which a priori formalize the distribution/fleet steering problems as an optimal transport problem and not as an optimal control problem in the probability space. Our analysis builds on novel stability results for the multi-marginal optimal transport problem, whose study is of independent interest.
Future work will explore extensions to noisy dynamics and different cost functionals, the limit cases of the infinite horizon and continuous-time dynamics, and the practical impact of our theoretical results.
References
- [1] Mathias Hudoba de Badyn, Erik Miehling, Dylan Janak, Behçet Açıkmeşe, Mehran Mesbahi, Tamer Başar, John Lygeros, and Roy S Smith. Discrete-time linear-quadratic regulation via optimal transport. In 60th Conference on Decision and Control, pages 3060–3065, 2021.
- [2] Vishaal Krishnan and Sonia Martínez. Distributed online optimization for multi-agent optimal transport. arXiv preprint arXiv:1804.01572, 2019.
- [3] Antonio Terpin, Sylvain Fricker, Michel Perez, Mathias Hudoba de Badyn, and Florian Dörfler. Distributed feedback optimisation for robotic coordination. In 2022 American Control Conference, pages 3710–3715, 2022.
- [4] Gioele Zardini, Nicolas Lanzetti, Marco Pavone, and Emilio Frazzoli. Analysis and control of autonomous mobility-on-demand systems. Annual Review of Control, Robotics, and Autonomous Systems, 5(1), 2021.
- [5] Giacomo Albi, Lorenzo Pareschi, and Mattia Zanella. On the optimal control of opinion dynamics on evolving networks. In IFIP Conference on System Modeling and Optimization, pages 58–67. Springer, 2015.
- [6] Elizabeth Y Huang, Dario Paccagnan, Wenjun Mei, and Francesco Bullo. Assign and appraise: Achieving optimal performance in collaborative teams. IEEE Transactions on Automatic Control, 2022.
- [7] Dimitri Bertsekas. Abstract dynamic programming. Athena Scientific, 2022.
- [8] L. Ambrosio, N. Gigli, and G. Savaré. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Birkhäuser Basel, 1 edition, 2008.
