Optimal steering for non-Markovian Gaussian processes

Daniele Alpago; Yongxin Chen; Tryphon Georgiou; Michele Pavon

arXiv:1903.00525·math.OC·March 5, 2019

TL;DR

This paper derives a closed-form optimal control law for steering a non-Markovian Gaussian process with a finite-dimensional Markov realization to a desired terminal distribution, minimizing energy over a finite horizon.

Contribution

It provides the first closed-form solution for finite-energy steering of a non-Markovian process with a Markov realization, advancing control of partially observable stochastic systems.

Findings

01

Closed-form optimal control law derived.

02

Applicable to non-Markovian processes with Markov realizations.

03

Progress towards controlling systems with partial, noisy observations.

Abstract

At present, the problem of steering a non-Markovian process with minimum energy between specified end-point marginal distributions remains unsolved. Herein, we consider the special case of a non-Markovian process y(t) which, however, admits a finite-dimensional stochastic realization with a Markov state process that is fully observable. In this setting, and over a finite time horizon [0,T], we determine an optimal (least) finite-energy control law that steers the stochastic system to a final distribution that is compatible with a specified distribution for the terminal output process y(T); the solution is given in closed form. This work provides a key step towards the important problem of steering a stochastic system based on partial observations of the state (i.e., an output process) corrupted by noise, which will be the subject of forthcoming work.

Equations

$$dx^{u}(t) = A(t)\,x^{u}(t)\,dt + B(t)\,u(t)\,dt + B(t)\,dw(t),$$

$$x^{u}(0) = \xi \quad \text{a.s.}$$

$$\rho_{0}(x) = (2\pi)^{-n/2}\det(\Sigma_{0}^{x})^{-1/2}\exp\left\{-\tfrac{1}{2}\,x'(\Sigma_{0}^{x})^{-1}x\right\}.$$

$$y(t) = C(t)\,x^{u}(t),$$

$$u^{*} := \operatorname*{argmin}_{u\in\mathcal{U}(\Sigma_{0}^{x},\Sigma_{T}^{y})} J(u) := \mathbb{E}\left\{\int_{0}^{T}u(t)'u(t)\,dt\right\}.$$

$$u^{\star}(t) = -B(t)'\,Q(t)^{-1}x(t),$$

$$\dot{P}(t)$$

$$\dot{Q}(t)$$

$$(\Sigma_{0}^{x})^{-1}$$

$$(\Sigma_{T}^{x})^{-1}$$

$$D(\pi_{u}\,\|\,\pi_{0}) := \int\!\!\int\left[\log\frac{\pi_{u}(x,y)}{\pi_{0}(x,y)}\right]\pi_{u}(x,y)\,dx\,dy$$

$$\int\pi_{u}(x,y)\,dy = \rho_{0}(x),\qquad \int\pi_{u}(x,y)\,dx = \rho_{T}(y).$$

$$\Sigma^{u}_{0,T} = \begin{bmatrix}\Sigma^{x}_{0} & Y^{u}\\ (Y^{u})' & \Sigma^{x}_{T}\end{bmatrix}$$

$$S_{0,T} = \begin{bmatrix}\Sigma_{0}^{x} & \Sigma_{0}^{x}\,\Phi(T,0)'\\ \Phi(T,0)\,\Sigma_{0}^{x} & S_{T}\end{bmatrix}$$

$$S_{T} = \Phi(T,0)\,\Sigma_{0}^{x}\,\Phi(T,0)' + \int_{0}^{T}\Phi(T,\tau)B(\tau)B(\tau)'\Phi(T,\tau)'\,d\tau,$$

$$\frac{\partial}{\partial t}\Phi(t,s) = A(t)\,\Phi(t,s),\qquad \Phi(t,t) = I.$$

$$\operatorname*{argmin}_{Y^{u}\in\mathcal{Q}^{x}}\; -\log\det\Sigma_{0,T}^{u} + \operatorname{trace}\left(S_{0,T}^{-1}\Sigma_{0,T}^{u}\right)$$

$$\mathcal{Q}^{x} := \left\{Y\in\mathbb{R}^{n\times n} : \Sigma_{T}^{x} - Y'(\Sigma_{0}^{x})^{-1}Y > 0\right\},$$

$$u^{*} := \operatorname*{argmin}_{u\in\mathcal{U}(\Sigma_{0}^{x},\Sigma_{T}^{y})} J(u) := \mathbb{E}\left\{\int_{0}^{T}u(t)'u(t)\,dt\right\}.$$

$$\Sigma_{T}^{y} = C(T)\,\Sigma_{T}^{x}\,C(T)'.$$

$$\operatorname*{argmin}_{u\in\mathcal{U}}$$

$$x(0)\sim\mathcal{N}(0,\Sigma_{0}^{x}),\qquad x(T)\sim\mathcal{N}(0,X),$$

$$C X C' = \Sigma_{T}^{y},$$

$$\operatorname*{argmin}_{(X,Y)\in\mathcal{Q}}$$

$$S^{-1} = \begin{bmatrix}N & V\\ V' & P\end{bmatrix},$$

$$\mathcal{Q} := \left\{(X,Y)\in\mathcal{S}_{+}\times\mathbb{R}^{n\times n} : X - Y'(\Sigma_{0}^{x})^{-1}Y > 0\right\}$$

$$\inf_{(X,Y)\in\mathcal{Q}}\mathcal{L}(X,Y,M).$$

$$\mathcal{L}(X,Y,M)$$

$$+\operatorname{trace}\left[M\left(CXC' - \Sigma_{T}^{y}\right)\right] + 2\operatorname{trace}(V'Y) + c,$$

$$\delta\mathcal{L} =$$

$$+\operatorname{trace}\left[(P + C'MC)\,\delta X + 2V'\delta Y\right].$$

$$\delta^{2}\mathcal{L} := \delta\mathcal{L}(X,Y,M;\delta X,\delta X,\delta Y,\delta Y).$$

$$\delta^{2}\mathcal{L} =$$

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Stochastic processes and financial applications · Advanced Thermodynamics and Statistical Mechanics

Full text

Optimal steering for non-Markovian Gaussian processes

Daniele Alpago, Yongxin Chen, Tryphon Georgiou and Michele Pavon

D. Alpago is with the Dipartimento di Ingegneria dell’Informazione, Università di Padova, 35131 Padova, Italy. Y. Chen is with the School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332. T. Georgiou is with the Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA 92697. M. Pavon is with the Dipartimento di Matematica “Tullio Levi-Civita”, Università di Padova, 35121 Padova, Italy.

Supported in part by the NSF under grants 1509387, 1901599, the AFOSR under grants FA9550-15-1-0045 and FA9550-17-1-0435, and by the University of Padova Research Project CPDA 140897.

Abstract

At present, the problem of steering a non-Markovian process with minimum energy between specified end-point marginal distributions remains unsolved. Herein, we consider the special case of a non-Markovian process $y(t)$ which, however, admits a finite-dimensional stochastic realization with a Markov state process that is fully observable. In this setting, and over a finite time horizon $[0,T]$, we determine an optimal (least) finite-energy control law that steers the stochastic system to a final distribution that is compatible with a specified distribution for the terminal output process $y(T)$; the solution is given in closed form. This work provides a key step towards the important problem of steering a stochastic system based on partial observations of the state (i.e., an output process) corrupted by noise, which will be the subject of forthcoming work.

I Introduction

Throughout we will be considering a controlled evolution of the vector Gauss-Markov process $\{x(t)\mid 0\leq t\leq T\}$ that obeys the linear stochastic differential equation

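$$dx^{u}(t) = A(t)\,x^{u}(t)\,dt + B(t)\,u(t)\,dt + B(t)\,dw(t), \tag{1a}$$

$$x^{u}(0) = \xi \quad \text{a.s.}, \tag{1b}$$

where the initial condition $\xi$ is zero-mean Gaussian with density

$$\rho_{0}(x) = (2\pi)^{-n/2}\det(\Sigma_{0}^{x})^{-1/2}\exp\left\{-\tfrac{1}{2}\,x'(\Sigma_{0}^{x})^{-1}x\right\}.$$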
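
As a rough illustration of the controlled dynamics (1a), the following sketch draws one sample path with the Euler-Maruyama scheme. It is not taken from the paper: the scalar state, the constant coefficients `a` and `b`, the step count, the seed, and the function name `simulate_path` are all assumptions made for brevity.

```python
import math
import random


def simulate_path(a, b, u, sigma0, T=1.0, steps=1000, seed=0):
    """Euler-Maruyama sample path of dx = a*x dt + b*u(t, x) dt + b dw.

    Scalar state with constant a and b (an illustrative assumption);
    x(0) is drawn from N(0, sigma0).
    """
    rng = random.Random(seed)
    dt = T / steps
    x = rng.gauss(0.0, math.sqrt(sigma0))
    path = [x]
    for k in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment with variance dt
        x += a * x * dt + b * u(k * dt, x) * dt + b * dw
        path.append(x)
    return path


# Uncontrolled evolution (u = 0), which plays the role of the prior.
prior_path = simulate_path(a=-1.0, b=1.0, u=lambda t, x: 0.0, sigma0=0.5)
```

Note that the control and the noise enter through the same coefficient `b`, mirroring the shared input matrix $B$ in (1a).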

For this setting, in recent years, there has been considerable interest in the problem of minimum-energy steering of the (Gaussian) distribution of $x(t)$ to a target distribution $\mathcal{N}(0,\Sigma_{T}^{x})$ at time $t=T$ [6, 7, 19, 17, 2]. Important extensions include the more challenging case when the control process $u$ and the noise $w$ enter through different channels (i.e., having different “input matrices” $B$ in (1a)) [7], and the infinite-horizon case where the goal is to achieve a specified stationary state with minimum power [7]; the latter generalizes the classical work on covariance control of Skelton et al. [20, 18]. Motivation for such problems is manifold: they represent a most natural relaxation of classical LQR steering problems and have important applications in quality control and industrial manufacturing, vehicle path planning [27], and statistical physics, as in cooling and control of nano-to-meter scale resonators, atomic force microscopy and so forth; see, e.g., [14, 8].

Historically, the steering problem originates in a Gedankenexperiment formulated by Schrödinger in the early thirties [31, 32], seeking the most likely flow of particle distributions between observed end-point marginals. Schrödinger’s problem amounted to a problem in the theory of large deviations (which was unavailable at that time). Indeed, thanks to Sanov’s theorem [30], Schrödinger’s problem amounts to seeking a probability distribution on particle trajectories that has maximum entropy and is in agreement with the specified end-point marginal distributions [16, 3, 21, 15, 35]. Then, in the late eighties and early nineties, following work of Jamison, Föllmer, Nagasawa, Wakolbinger, Fleming, Holland, Mitter and others, a clear connection was made with stochastic control [12, 13, 28]. The distribution on paths corresponding to the uncontrolled evolution plays the role of the “prior” measure in the maximum entropy problem, which generalizes Schrödinger’s original one. At about the same time, Blaquière [4] studied the control of the Fokker-Planck equation, and later Brockett studied the Liouville equation [5] in a similar spirit, to steer distributions to a target one. This circle of control problems for uncertain systems has recently been linked to yet another fast-developing topic, the Optimal Mass Transport (OMT) problem [34], when it was realized that Schrödinger’s bridge problem (SBP, as it seeks to “bridge” the two end-point marginals) may be viewed as a regularization of OMT and provides an effective computational approach to the latter [24, 25, 26, 22, 10].

Extending the Schrödinger problem to the case of non-Markov processes is tantalizing and a natural next step. While the general case is currently wide open, in the present paper we work out the special case of steering the output of a Gauss-Markov model. More specifically, in conjunction with (1a), we consider the output process

$$y(t) = C(t)\,x^{u}(t), \tag{1c}$$

where $C(\cdot)$ is continuous and takes values in ${\mathbb{R}}^{p\times n}$ for $p<n$. For instance, this case arises when we consider steering only some components of the state to a prescribed terminal distribution (see Section V). Clearly, $y$ by itself is not a Markov process. Thus, this seemingly innocuous problem falls into the category of Schrödinger bridge problems with non-Markov prior, for which the form of the optimal control is, in general, unknown. (Footnote 1: See [29] for a considerably simpler “half-bridge” problem where only the final distribution is prescribed.) Problems where only a portion of the state needs to be specified arise, for instance, in thickness control (film extrusion) [1, 2], where the remaining components of the state vector might either not be of interest or may be difficult or expensive to measure. In Section V we discuss a case where it is of interest to regulate only the distribution of the momentum of stochastic oscillators.

The outline of the paper is as follows. In Section II, we recall some central results from [6] in the case of a Markovian prior. In Section III, we give a precise formulation of our stochastic control problem. In Section IV, we provide a closed-form solution to our problem by finding the terminal time state covariance which can be reached with minimum energy among those complying with the assigned covariance of $y(T)$ . Finally, Section V illustrates the results in a problem of steering the momentum distribution of a stochastic oscillator to a desired one.

II Background

Let $\mathcal{U}(\Sigma^{x}_{0},\Sigma^{x}_{T})$ be the family of adapted, finite-energy control functions such that (1a) has a strong solution on $[0,T]$ and $x(T)$ has distribution $\mathcal{N}(0,\Sigma_{T}^{x})$. Here, adapted means that $u(t)$ depends only on $t$ and on $\{x^{u}(s);0\leq s\leq t\}$ for each $t\in[0,T]$ (Footnote 2). The optimal steering problem reads

Problem 1

Determine

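$$u^{*} := \operatorname*{argmin}_{u\in\mathcal{U}(\Sigma_{0}^{x},\Sigma_{T}^{x})} J(u) := \mathbb{E}\left\{\int_{0}^{T}u(t)'u(t)\,dt\right\}.$$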

In [6, Theorem 8], it was shown that, under controllability of the pair $(A(\cdot),B(\cdot))$ on the given time interval, $\mathcal{U}(\Sigma^{x}_{0},\Sigma^{x}_{T})$ is nonempty and the (unique) optimal control is a linear feedback of the state given by

$$u^{\star}(t) = -B(t)'\,Q(t)^{-1}x(t), \tag{2}$$

where $P(t)$ and $Q(t)$ , taking values in the set of symmetric, $n\times n$ matrices, are the unique nonsingular solutions on $[0,T]$ of the system of linear matrix equations

[TABLE]

nonlinearly coupled through the boundary conditions

[TABLE]

The solutions to these equations can actually be provided in closed form as a function of $(\Sigma^{x}_{0},\Sigma^{x}_{T})$ , see [6, Section III] for further details.
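
Substituting the feedback (2) into (1a) gives the closed-loop dynamics $dx = (A - BB'Q^{-1})x\,dt + B\,dw$. The sketch below only forms this closed-loop drift for a 2x2 example. The matrices `A`, `B` and in particular `Q` are hypothetical placeholders; `Q` is not obtained from the boundary-value problem above, and the helper names are illustrative.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def closed_loop_drift(A, B, Q):
    """Return A - B B' Q^{-1}, the drift under u = -B' Q^{-1} x."""
    Bt = [list(col) for col in zip(*B)]
    gain = matmul(matmul(B, Bt), inv2(Q))
    return [[A[i][j] - gain[i][j] for j in range(2)] for i in range(2)]


A = [[0.0, 1.0], [-1.0, -1.0]]   # illustrative drift matrix
B = [[0.0], [1.0]]               # control and noise enter the second state
Q = [[2.0, 0.0], [0.0, 4.0]]     # hypothetical value of Q(t) at one fixed t
A_cl = closed_loop_drift(A, B, Q)
```

Because $B$ has a single nonzero row here, the feedback only modifies the second row of the drift.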

Let $P_{0}$ and $P_{u}$ be the probability measures on $C(0,T;{\mathbb{R}}^{n})$, the space of $n$-dimensional continuous functions on $[0,T]$, corresponding to the solutions of (1a) with control $u\equiv 0$ and $u\in\mathcal{U}$, respectively. Also let $\pi_{0}(x_{0},x_{T})$ and $\pi_{u}(x_{0},x_{T})$ be their respective initial-final joint densities. In [6, Section IV], a well-known decomposition of the relative entropy [15] was extended to the case of degenerate diffusions, to show that the Schrödinger bridge problem with marginal densities $\rho_{0}=\mathcal{N}(0,\Sigma_{0}^{x})$ and $\rho_{T}=\mathcal{N}(0,\Sigma_{T}^{x})$ can be reduced to the following maximum entropy problem for distributions on a finite-dimensional space:

Problem 2

Minimize over densities $\pi_{u}$ on ${\mathbb{R}}^{n}\times{\mathbb{R}}^{n}$ the Kullback-Leibler index

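$$D(\pi_{u}\,\|\,\pi_{0}) := \int\!\!\int\left[\log\frac{\pi_{u}(x,y)}{\pi_{0}(x,y)}\right]\pi_{u}(x,y)\,dx\,dy$$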

subject to the (linear) constraints

$$\int\pi_{u}(x,y)\,dy = \rho_{0}(x),\qquad \int\pi_{u}(x,y)\,dx = \rho_{T}(y).$$

Let $\Sigma^{u}_{0,T}$ be the covariance of $\pi_{u}(x_{0},x_{T})$. Since $u\in\mathcal{U}(\Sigma^{x}_{0},\Sigma^{x}_{T})$, $\Sigma^{u}_{0,T}$ necessarily has the structure

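$$\Sigma^{u}_{0,T} = \begin{bmatrix}\Sigma^{x}_{0} & Y^{u}\\ (Y^{u})' & \Sigma^{x}_{T}\end{bmatrix} \tag{7}$$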

for some $Y^{u}$ . Let $S_{0,T}$ instead be the covariance corresponding to $\pi_{0}(x_{0},x_{T})$ . Then, it has the form

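$$S_{0,T} = \begin{bmatrix}\Sigma_{0}^{x} & \Sigma_{0}^{x}\,\Phi(T,0)'\\ \Phi(T,0)\,\Sigma_{0}^{x} & S_{T}\end{bmatrix} \tag{8}$$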

where

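$$S_{T} = \Phi(T,0)\,\Sigma_{0}^{x}\,\Phi(T,0)' + \int_{0}^{T}\Phi(T,\tau)B(\tau)B(\tau)'\Phi(T,\tau)'\,d\tau,$$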

with $\Phi(t,s)$ denoting the state-transition of $A(\cdot)$ determined by

$$\frac{\partial}{\partial t}\Phi(t,s) = A(t)\,\Phi(t,s),\qquad \Phi(t,t) = I.$$
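
To make the ingredients of the prior covariance (8) concrete, here is a minimal sketch for a scalar state with constant coefficients, where the state transition reduces to $\Phi(t,s)=e^{a(t-s)}$. The scalar setting, the midpoint quadrature, the step count, and all function names are assumptions chosen for illustration.

```python
import math


def phi(a, t, s):
    """State transition for constant scalar a: dPhi/dt = a*Phi, Phi(s, s) = 1."""
    return math.exp(a * (t - s))


def terminal_variance(a, b, sigma0, T, steps=2000):
    """S_T, with the integral term approximated by the midpoint rule."""
    dt = T / steps
    integral = sum((phi(a, T, (k + 0.5) * dt) * b) ** 2 for k in range(steps)) * dt
    return phi(a, T, 0.0) ** 2 * sigma0 + integral


def prior_joint_covariance(a, b, sigma0, T):
    """2x2 covariance S_{0,T} of (x(0), x(T)) for the uncontrolled scalar process."""
    cross = sigma0 * phi(a, T, 0.0)
    return [[sigma0, cross], [cross, terminal_variance(a, b, sigma0, T)]]


S = prior_joint_covariance(a=-1.0, b=1.0, sigma0=0.5, T=1.0)
```

For these particular values the closed form is $S_T = \tfrac12 e^{-2} + \tfrac12(1-e^{-2}) = \tfrac12$, so the uncontrolled process starts in its stationary variance and the quadrature can be checked against it.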

Thanks to the explicit form of relative entropy (Kullback-Leibler index) for Gaussian distributions [11], Problem 2 can be expressed in terms of covariances as follows:

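$$\operatorname*{argmin}_{Y^{u}\in\mathcal{Q}^{x}}\; -\log\det\Sigma_{0,T}^{u} + \operatorname{trace}\left(S_{0,T}^{-1}\Sigma_{0,T}^{u}\right) \tag{9}$$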

where $\Sigma^{u}_{0,T}$ is as in (7) and

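$$\mathcal{Q}^{x} := \left\{Y\in\mathbb{R}^{n\times n} : \Sigma_{T}^{x} - Y'(\Sigma_{0}^{x})^{-1}Y > 0\right\},$$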

see [6, Section IV] for the details.
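
The covariance form of the objective can be evaluated directly. The sketch below does so for a scalar state, so that the joint covariances are 2x2, and relates the objective to the standard relative entropy between zero-mean Gaussians, $\tfrac12[\operatorname{trace}(S^{-1}\Sigma) - n + \log\det S - \log\det\Sigma]$ with $n=2$ here. The numerical values of `S` and `Sigma` are hypothetical.

```python
import math


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def trace_prod(X, Y):
    """trace(X Y) for 2x2 matrices."""
    return sum(X[i][k] * Y[k][i] for i in range(2) for k in range(2))


def objective(Sigma, S):
    """-log det(Sigma) + trace(S^{-1} Sigma), the quantity minimized in (9)."""
    return -math.log(det2(Sigma)) + trace_prod(inv2(S), Sigma)


def gaussian_kl(Sigma, S):
    """Relative entropy D(N(0, Sigma) || N(0, S)) for 2x2 covariances."""
    return 0.5 * (objective(Sigma, S) + math.log(det2(S)) - 2.0)


S = [[0.5, 0.2], [0.2, 0.5]]          # hypothetical prior covariance S_{0,T}
Sigma = [[0.5, 0.1], [0.1, 0.0625]]   # candidate: fixed diagonal, free off-diagonal Y
```

The objective and the relative entropy differ only by a factor of one half and by terms that do not depend on the off-diagonal entry, which is why minimizing one minimizes the other.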

III Problem formulation

We consider the output process in (1c) and assume that the state $x(t)$ is fully observable. The finite-dimensional Markovian representation (stochastic realization) for $y$ provided by (1a)-(1c) is available. Such a representation, as is well-known, constitutes the starting point of Kalman filtering and much of optimal control theory, and the construction of such a model with minimal state vector dimension has been the subject of intense study [23]. This too is our starting point.

Let $\mathcal{U}(\Sigma^{x}_{0},\Sigma^{y}_{T})$ denote the family of adapted control functions such that (1a) has a strong solution on $[0,T]$ and $y(T)$ has distribution $\mathcal{N}(0,\Sigma_{T}^{y})$. We formulate the following Schrödinger Bridge Problem with non-Markov prior:

Problem 3

Determine

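$$u^{*} := \operatorname*{argmin}_{u\in\mathcal{U}(\Sigma_{0}^{x},\Sigma_{T}^{y})} J(u) := \mathbb{E}\left\{\int_{0}^{T}u(t)'u(t)\,dt\right\}.$$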

Notice that at one end, $t=0$, the boundary constraint requires matching the covariance of the state vector (a requirement which can be relaxed), while at the other end, $t=T$, it requires matching the covariance of the output

$$\Sigma_{T}^{y} = C(T)\,\Sigma_{T}^{x}\,C(T)'. \tag{10}$$

The value of $\Sigma_{T}^{x}$ is a parameter and there are in general several values for it such that (10) is satisfied. Corresponding to each one of them, there is a feedback control in $\mathcal{U}(\Sigma^{x}_{0},\Sigma^{x}_{T})$ optimally performing the transfer of distributions according to [6]. Thus, the problem may also be viewed as that of determining the one final covariance $\Sigma_{T}^{x}$, among those compatible with $\Sigma_{T}^{y}$, whose corresponding optimal control (2) has minimum energy. (Footnote 3: The case where only $\Sigma^{y}_{0}$ and $\Sigma^{y}_{T}$ are prescribed can be treated in a similar fashion by optimizing also with respect to $\Sigma_{0}^{x}$.)
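
The non-uniqueness of the terminal state covariance can be seen in a small example. In the sketch below, two different positive definite matrices give the same output variance under a single-row $C$; the matrices and the helper name `output_variance` are hypothetical.

```python
def output_variance(C, X):
    """C X C' for a single-row C (flat list) and a symmetric matrix X."""
    n = len(C)
    return sum(C[i] * X[i][j] * C[j] for i in range(n) for j in range(n))


C = [0.0, 1.0]                        # observe the second state component only
X_a = [[1.0, 0.0], [0.0, 0.0625]]     # one admissible terminal state covariance
X_b = [[0.3, 0.05], [0.05, 0.0625]]   # another one with the same output variance

var_a = output_variance(C, X_a)
var_b = output_variance(C, X_b)
```

Both candidates satisfy the output constraint with variance 1/16, so the constraint alone does not pin down the terminal state covariance; the energy criterion does.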

Inspired by the reduction of the classical case leading to Problem 2, we proceed in the next section to derive a closed-form solution of Problem 3.

IV Solution to the non-Markovian steering problem

In view of (9) in Section II, Problem 3 can be rewritten as

[TABLE]

where $\Sigma^{x}_{0}$ , $\Sigma_{T}^{y}$ constitute the given data while $X$ is a parameter. This can be further recast as

Problem 4

Given $\Sigma^{x}_{0}$ , $\Sigma_{T}^{y}$ , and $S=S_{0,T}$ as in (8), determine

[TABLE]

subject to $\Sigma=\begin{bmatrix}\Sigma^{x}_{0}&Y\\ Y^{\prime}&X\end{bmatrix}>0$ and $CXC^{\prime}=\Sigma_{T}^{y}$ .

Now, let

$$S^{-1} = \begin{bmatrix}N & V\\ V' & P\end{bmatrix},$$

and

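$$\mathcal{Q} := \left\{(X,Y)\in\mathcal{S}_{+}\times\mathbb{R}^{n\times n} : X - Y'(\Sigma_{0}^{x})^{-1}Y > 0\right\}$$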

where ${\mathcal{S}}_{+}$ is the set of $n\times n$ symmetric positive definite matrices.

We construct below the Lagrangian $\mathcal{L}$ introducing a Lagrange multiplier and consider the unconstrained minimization

$$\inf_{(X,Y)\in\mathcal{Q}}\mathcal{L}(X,Y,M).$$

The Lagrangian is given by (we write, for simplicity, $\Sigma_{0}$ instead of $\Sigma_{0}^{x}$ )

[TABLE]

where $M=M^{\prime}$ is a Lagrange multiplier and $c\in{\mathbb{R}}$ is a constant term. We first check the convexity of $\mathcal{L}$ with respect to $(X,Y)$ .

Proposition 1

$\mathcal{L}$ is jointly convex in $(X,Y)$ over ${\mathcal{Q}}$.

Proof:

Let $\delta\mathcal{L}:=\delta\mathcal{L}(X,Y,M;\delta X,\delta Y)$ denote the first variation of $\mathcal{L}$ in the direction $(\delta X,\delta Y)$. Applying the chain rule,

[TABLE]

To check convexity, it is sufficient to look at the diagonal of the “Hessian” of $\mathcal{L}$

[TABLE]

We have

[TABLE]

which is clearly non-negative on ${\mathcal{Q}}$ . ∎

To find the minimum of $\mathcal{L}$ in ${\mathcal{Q}}$ it is therefore sufficient to solve

[TABLE]

from which we get the two equations

[TABLE]

To compute the optimal $(X,Y)$ , we use these equations in the Lagrangian and then proceed to maximize the resulting (concave) functional with respect to $M$ . Accordingly, the last equation we need is given by

[TABLE]

Let $Z:=X-Y^{\prime}\Sigma_{0}^{-1}Y$ and note that $Z=Z^{\prime}>0$ . We immediately get $X=Z+Y^{\prime}\Sigma_{0}^{-1}Y$ and

[TABLE]

Therefore, $X=Z+ZV^{\prime}\Sigma_{0}VZ$ and

[TABLE]
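
The relations $Y=-\Sigma_{0}VZ$ and $X=Z+ZV'\Sigma_{0}VZ$ can be sanity-checked numerically: the Schur complement $X-Y'\Sigma_{0}^{-1}Y$ of the recovered pair should return $Z$. The following sketch does this for 2x2 matrices; the values of `Sigma0`, `V` and `Z` are hypothetical and are not solutions of the optimality conditions.

```python
def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def recover_xy(Z, V, Sigma0):
    """Return (X, Y) with Y = -Sigma0 V Z and X = Z + Z V' Sigma0 V Z."""
    SVZ = matmul(matmul(Sigma0, V), Z)
    Y = [[-e for e in row] for row in SVZ]
    corr = matmul(matmul(Z, transpose(V)), SVZ)
    X = [[Z[i][j] + corr[i][j] for j in range(2)] for i in range(2)]
    return X, Y


def schur_complement(X, Y, Sigma0):
    """X - Y' Sigma0^{-1} Y."""
    T = matmul(matmul(transpose(Y), inv2(Sigma0)), Y)
    return [[X[i][j] - T[i][j] for j in range(2)] for i in range(2)]


Sigma0 = [[0.5, 0.0], [0.0, 0.5]]
V = [[0.3, -0.2], [0.1, 0.4]]     # hypothetical off-diagonal block of S^{-1}
Z = [[1.0, 0.2], [0.2, 0.5]]      # hypothetical symmetric positive definite Z
X, Y = recover_xy(Z, V, Sigma0)
Z_back = schur_complement(X, Y, Sigma0)
```

The check relies on $Z$ and $\Sigma_0$ being symmetric, which is what makes $Y' = -ZV'\Sigma_0$.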

At this point we only need to find $Z$ from equations (17), (18). Since we can always find a state space transformation $\mathcal{T}$ such that $C\,\mathcal{T}=[I\,|\,0]$ (or a change of basis in the outputs’ space), without loss of generality, we can always assume that $C=[I\,|\,0]$ . Let

[TABLE]
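
The normalization $C=[I\,|\,0]$ used here can be illustrated for the output matrix $C=[0\,|\,I]$ with $n=2$ and $p=1$, where a permutation of the state coordinates suffices. The sketch below checks that $C\,\mathcal{T}=[1\;0]$ and that the output variance is unchanged when the state covariance is transformed accordingly. The covariance `X` is a hypothetical value, and the change of variables $x=\mathcal{T}z$ is the assumed convention.

```python
def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(M):
    return [list(col) for col in zip(*M)]


C = [[0.0, 1.0]]                  # output selects the second state component
T = [[0.0, 1.0], [1.0, 0.0]]      # permutation: its own inverse and transpose
C_new = matmul(C, T)              # output matrix in the coordinates z, x = T z

X = [[0.3, 0.05], [0.05, 0.0625]]         # hypothetical covariance of x(T)
X_z = matmul(matmul(T, X), T)             # covariance of z = T^{-1} x
out_x = matmul(matmul(C, X), transpose(C))[0][0]
out_z = matmul(matmul(C_new, X_z), transpose(C_new))[0][0]
```

For a general full-row-rank $C$ the transformation is no longer a permutation, but the same invariance of $CXC'$ holds by construction.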

Equation (18) becomes

[TABLE]

while equation (17) can be equivalently written as

[TABLE]

which reduces to the system of equations

[TABLE]

Plugging $Z_{12}$ , $Z_{21}$ and $Z_{22}$ into (19), we get

[TABLE]

where

[TABLE]

Equation (22) is a quadratic equation with two solutions

[TABLE]

Clearly, $Z=X-Y^{\prime}\Sigma_{0}^{-1}Y>0$ by Schur complement, which implies $Z_{11}>0$ . This singles out the solution $Z_{11}^{+}$ . We can now recover $Z$ from (21) and then $X=Z+ZV^{\prime}\Sigma_{0}VZ$ and $Y=-\Sigma_{0}VZ$ . Finally, from (20), one can find the multiplier $M$ :

[TABLE]

The above results can be summarized as follows.

Theorem 5

Let $Z_{11}^{+}$ be as in (23) and let $Z,X,Y$ be derived accordingly. Then $(X,Y)$ solves Problem 4. Furthermore, the solution to Problem 3 coincides with the solution to Problem 1 with $\Sigma_{T}^{x}=X$.

V Example

Consider controlling the Ornstein-Uhlenbeck model of physical Brownian motion

[TABLE]

corresponding to a given quadratic potential $V(q)=\frac{1}{2}q^{\prime}Kq$ with $K$ symmetric and positive definite, where $u(\cdot)$ is the control force. By setting

[TABLE]

model (24) becomes

[TABLE]

where $\xi$ is zero-mean Gaussian with $\Sigma_{0}^{x}=I/2$ , and the pair $(A,B)$ is controllable. We consider a state dimension of $n=2$ and we assume for simplicity that the units are such that $K=I$ and $\beta=1$ .
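
The controllability claim and the choice $\Sigma_{0}^{x}=I/2$ can be checked numerically under an assumed form of the model. The sketch below assumes the standard second-order dynamics $dq=p\,dt$, $dp=(-q-p)\,dt+u\,dt+dw$ with unit mass and unit noise intensity, which gives the matrices `A` and `B` below; these are assumptions for illustration, since the model equations are not reproduced here.

```python
def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[0.0, 1.0], [-1.0, -1.0]]    # assumed drift for K = 1, beta = 1
B = [[0.0], [1.0]]                # assumed input matrix: force acts on momentum
Sigma0 = [[0.5, 0.0], [0.0, 0.5]]


def lyapunov_residual(A, B, Sigma):
    """A Sigma + Sigma A' + B B'; zero when Sigma is stationary for the prior."""
    AS = matmul(A, Sigma)
    SAt = matmul(Sigma, transpose(A))
    BBt = matmul(B, transpose(B))
    return [[AS[i][j] + SAt[i][j] + BBt[i][j] for j in range(2)] for i in range(2)]


def controllability_det(A, B):
    """det [B, AB]; nonzero iff (A, B) is controllable for n = 2, one input."""
    AB = matmul(A, B)
    return B[0][0] * AB[1][0] - AB[0][0] * B[1][0]


residual = lyapunov_residual(A, B, Sigma0)
ctrb = controllability_det(A, B)
```

Under these assumptions the residual vanishes, which would make $I/2$ the stationary covariance of the uncontrolled model, and the determinant is nonzero, consistent with controllability of $(A,B)$.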

We would like to steer the Gaussian distribution of the momentum to a final distribution at time $T=1$ with $\Sigma_{1}^{p}=1/16$, minimizing the quadratic control energy under the controlled dynamics (24). In other words, we are prescribing only the final covariance matrix of $y(t)=C\,x(t)$ with $C=\left[0\,|\,I\right]$. Figure 1 shows the trajectories of the state variables in the phase space (left) and the corresponding control efforts (right), i.e., the intersections of the phase plot with the slice planes $p$ and $q$, respectively.

Figure 2 highlights instead the trajectories of position (left) and momentum (right) with the corresponding confidence interval.

In all the figures, the transparent blue tube represents the “$3\sigma$” confidence interval, i.e., its intersection with the slice plane $t$ is given by

[TABLE]

The figures highlight the reduction of the variance of the momentum process as time increases to $T=1$ .
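
To give a sense of scale for this reduction, the sketch below computes the half-width of a “$3\sigma$” band for a zero-mean Gaussian from its variance, assuming the band is simply $\pm 3\sqrt{\text{variance}}$ of the momentum marginal. It uses the initial momentum variance $1/2$ (from $\Sigma_{0}^{x}=I/2$) and the prescribed terminal variance $1/16$; the function name is illustrative.

```python
import math


def three_sigma_band(variance):
    """Interval (-3*sigma, 3*sigma) for a zero-mean Gaussian with the given variance."""
    half = 3.0 * math.sqrt(variance)
    return (-half, half)


initial_band = three_sigma_band(0.5)        # momentum at t = 0
final_band = three_sigma_band(1.0 / 16)     # momentum at T = 1
```

Under this assumption the band shrinks from a half-width of about 2.12 at $t=0$ to 0.75 at $T=1$.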

Bibliography

[1] K. J. Åström, Introduction to Stochastic Control Theory, Academic Press, 1970.
[2] Efstathios Bakolas, Finite-horizon covariance control for discrete-time stochastic linear systems subject to input constraints, Automatica, 91, pp. 61-68, 2018.
[3] A. Beurling, An automorphism of product measures, Ann. Math. 72 (1960), 189-200.
[4] A. Blaquière, “Controllability of a Fokker-Planck equation, the Schrödinger system, and a related stochastic optimal control (revised version),” Dynamics and Control, vol. 2, no. 3, pp. 235–253, 1992.
[5] R. Brockett, “Notes on the control of the Liouville equation,” in Control of Partial Differential Equations. Springer, 2012, pp. 101–129.
[6] Y. Chen, T.T. Georgiou and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, Part I”, IEEE Trans. Aut. Control, 61, Issue 5, 1158-1169, 2016.
[7] Y. Chen, T.T. Georgiou and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, Part II”, IEEE Trans. Aut. Control, 61, Issue 5, 1170-1180, 2016.
[8] Y. Chen, T.T. Georgiou and M. Pavon, “Fast cooling for a system of stochastic oscillators”, J. Math. Phys., 56, n.11, 113302, 2015.