Stochastic Model Predictive Control: Output-Feedback, Duality and Guaranteed Performance

Martin A Sehr; Robert R Bitmead

arXiv:1706.00733·math.OC·May 1, 2020


TL;DR

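This paper formulates stochastic model predictive control as a receding-horizon version of stochastic optimal output-feedback control. The formulation incorporates output feedback and dual control naturally and guarantees closed-loop performance relative to infinite-horizon optimal control, at the price of heavy computation in the general nonlinear case.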

Contribution

It translates stochastic optimal output feedback control into a receding horizon framework, extending to POMDPs and demonstrating applicability in healthcare decision making.

Findings

01

Incorporates output feedback naturally

02

Dual regulation and probing are inherent in the control

03

Guarantees closed-loop performance relative to infinite-horizon optimal control

Abstract

A new formulation of Stochastic Model Predictive Output Feedback Control is presented and analyzed as a translation of Stochastic Optimal Output Feedback Control into a receding horizon setting. This requires lifting the design into a framework involving propagation of the conditional state density, the information state, via the Bayesian Filter and solution of the Stochastic Dynamic Programming Equation for an optimal feedback policy, both stages of which are computationally challenging in the general, nonlinear setup. The upside is that the clearance of three bottleneck aspects of Model Predictive Control is connate to the optimality: output feedback is incorporated naturally; dual regulation and probing of the control signal is inherent; closed-loop performance relative to infinite-horizon optimal control is guaranteed. While the methods are numerically formidable, our aim is to…

Tables

Table 1: Problem data for healthcare decision making example.

Decision $a$	Transition Probabilities $P (a)$	Observation Probabilities $R (a)$	Cost $c (a)$
1: Skip next appointment slot	$[\begin{matrix} 0.80 & 0.20 & 0.00 \\ 0.00 & 0.90 & 0.10 \\ 0.00 & 0.00 & 1.00 \end{matrix}]$	$[\begin{matrix} 1 / 3 & 1 / 3 & 1 / 3 \\ 1 / 3 & 1 / 3 & 1 / 3 \\ 1 / 3 & 1 / 3 & 1 / 3 \end{matrix}]$	$[\begin{matrix} 0 \\ 5 \\ 5 \end{matrix}]$
2: Schedule new appointment	$[\begin{matrix} 0.80 & 0.20 & 0.00 \\ 0.00 & 0.90 & 0.10 \\ 0.00 & 0.00 & 1.00 \end{matrix}]$	$[\begin{matrix} 0.40 & 0.30 & 0.30 \\ 0.30 & 0.40 & 0.30 \\ 0.30 & 0.30 & 0.40 \end{matrix}]$	$[\begin{matrix} 1 \\ 1 \\ 1 \end{matrix}]$
3: Order rapid diagnostic test	$[\begin{matrix} 1.00 & 0.00 & 0.00 \\ 0.00 & 1.00 & 0.00 \\ 0.00 & 0.00 & 1.00 \end{matrix}]$	$[\begin{matrix} 0.90 & 0.05 & 0.05 \\ 0.05 & 0.90 & 0.05 \\ 0.05 & 0.05 & 0.90 \end{matrix}]$	$[\begin{matrix} 4 \\ 3 \\ 4 \end{matrix}]$
4: Apply available treatment	$[\begin{matrix} 0.80 & 0.20 & 0.00 \\ 0.75 & 0.25 & 0.00 \\ 0.00 & 0.00 & 1.00 \end{matrix}]$	$[\begin{matrix} 0.40 & 0.30 & 0.30 \\ 0.30 & 0.40 & 0.30 \\ 0.30 & 0.30 & 0.40 \end{matrix}]$	$[\begin{matrix} 4 \\ 2 \\ 4 \end{matrix}]$

Equations

$x_{k+1}$

$y_{k}$

$\pi_{0\mid-1}$

$\mathbf{\zeta}^{k}$

$\pi_{k}$

$\pi_{k}$

$\pi_{k+1\mid k}$

$\pi_{k+1}$

$J_{N}(\pi_{0},\mathbf{u}^{N-1})\triangleq\mathbb{E}_{0}\left[\sum_{j=0}^{N-1}\alpha^{j}c(x_{j},u_{j})+\alpha^{N}c_{N}(x_{N})\right],$

$u_{k}=g_{k}(\pi_{k}).$

$J_{N}(\pi_{0},\mathbf{g}^{N-1})=\mathbb{E}_{0}\left[\sum_{j=0}^{N-1}\alpha^{j}c(x_{j},g_{j}(\pi_{j}))+\alpha^{N}c_{N}(x_{N})\right].$

$J_{\infty}(\pi_{0},\mathbf{g}^{\infty})$

$\mathbb{P}_{k}\left[x_{k}\in\mathcal{X}_{k}\right]$

$u_{k}=g_{k}(\pi_{k})$

$\mathbb{P}_{k}\left[x_{k}\in\mathcal{X}_{\infty}\right]$

$u_{k}=g_{k}(\pi_{k})$

$\mathcal{P}_{N}(\pi_{0}):\left\{\begin{array}{cl}\inf_{\mathbf{g}^{N-1}}&J_{N}(\pi_{0},\mathbf{g}^{N-1})\\ \text{s.t.}&\mathbb{P}_{j}\left[x_{j}\in\mathcal{X}_{j}\right]\geq 1-\epsilon_{j},\,j=1,\dots,N,\\ &g_{j}(\pi_{j})\in\mathcal{U}_{j},\,j=0,\dots,N-1.\end{array}\right.$

$\mathcal{P}_{\infty}(\pi_{0}):\left\{\begin{array}{cl}\inf_{\mathbf{g}^{\infty}}&J_{\infty}(\pi_{0},\mathbf{g}^{\infty})\\ \text{s.t.}&\mathbb{P}_{j}\left[x_{j}\in\mathcal{X}_{\infty}\right]\geq 1-\epsilon_{\infty},\,j\in\mathbb{N}_{1},\\ &g_{j}(\pi_{j})\in\mathcal{U}_{\infty},\,j\in\mathbb{N}_{0}.\end{array}\right.$

$\pi_{k+1}=T(\pi_{k},y_{k+1},g_{k}(\pi_{k}))$

$V_{k}(\pi_{k})\triangleq\inf_{g_{k}(\cdot)}$

$\pi_{k+1}\in\mathcal{C}_{k+1},\ (w_{k},v_{k+1})\text{-a.s.}$

$g_{k}(\pi_{k})\in\mathcal{U}_{k}$

$V_{N}(\pi_{N})$

$\mathcal{C}_{j+1}\subseteq\mathcal{C}_{j}$

$\tilde{g}(\pi_{k})$

$T(\pi_{k},y_{k+1},\tilde{g}(\pi_{k}))$

$c(x_{k},\tilde{g}(\pi_{k}))$

$\mathbb{P}_{k}\left[x_{k}\in\mathcal{X}_{1}\right]$

$\alpha\,\mathbb{E}_{\pi}\left[c_{N}(f(x,\tilde{g}(\pi),w))\right]-c_{N}(x)\overset{a.s.}{\leq}-c(x,\tilde{g}(\pi))$

$\lim_{M\to\infty}\sum_{k=0}^{M}\alpha^{k}c(x_{k},g_{0}^{\star}(\pi_{k}))\overset{a.s.}{<}\infty.$

$c(x_{k},u_{k})$

$\lim_{M\to\infty}\sum_{k=0}^{M}c(x_{k},g_{0}^{\star}(\pi_{k}))\overset{a.s.}{<}\infty$

$x_{k}\overset{a.s.}{\to}X,\ \text{as }k\to\infty.$


Full text

Stochastic Model Predictive Control:

Output-Feedback, Duality and Guaranteed Performance

Martin A. Sehr, Robert R. Bitmead

Department of Mechanical & Aerospace Engineering, University of California, San Diego, La Jolla, CA 92093-0411, USA

Abstract

A new formulation of Stochastic Model Predictive Output Feedback Control is presented and analyzed as a translation of Stochastic Optimal Output Feedback Control into a receding horizon setting. This requires lifting the design into a framework involving propagation of the conditional state density, the information state, via the Bayesian Filter and solution of the Stochastic Dynamic Programming Equation for an optimal feedback policy, both stages of which are computationally challenging in the general, nonlinear setup. The upside is that the clearance of three bottleneck aspects of Model Predictive Control is connate to the optimality: output feedback is incorporated naturally; dual regulation and probing of the control signal is inherent; closed-loop performance relative to infinite-horizon optimal control is guaranteed. While the methods are numerically formidable, our aim is to develop an approach to Stochastic Model Predictive Control with guarantees and, from there, to seek a less onerous approximation. To this end, we discuss in particular the class of Partially Observable Markov Decision Processes, to which our results extend seamlessly, and demonstrate applicability with an example in healthcare decision making, where duality and associated optimality in the control signal are required for satisfactory closed-loop behavior.

Keywords: stochastic control, predictive control, information state, performance analysis, dual optimal control.

Corresponding author: R. R. Bitmead. The material in this paper was not presented at any conference.

1 Introduction

MPC, in its original formulation, is a full-state feedback law. This underpins two theoretical limitations of MPC: accommodation of output feedback, and extension to include a cogent robustness theory, since the state dimension is fixed. This paper addresses the first of these. There have been a number of approaches, mostly hinging on replacement of the measured true state by a state estimate computed via Kalman filtering [26, 33], moving-horizon estimation [5, 31], tube-based minimax estimators [20], etc. Apart from [5], these designs, often for linear systems, separate the estimator design from the control design. The control problem may be altered to accommodate the state estimation error by methods such as constraint tightening [33] or chance/probabilistic constraints [25].

In this paper, we first consider Stochastic Model Predictive Control (SMPC), formulated as a variant of Stochastic Optimal Output Feedback Control (SOOFC), without regard to computational tractability restrictions. By taking this route, we establish a formulation of SMPC which possesses central features: accommodation of output feedback and duality/probing; examination of the probabilistic requirements of deterministic and probabilistic constraints; guaranteed performance of the SMPC controller applied to the system. Performance bounds are stated in relation to the infinite-horizon-optimally controlled closed-loop performance. We next particularize our performance results to the class of Partially Observable Markov Decision Processes (POMDPs), as is discussed explicitly in [28]. For this special class of systems, application of our results and verification of the underlying assumptions are computationally tractable, as we demonstrate using a numerical example in healthcare decision making from [29].

This paper does not seek to provide a comprehensive survey of the myriad alternative approaches proposed for Stochastic Model Predictive Control (SMPC). For that, we recommend the numerous available references such as [11, 16, 19, 21]. Rather, we present a new algorithm for SMPC based on SOOFC and prove, particularly, performance properties relative to optimality. As a by-product, we acquire a natural treatment of output feedback via the Bayesian Filter and of the associated controller duality required to balance probing for observability enhancement and regulation. The price we pay for general nonlinear systems is the suspension of disbelief in computational tractability. However, the approach delineates a target controller with assured properties. Approximating this intractable controller by a more computationally amenable variant, as opposed to identifying soluble but indirect problems without guarantees, holds the prospect of approximately attracting the benefits. Such a strategy, using a particle implementation of the Bayesian filter and scenario methods at the cost of losing duality of the control inputs, is discussed in [27]. Alternatively, as suggested in [29], one may approximate the nonlinear SMPC problem by POMDPs and apply the methods of the current paper directly, resulting in optimality and duality on the approximate POMDP system.

Comparison with Other Performance Results

Our work is related to four central papers discussing performance bounds linking the achieved cost of MPC on the infinite horizon with the cost of infinite-horizon optimal control:

Grüne & Rantzer [13] study the deterministic, full-state feedback situation and provide comparison between the infinite-horizon stochastic optimal cost and the achieved infinite-horizon MPC cost. In particular, the achieved MPC cost is bounded in terms of the computed finite-horizon MPC cost.

Hernández & Lasserre [14] consider the stochastic case with full-state feedback and average as well as discounted costs. Their results yield a comparison between the infinite-horizon stochastic optimal cost and the achieved infinite-horizon MPC cost in terms of the unknown true optimal cost.

Chatterjee & Lygeros [3] also treat the stochastic case with full-state feedback and average cost function. They establish and quantify a bound on the expected long-run average MPC performance related to the terminal cost function and its associated monotonicity requirement.

Riggs & Bitmead [24] consider the stochastic full-state feedback case as an extension to [13] via a discounted infinite-horizon cost function. Similarly to [13], they establish a performance bound of the achieved infinite-horizon MPC cost in terms of the computed finite-horizon MPC cost.

The current paper extends [13, 24] to include output feedback stochastic MPC. Achieved performance is bounded in terms of the computed finite-horizon MPC cost. The incorporation of state estimation into the problem is the central contribution.

Each of these works relies on a sequence of assumptions concerning the well-posedness of the underlying optimization problems and specific monotonicity conditions on certain value functions which admit the establishment of stability and performance bounds.

We summarize the main contribution of this paper, Corollary 24, for stochastic MPC with state estimation. Subject to cost monotonicity Assumption 23, which is testable in terms of a known terminal policy and the terminal cost function, an upper bound is computable for the achieved infinite-horizon MPC cost in terms of the computed finite-horizon MPC cost and other parameters of the monotonicity condition. As in [3], we provide an example – here a POMDP from healthcare – in which the assumptions are verified, indicating the substance of the assumptions and the nature of the conclusion regarding closed-loop output-feedback stochastic MPC.

Organization of this Paper

The structure of the paper is as follows. Section 2 briefly formulates SOOFC, as used in Section 3 to present a new SMPC algorithm. After discussing recursive feasibility of this algorithm in Section 4, we proceed by establishing conditions for boundedness of the infinite-horizon discounted cost of the SMPC-controlled nonlinear system in Section 5. Section 6 ties the performance of SMPC to the infinite-horizon SOOFC performance. Section 7 provides a brief encapsulation and post-analysis of the set of technical assumptions in the paper. The results are particularized for POMDPs in Section 8, followed by discussion of our numerical example in Section 9. We conclude the paper in Section 10. To aid the development, all proofs are relegated to the Appendix.

Notation

$\mathbb{R}$ and $\mathbb{R}_{+}$ are real and non-negative real numbers, respectively. The set of non-negative integers is denoted $\mathbb{N}_{0}$ and the set of positive integers by $\mathbb{N}_{1}$ . We write sequences as $\mathbf{t}^{m}\triangleq\{t_{0},t_{1},\ldots,t_{m}\}$ , where $m\in\mathbb{N}_{0}$ ; $\mathbf{t}^{\infty}$ is an infinite sequence of the same form. $\operatorname{pdf}(X)$ denotes the probability density function of random variable $X$ while $\operatorname{pdf}(X|Y)$ denotes the conditional probability density function of random variable $X$ given jointly distributed random variable $Y$ . The acronyms a.s., a.e. and i.i.d. stand for almost sure, almost everywhere and independent and identically distributed, respectively.

2 Stochastic Optimal Output-Feedback Control

We consider stochastic optimal control of nonlinear time-invariant dynamics of the form

$$x_{k+1}=f(x_{k},u_{k},w_{k}),\tag{1}$$

$$y_{k}=h(x_{k},v_{k}),\tag{2}$$

where $k\in\mathbb{N}_{0}$ , $x_{k}\in\mathbb{R}^{n_{x}}$ denotes the state with initial value $x_{0}$ , $u_{k}\in\mathbb{R}^{n_{u}}$ the control input, $y_{k}\in\mathbb{R}^{n_{y}}$ the measurement output, $w_{k}\in\mathbb{R}^{n_{w}}$ the process noise and $v_{k}\in\mathbb{R}^{n_{v}}$ the measurement noise. We denote by

$$\pi_{0\mid-1}\triangleq\operatorname{pdf}(x_{0})\tag{3}$$

the known a-priori density of the initial state and by

[TABLE]

the data available at time $k$ . We make the following standing assumptions on the random variables and system dynamics.

Assumption 1.

The dynamics (1-2) satisfy

1. $f(\cdot,u,\cdot)$ is differentiable a.e. with full rank Jacobian $\forall\,u\in\mathbb{R}^{n_{u}}$.

2. $h(\cdot,\cdot)$ is differentiable a.e. with full rank Jacobian.

3. $w_{k}$ and $v_{k}$ are i.i.d. sequences with known densities.

4. $x_{0},w_{k},v_{l}$ are mutually independent for all $k,l\geq 0$.

Assumption 2.

The control input $u_{k}$ at time instant $k\geq 0$ is a function of the data $\mathbf{\zeta}^{k}$ and $\pi_{0\mid-1}$ .

As there is no direct feedthrough from $u_{k}$ to $y_{k}$ , Assumptions 1 and 2 assure that system (1-2) is a controlled Markov process [17]. Assumption 1 further ensures that $f$ and $h$ enjoy the Ponomarev 0-property [22] and hence that $x_{k}$ and $y_{k}$ possess joint and marginal densities.

2.1 Information State & Bayesian Filter

Definition 3.

The conditional density of state $x$ given data $\mathbf{\zeta}^{k}$ ,

$$\pi_{k}\triangleq\operatorname{pdf}(x_{k}\mid\mathbf{\zeta}^{k}),$$

is the information state of system (1-2).

For a Markov system such as (1-2), the information state is propagated via the Bayesian Filter (e.g. [4, 30]):

[TABLE]

for $k\in\mathbb{N}_{0}$ and density $\pi_{0\mid-1}$ as in (3). For linear dynamics and Gaussian noise, the recursion (5-6) yields the Kalman Filter.
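To make the linear-Gaussian special case concrete, the following is a minimal sketch of one filter cycle for a scalar system, where the information state reduces to a mean and a variance. The model $x_{k+1}=a x_{k}+b u_{k}+w_{k}$, $y_{k}=c x_{k}+v_{k}$ and all parameter names (`a`, `b`, `c`, `q`, `r`) are illustrative assumptions, not taken from the paper.

```python
def kalman_step(mean_pred, var_pred, y, u, a=1.0, b=1.0, c=1.0, q=0.1, r=0.1):
    """One scalar Kalman cycle (illustrative model, assumed here):
    x_{k+1} = a*x_k + b*u_k + w_k,  y_k = c*x_k + v_k,
    with Var(w_k) = q and Var(v_k) = r.

    Takes the predicted density pi_{k|k-1} as (mean, variance) and returns
    the filtered density pi_k and the next prediction pi_{k+1|k}.
    """
    # Measurement update: condition the predicted density on y_k.
    gain = var_pred * c / (c * c * var_pred + r)
    mean_filt = mean_pred + gain * (y - c * mean_pred)
    var_filt = (1.0 - gain * c) * var_pred
    # Time update: propagate the filtered density through the dynamics.
    mean_next = a * mean_filt + b * u
    var_next = a * a * var_filt + q
    return (mean_filt, var_filt), (mean_next, var_next)
```

In this neutral case the variance recursion does not depend on the input `u`, which is consistent with the remark later in the text that probing disappears for linear Gaussian control.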

Definition 4.

The recursion (5-6) defines the mapping

$$\pi_{k+1}=T(\pi_{k},y_{k+1},u_{k}).\tag{7}$$

2.2 Cost and Constraints

Definition 5.

$\mathbb{E}_{k}[\,\cdot\,]$ and $\mathbb{P}_{k}[\,\cdot\,]$ are expected value and probability with respect to state $x_{k}$ – with conditional density $\pi_{k}$ – and i.i.d. random variables $\{(w_{j},v_{j+1}):j\geq k\}$.

Given the available data $\mathbf{\zeta}^{0}$ , we aim to select non-anticipatory (i.e. subject to Assumption 2) control inputs $u_{k}$ to minimize

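$$J_{N}(\pi_{0},\mathbf{u}^{N-1})\triangleq\mathbb{E}_{0}\left[\sum_{j=0}^{N-1}\alpha^{j}c(x_{j},u_{j})+\alpha^{N}c_{N}(x_{N})\right],\tag{8}$$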

where $N$ is the control horizon, $c:\mathbb{R}^{n_{x}}\times\mathbb{R}^{n_{u}}\to\mathbb{R}_{+}$ the stage cost, $c_{N}:\mathbb{R}^{n_{x}}\to\mathbb{R}_{+}$ the terminal cost and $\alpha\in\mathbb{R}_{+}$ a discount factor. Drawing from the literature (e.g. [1, 17]), optimal controls in (8) must inherently be separated feedback policies. That is, control input $u_{k}$ depends on data $\mathbf{\zeta}^{k}$ and initial density $\pi_{0\mid-1}$ solely through the current information state $\pi_{k}$ . Optimality thus requires propagating $\pi_{k}$ and policies $g_{k}$ , where

$$u_{k}=g_{k}(\pi_{k}).\tag{9}$$

Cost (8) then reads

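$$J_{N}(\pi_{0},\mathbf{g}^{N-1})=\mathbb{E}_{0}\left[\sum_{j=0}^{N-1}\alpha^{j}c(x_{j},g_{j}(\pi_{j}))+\alpha^{N}c_{N}(x_{N})\right].\tag{10}$$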

Extending stochastic optimal control problems with cost (10) to the infinite horizon (see [1, 2]) typically requires $\alpha<1$ and omitting the terminal cost term $c_{N}(\cdot)$ , leading to

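$$J_{\infty}(\pi_{0},\mathbf{g}^{\infty})\triangleq\mathbb{E}_{0}\left[\sum_{j=0}^{\infty}\alpha^{j}c(x_{j},g_{j}(\pi_{j}))\right].\tag{11}$$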

In addition to minimizing the expected value cost (10), we impose probabilistic state constraints of the form

$$\mathbb{P}_{k}\left[x_{k}\in\mathcal{X}_{k}\right]\geq 1-\epsilon_{k},\tag{12}$$

for $\epsilon_{k}\in[0,1)$ . That is, we enforce constraints with respect to the known distributions of the future noise variables and the conditional density of the current state $x_{k}$ , captured by the information state $\pi_{k}$ . Moreover, we consider input constraints of the form
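As a rough illustration of what checking such a probabilistic constraint against an information state involves, the sketch below assumes the density is approximated by weighted scalar samples and that the constraint set is an interval. Both choices, and the function names, are assumptions made for this example only.

```python
def constraint_probability(particles, weights, lower, upper):
    """Estimate P[x in [lower, upper]] from a weighted-sample
    approximation of the information state (scalar state assumed)."""
    total = sum(weights)
    inside = sum(w for x, w in zip(particles, weights) if lower <= x <= upper)
    return inside / total


def satisfies_chance_constraint(particles, weights, lower, upper, eps):
    """Check the constraint P[x in X] >= 1 - eps for the interval X."""
    return constraint_probability(particles, weights, lower, upper) >= 1.0 - eps
```

With four equally weighted samples at 0, 1, 2 and 3 and the interval [0, 2], the estimated probability is 0.75, so the constraint holds for `eps = 0.3` but not for `eps = 0.2`.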

$$u_{k}=g_{k}(\pi_{k})\in\mathcal{U}_{k}.\tag{13}$$

When discussing infinite-horizon optimal control with cost (11), we replace the state constraints (12) by the stationary probabilistic state constraints

$$\mathbb{P}_{k}\left[x_{k}\in\mathcal{X}_{\infty}\right]\geq 1-\epsilon_{\infty},\tag{14}$$

for $\epsilon_{\infty}\in[0,1)$ and the input constraints (13) by

$$u_{k}=g_{k}(\pi_{k})\in\mathcal{U}_{\infty}.$$

Definition 6.

Denote by $\mathcal{D}$ the set of all densities on $\mathbb{R}^{n_{x}}$ . Further define $\mathcal{C}_{k}\subseteq\mathcal{D},k\in\mathbb{N}_{1},$ to be the set of all $\pi_{k}$ of $x_{k}$ satisfying the probabilistic constraint (12). Define $\mathcal{C}_{\infty}$ likewise for (14).

2.3 Stochastic Optimal Control

Definition 7.

Given dynamics (1-2), $\alpha\in\mathbb{R}_{+}$ and horizon $N\in\mathbb{N}_{1}$ , define the finite-horizon stochastic optimal control problem

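$$\mathcal{P}_{N}(\pi_{0}):\left\{\begin{array}{cl}\inf_{\mathbf{g}^{N-1}}&J_{N}(\pi_{0},\mathbf{g}^{N-1})\\ \text{s.t.}&\mathbb{P}_{j}\left[x_{j}\in\mathcal{X}_{j}\right]\geq 1-\epsilon_{j},\,j=1,\dots,N,\\ &g_{j}(\pi_{j})\in\mathcal{U}_{j},\,j=0,\dots,N-1.\end{array}\right.$$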

Definition 8.

Given dynamics (1-2) and $\alpha\in\mathbb{R}_{+}$ , define the infinite-horizon stochastic optimal control problem

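$$\mathcal{P}_{\infty}(\pi_{0}):\left\{\begin{array}{cl}\inf_{\mathbf{g}^{\infty}}&J_{\infty}(\pi_{0},\mathbf{g}^{\infty})\\ \text{s.t.}&\mathbb{P}_{j}\left[x_{j}\in\mathcal{X}_{\infty}\right]\geq 1-\epsilon_{\infty},\,j\in\mathbb{N}_{1},\\ &g_{j}(\pi_{j})\in\mathcal{U}_{\infty},\,j\in\mathbb{N}_{0}.\end{array}\right.$$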

Definition 9.

$\pi_{0}$ is feasible for $\mathcal{P}_{N}(\cdot)$ if there exists a sequence of policies $\mathbf{g}^{N-1}$ such that, $\{w_{j},v_{j+1}\}_{j\geq 0}$-a.s., $u_{j}=g_{j}(\pi_{j})$ satisfy the constraints and $J_{N}(\pi_{0},\mathbf{g}^{N-1})$ is finite. Define feasibility likewise for $\mathcal{P}_{\infty}(\pi_{0})$.

In Stochastic Optimal Control, feasibility entails the existence of policies $g_{k}(\cdot)$ such that for any $\pi_{k}\in\mathcal{C}_{k}$ , $g_{k}(\pi_{k})\in\mathcal{U}_{k}$ and

$$\pi_{k+1}=T(\pi_{k},y_{k+1},g_{k}(\pi_{k}))\in\mathcal{C}_{k+1},\quad(w_{k},v_{k+1})\text{-a.s.}$$

Even though the state constraints (12) are probabilistic, this condition results in an equivalent almost sure constraint on the conditional state densities. The stochastic optimal feedback policies in $\mathcal{P}_{N}(\pi_{0})$ may now be computed in principle by solving the Stochastic Dynamic Programming Equation (SDPE),

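$$V_{k}(\pi_{k})\triangleq\inf_{g_{k}(\cdot)}\ \mathbb{E}_{k}\left[c(x_{k},g_{k}(\pi_{k}))+\alpha V_{k+1}(\pi_{k+1})\right]\quad\text{s.t.}\quad\pi_{k+1}\in\mathcal{C}_{k+1},\ (w_{k},v_{k+1})\text{-a.s.},\quad g_{k}(\pi_{k})\in\mathcal{U}_{k},\tag{15}$$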

for $k=0,\ldots,N-1$ and $\pi_{k}\in\mathcal{C}_{k}$ . The equation is solved backwards in time, from its terminal value

$$V_{N}(\pi_{N})\triangleq\mathbb{E}_{N}\left[c_{N}(x_{N})\right].\tag{16}$$

Solution of the SDPE is the primary source of the restrictive computational demands in Stochastic Optimal Control. The reason for this difficulty lies in the dependence of the future information state in each step of (15-16) on the current and future control inputs. While the dependence on future control inputs is limiting even in deterministic control, the computational burden is drastically worsened in the stochastic case because of the complexity of the operator $T_{k}$ in (7). On the other hand, optimality via the SDPE leads to a control law of dual nature. Dual optimal control connotes the compromise in optimal control between the control signal’s function to reveal the state and its function to regulate that state. These dual actions are typically antagonistic [9]. The duality of stochastic optimal control is a generic feature, although there exist some problems – called neutral – where the probing nature of the control evanesces, linear Gaussian control being one such case.

Notice that, while the Bayesian Filter (5-6) can be approximated to arbitrary accuracy using a Particle Filter [30], the SDPE cannot be easily simplified without loss of optimal probing in the control inputs. While control laws generated without solution of the SDPE can be modified artificially to include certain excitation properties, as discussed for instance in [10, 18], such approaches are suboptimal and do not generally enjoy the theoretical guarantees discussed below. For the stochastic optimal control problems considered here, excitation of the control signal is incorporated automatically and as necessary through the optimization. The optimal control policies, $g^{\star}_{j}(\cdot)$ , will inherently inject excitation into the control signal depending on the quality of state knowledge embodied in $\pi_{k}$ .
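The particle approximation mentioned above can be sketched as a bootstrap filter step that maps samples of $\pi_{k}$ to samples of $\pi_{k+1}$. This is a generic textbook construction, not the implementation of [30]; the function names, the scalar state, and the Gaussian likelihood are assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, u, y_next, f, likelihood, sample_w, rng):
    """Approximate pi_{k+1} = T(pi_k, y_{k+1}, u_k) with equally
    weighted particles (bootstrap filter with multinomial resampling)."""
    # Time update: push every particle through the dynamics with fresh noise.
    predicted = [f(x, u, sample_w(rng)) for x in particles]
    # Measurement update: weight each particle by the likelihood of y_{k+1}.
    weights = [likelihood(y_next, x) for x in predicted]
    if sum(weights) <= 0.0:
        raise ValueError("all particles have zero likelihood for this measurement")
    # Resample with replacement so the output is equally weighted again.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def gaussian_likelihood(y, x, sigma=0.5):
    """Unnormalized likelihood for the assumed model y = x + v, v ~ N(0, sigma^2)."""
    return math.exp(-0.5 * ((y - x) / sigma) ** 2)
```

A call needs a dynamics function `f(x, u, w)`, a noise sampler such as `lambda rng: rng.gauss(0.0, 0.1)`, and a seeded `random.Random` instance. Note that this only approximates the filter; it does nothing to simplify the SDPE.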

3 Stochastic Model Predictive Control

Notice how this algorithm differs from common practice in SMPC [15, 21] in that we explicitly use the information states $\pi_{k}\in\mathcal{D}$ . Throughout the literature, these information states – conditional densities – are replaced by best available, or certainty-equivalent state estimates in $\mathbb{R}^{n_{x}}$ . While this makes the problem more tractable, one no longer solves the underlying stochastic optimal control problem. As we shall demonstrate in this paper, using information state $\pi_{k}$ and optimal policy $g_{0}^{\star}(\cdot)$ resulting from solution of Problem $\mathcal{P}_{N}(\pi_{k})$ at each time instance leads to a number of results regarding closed-loop performance on the infinite horizon.
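The receding-horizon structure described in this paragraph can be sketched as the loop below: at each time step, solve $\mathcal{P}_{N}(\pi_{k})$, apply only the first policy $g_{0}^{\star}$, then update the information state with the new measurement. The three callables are hypothetical placeholders for the problem-specific solver, plant and Bayesian filter; none of these names come from the paper.

```python
def run_smpc(pi0, steps, solve_first_policy, plant_step, filter_update):
    """Receding-horizon loop on the information state (sketch).

    solve_first_policy(pi) -> policy g0 : stands in for solving P_N(pi)
    plant_step(u)          -> y_next    : applies u_k, returns y_{k+1}
    filter_update(pi, y_next, u) -> pi  : stands in for T(pi_k, y_{k+1}, u_k)
    """
    pi = pi0
    applied = []
    for _ in range(steps):
        g0 = solve_first_policy(pi)        # re-solve the finite-horizon problem
        u = g0(pi)                         # u_k = g_0*(pi_k)
        y_next = plant_step(u)             # plant evolves and is measured
        pi = filter_update(pi, y_next, u)  # propagate the information state
        applied.append(u)
    return pi, applied
```

The policy is evaluated on the full density `pi`, not on a point estimate of the state, which is the distinction from certainty-equivalent SMPC drawn in the text.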

4 Recursive Feasibility

Assumption 10.

$\pi_{0\mid-1}$ yields $\pi_{0}$ feasible for $\mathcal{P}_{N}(\cdot)$, $v_{0}$-a.s.

Assumption 11.

The constraints in $\mathcal{P}_{N}(\cdot)$ and $\mathcal{P}_{\infty}(\cdot)$ , for $j=1,\ldots,N-1$ , satisfy

[TABLE]

Assumption 12.

For all densities $\pi_{k}\in\mathcal{C}_{N}$ , there exists a policy $\tilde{g}(\pi_{k})$ satisfying

[TABLE]

Theorem 13.

Given Assumptions 10-12, SMPC yields $\pi_{k}$ feasible for $\mathcal{P}_{N}(\cdot)$ , $\{w_{j},v_{j+1}\}_{j\geq 0}$ -a.s., for all $k\in\mathbb{N}_{1}$ .

The proof of this result follows directly as a stochastic version of the corresponding result in deterministic MPC, e.g. [12]. Notice that recursive feasibility and compact $\mathcal{X}_{1}$ immediately imply a stability result independent of the cost (10), i.e.

$$\mathbb{P}_{k}\left[x_{k}\in\mathcal{X}_{1}\right]\geq 1-\epsilon_{1},\tag{17}$$

for $k\in\mathbb{N}_{1}$ .

5 Convergence and Stability

Assumption 14.

For a given $\alpha\in\mathbb{R}_{+}$ , the terminal feedback policy $\tilde{g}(\pi)$ specified in Assumption 12 satisfies

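$$\alpha\,\mathbb{E}_{\pi}\left[c_{N}(f(x,\tilde{g}(\pi),w))\right]-c_{N}(x)\overset{a.s.}{\leq}-c(x,\tilde{g}(\pi)),\tag{18}$$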

for all densities $\pi$ of $x$ with $\pi\in\mathcal{C}_{N}$ . The expectation $\mathbb{E}_{\pi}[\cdot]$ is with respect to state $x$ – with density $\pi$ – and $w$ .

For $\alpha\geq 1$ , Assumption 14 can be interpreted as the existence of a stochastic Lyapunov function on the terminal set of densities, $\mathcal{C}_{N}$ . If (18) holds for $\alpha\geq 1$ , it naturally holds for all $\alpha\in(0,1]$ .
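The second claim can be checked in one line. The sketch below assumes that (18) has the form given in the extracted equation list, $\alpha\,\mathbb{E}_{\pi}[c_{N}(f(x,\tilde{g}(\pi),w))]-c_{N}(x)\leq-c(x,\tilde{g}(\pi))$ a.s., and uses only $c_{N}\geq 0$:

```latex
\alpha'\,\mathbb{E}_{\pi}\!\left[c_{N}(f(x,\tilde{g}(\pi),w))\right]-c_{N}(x)
\;\leq\;
\alpha\,\mathbb{E}_{\pi}\!\left[c_{N}(f(x,\tilde{g}(\pi),w))\right]-c_{N}(x)
\;\leq\;
-c(x,\tilde{g}(\pi)),
\qquad 0\leq\alpha'\leq\alpha .
```

The first inequality holds because the expectation of the non-negative terminal cost is itself non-negative, so lowering the discount factor can only decrease the left-hand side.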

Theorem 15.

Given Assumptions 10-14, SMPC yields

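$$\lim_{M\to\infty}\sum_{k=0}^{M}\alpha^{k}c(x_{k},g_{0}^{\star}(\pi_{k}))\overset{a.s.}{<}\infty.\tag{19}$$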

While the discount factor $\alpha$ may not seem to play a major role in this result, notice that small values of $\alpha$ may be required to satisfy Assumption 14. For $\alpha\geq 1$ , (19) implies almost sure convergence to 0 of the achieved stage cost.

Assumption 16.

State $x$ is detectable via the stage cost:

[TABLE]

Theorem 17.

Given Assumptions 10-16, SMPC with $\alpha\geq 1$ yields

$$\lim_{M\to\infty}\sum_{k=0}^{M}c(x_{k},g_{0}^{\star}(\pi_{k}))\overset{a.s.}{<}\infty\tag{20}$$

and

[TABLE]

While (20) holds only for $\alpha\geq 1$ , notice that SMPC for $\alpha\in[0,1)$ with recursive feasibility possesses the default stability property (17). For zero terminal cost $c_{N}(x)\equiv 0$ , Assumption 18 replaces Assumption 14 to guarantee (19), a finite discounted infinite-horizon SMPC cost.

Assumption 18.

The terminal feedback policy $\tilde{g}(\pi)$ specified in Assumption 12 satisfies

[TABLE]

for all densities $\pi$ of $x$ with $\pi\in\mathcal{C}_{N}$ .

Corollary 19.

Given Assumptions 10-12 and 18, SMPC with zero terminal cost $c_{N}(x)\equiv 0$ yields

[TABLE]

Moreover, if $\alpha=1$ and Assumption 16 is added, we have

[TABLE]

6 Infinite-Horizon Performance Bounds

In the following, we establish performance bounds for SMPC, implemented on the infinite horizon as a proxy to solving the infinite-horizon stochastic optimal control problem $\mathcal{P}_{\infty}(\pi)$ . These bounds are in the spirit of previously established bounds reported for deterministic MPC in [13] and the stochastic full state-feedback case in [24].

Assumption 20.

There exist $\gamma\in[0,1]$ and $\eta\in\mathbb{R}_{+}$ such that

[TABLE]

for all densities $\pi_{0}$ of $x_{0}$ which are feasible in $\mathcal{P}_{N}(\cdot)$ .

Definition 21.

Denote by $\mathbf{g}^{MPC}$ the SMPC implementation of policy $g_{0}^{\star}(\cdot)$ on the infinite horizon, i.e.

[TABLE]

Similarly, $\mathbf{g}^{\star^{N-1}}$ and $\mathbf{g}^{\star^{\infty}}$ are the optimal sequences of policies in Problems $\mathcal{P}_{N}(\cdot)$ and $\mathcal{P}_{\infty}(\cdot)$ , respectively.

Theorem 22.

Given Assumptions 10-12 and 20, SMPC with $\alpha\in[0,1)$ yields

[TABLE]

In the special case $\gamma=0$ , we impose the following assumption on the terminal cost to obtain an insightful corollary to Theorem 22.

Assumption 23.

For $\alpha\in[0,1)$ , there exists $\eta\in\mathbb{R}_{+}$ such that the terminal policy $\tilde{g}(\cdot)$ specified in Assumption 12 satisfies

[TABLE]

for all densities $\pi$ of $x$ with $\pi\in\mathcal{C}_{N}$ . The expectation $\mathbb{E}_{\pi}[\cdot]$ is with respect to state $x$ – with density $\pi$ – and $w$ .

Corollary 24.

Given Assumptions 10-12 and 23, SMPC with $\alpha\in[0,1)$ yields

[TABLE]

This corollary relates three quantities: the design cost $J_{N}(\pi_{0},\mathbf{g}^{\star^{N-1}})$, which is known as part of the SMPC calculation; the optimal cost $J_{\infty}(\pi_{0},\mathbf{g}^{\star^{\infty}})$, which is unknown (otherwise we would use $\mathbf{g}^{\star^{\infty}}$); and the achieved infinite-horizon SMPC cost $J_{\infty}(\pi_{0},\mathbf{g}^{MPC})$, which is also unknown.

7 Analysis of Assumptions

The sequence of assumptions becomes more inscrutable as our study progresses. However, they deviate only slightly from standard assumptions in MPC, suitably tweaked for stochastic applications. Assumptions 1 and 2 are regularity conditions permitting the development of the Bayesian filter via densities and restricting the controls to causal policies. Assumptions 10 and 11 limit the constraint sets and initial state density to admit treatment of recursive feasibility.

Assumptions 12, 14, 18 and 23 each concern a putative terminal control policy, $\tilde{g}(\cdot)$. Assumption 12 implies positive invariance of the terminal constraint set under $\tilde{g}$. Using the martingale analysis of the proof of Theorem 17, Assumption 14 ensures that the extant $\tilde{g}$ achieves finite cost-to-go on the terminal set. The cost-detectability Assumption 16 is the familiar Optimal Control condition under which finite cost forces state convergence. Assumption 18 temporarily replaces Assumption 14 only to consider the zero terminal cost case. Assumptions 20 and 23 presume monotonicity of the finite-horizon cost with increasing horizon, firstly for the optimal policy $g_{0}^{\star}$ and then for the putative terminal policy $\tilde{g}$ on the terminal set. These monotonicity assumptions mirror those of, for example, [13] for deterministic MPC and [24] for full-state stochastic MPC. They underpin the deterministic Lyapunov analysis and the stochastic martingale analysis based on the cost-to-go. These assumptions are validated for a POMDP example in Section 9.

8 Dual Optimal Stochastic MPC for POMDPs

We now proceed by particularizing the performance results from Section 6 for the special class of POMDPs, as suggested for instance in [28, 29, 32]. This class of problems is characterized by probabilistic dynamics on a finite state space $X=\{1,\ldots,n\}$ , finite action space $U=\{1,\ldots,m\}$ , and finite observation space $Y=\{1,\ldots,o\}$ . POMDP dynamics are defined by the conditional state transition and observation probabilities

[TABLE]

where $t\in\mathbb{N}_{0}$, $i,j\in X$, $a\in U$, $\theta\in Y$. The state transition dynamics (23) correspond to a conventional Markov Decision Process (MDP, e.g. [23]). However, the control actions $u_{t}$ are to be chosen based on the known initial state distribution $\pi_{0}=\operatorname{pdf}(x_{0})$ and the sequences of observations, $\{y_{1},\ldots,y_{t}\}$, and controls, $\{u_{0},\ldots,u_{t-1}\}$, respectively. That is, we are choosing our control actions in a Hidden Markov Model (HMM, e.g. [8]) setup. Notice that, while POMDPs conventionally do not have an initial observation $y_{0}$ in (24), as is commonly assumed in nonlinear system models of the form (1-2), one can easily modify this basic setup without altering the following discussion.

Given control action $u_{t}=a$ and measured output $y_{t+1}=\theta$ , the information state $\pi_{t}$ in a POMDP is updated via

[TABLE]

where $\pi_{t,j}$ denotes the $j^{\text{th}}$ entry of the row vector $\pi_{t}$. To specify the cost functionals (10) and (11) in the POMDP setup, we write the stage cost as $c(x_{t},u_{t})=c_{i}^{a}$ if $x_{t}=i\in X$ and $u_{t}=a\in U$, summarized in the column vectors $c(a)$ of the same dimension as the row vectors $\pi_{t}$. Similarly, the terminal cost terms are $c_{N}(x_{N})=c_{i,N}$ if $x_{N}=i\in X$, summarized in the column vector $c_{N}$. The infinite-horizon cost functional defined in Section 2 then follows as
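For a finite state space, the information-state update is a short computation. The sketch below uses the standard POMDP belief update and assumes that row $i$, column $j$ of `P` holds the probability of moving from state $i$ to state $j$, and that row $j$, column $\theta$ of `R` holds the probability of observing $\theta$ in the new state $j$. The function name and the zero-based indexing are choices made for this example.

```python
def belief_update(belief, P, R, theta):
    """Update the row vector pi_t to pi_{t+1} for one action.

    belief : list of state probabilities pi_t
    P, R   : transition and observation matrices of the applied action
    theta  : zero-based index of the observation y_{t+1}
    """
    n = len(belief)
    # Prediction through the controlled Markov chain.
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Correction by the likelihood of the observation in each new state.
    unnormalized = [predicted[j] * R[j][theta] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [v / total for v in unnormalized]
```

With the data of Table 1, a uniform belief followed by the rapid diagnostic test returning Stage 2 gives approximately `[0.05, 0.90, 0.05]`, while skipping the appointment (uniform observation matrix) leaves only the prediction step, e.g. `[1, 0, 0]` becomes `[0.8, 0.2, 0.0]`.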

[TABLE]

with corresponding finite-horizon variant

[TABLE]
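The information-state update and the expected stage cost $\pi_{k}c(a)$ that enter these cost functionals can be sketched as follows. This assumes the standard HMM filter form (predict with $P(a)$, then weight by the column of $R(a)$ selected by the observation). The function names are our own, and indices start at 0.

```python
def belief_update(pi, P_a, R_a, theta):
    """Bayesian filter step for a finite POMDP.

    pi    : current information state (list of probabilities over states)
    P_a   : transition matrix for the applied action, P_a[i][j]
    R_a   : observation matrix for the applied action, R_a[j][theta]
    theta : index of the measured output y_{t+1}
    Returns (next information state, probability of observing theta).
    """
    n = len(pi)
    # Unnormalized posterior: (sum_i pi_i * p_ij) * r_{j,theta}
    unnorm = [sum(pi[i] * P_a[i][j] for i in range(n)) * R_a[j][theta]
              for j in range(n)]
    total = sum(unnorm)  # Prob(y_{t+1} = theta | pi_t, u_t = a)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnorm], total


def expected_stage_cost(pi, c_a):
    """Expected stage cost pi * c(a): row vector times column vector."""
    return sum(p * c for p, c in zip(pi, c_a))
```

For instance, with an identity transition matrix, a belief of `[0.5, 0.5]` and observation rows `[0.9, 0.1]` and `[0.2, 0.8]`, observing index 0 shifts the belief towards the first state, since that observation is far more likely there.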

Extending (15-16), optimal control decisions may then be computed via

[TABLE]

for $k=0,\ldots,N-1$ , from terminal value function

[TABLE]
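For small horizons the backward recursion can be evaluated by exhaustive enumeration over actions and observations. The sketch below assumes the standard belief-space form of the recursion, $V_{k}(\pi)=\min_{a}\{\pi c(a)+\alpha\sum_{\theta}\mathbb{P}(\theta\mid\pi,a)\,V_{k+1}(\pi^{+})\}$ with $V_{N}(\pi)=\pi c_{N}$, where $\pi^{+}$ is the filtered information state. The exact expressions are those of (25-26), and the function names and data layout are our own.

```python
def dot(u, v):
    """Inner product of a row vector (belief) and a column vector (cost)."""
    return sum(a * b for a, b in zip(u, v))


def belief_update(pi, P_a, R_a, theta):
    """Return (posterior belief or None, probability of observation theta)."""
    n = len(pi)
    unnorm = [sum(pi[i] * P_a[i][j] for i in range(n)) * R_a[j][theta]
              for j in range(n)]
    prob = sum(unnorm)
    post = [u / prob for u in unnorm] if prob > 0.0 else None
    return post, prob


def value(pi, k, N, P, R, c, c_N, alpha):
    """Return (V_k(pi), minimizing action) by recursion from stage k to N.

    P, R, c are dicts keyed by action; the cost is O((m*o)^(N-k)) evaluations,
    so this is only practical for short horizons and small action/observation sets.
    """
    if k == N:
        return dot(pi, c_N), None  # terminal value function
    best_val, best_a = float("inf"), None
    for a in sorted(P):
        val = dot(pi, c[a])  # expected stage cost under action a
        for theta in range(len(R[a][0])):
            post, prob = belief_update(pi, P[a], R[a], theta)
            if prob > 0.0:  # skip observations that cannot occur
                val += alpha * prob * value(post, k + 1, N, P, R, c, c_N, alpha)[0]
        if val < best_val:
            best_val, best_a = val, a
    return best_val, best_a
```

Calling `value(pi0, 0, N, P, R, c, c_N, alpha)` returns the optimal finite-horizon cost for `pi0` together with the first-stage decision. The expectation over future observations is taken inside the minimization, which is how the value of information enters the choice of action.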

Assumption 25.

For $\alpha\in[0,1)$ , there exist $\eta\in\mathbb{R}_{+}$ and a policy $\tilde{g}(\cdot)$ such that

[TABLE]

for all densities $\pi_{0}$ of $x_{0}\in X$ .

Theorem 26 ([28]).

Given Assumption 25, SMPC for POMDPs with $\alpha\in[0,1)$ yields

[TABLE]

for all densities $\pi$ of $x\in X$ .

A special case of Corollary 24, this result allows us to bound the achieved infinite-horizon cost of SMPC on POMDPs. In this special case, we can compute the dual optimal control policies and verify Assumption 25 numerically, as is demonstrated for a particular example below.

9 An Example in Healthcare Decision Making

9.1 Problem Setup

Consider a patient treated for a specific disease which can be managed but not cured. For simplicity, we assume that the patient does not die under treatment. While this transition would have to be added in practice, it results in a time-varying model, which we avoid in order to keep the following discussion compact.

The example, introduced in [29], is set up as follows. The disease encompasses three stages with severity increasing from Stage 1 through Stage 2 to Stage 3, transitions between which are governed by a controlled Markov chain, where $P(a)$ is the transition probability matrix with values $p^{a}_{ij}$ at row $i$ and column $j$ and $R(a)$ is the observation matrix with elements $r^{a}_{j\theta}$ . All transition and observation probability matrices below are defined similarly. Once our patient enters Stage 3, Stages 1 and 2 are inaccessible for all future times. However, Stage 3 can only be entered through Stage 2, a transition from which to Stage 1 is possible only under costly treatment. The same treatment inhibits transitions from Stage 2 to Stage 3. We have access to the patient state only through imprecise tests, which will result in one of three possible values, each of which is representative of one of the three disease stages. However, these tests are imperfect, with non-zero probability of returning an incorrect disease stage. All possible state transitions and observations are illustrated in Figure 1.

At each point in time, the current information state $\pi_{t}$ is available to make one of four possible decisions/actions:

1. Skip next appointment slot.
2. Schedule new appointment.
3. Order rapid diagnostic test.
4. Apply available treatment.

Skipping an appointment slot results in the patient progressing through the Markov chain describing the transition probabilities of the disease without medical intervention, without new information being available after the current decision epoch. Scheduling an appointment does not alter the patient transition probabilities but provides a low-quality assessment of the current disease stage, which is used to refine the next information state. The third option, ordering a rapid diagnostic test, allows for a high-quality assessment of the patient’s state, leading to a more reliable refinement of the next information state than otherwise possible when choosing the previous decision option. The results from this diagnostic test are considered available sufficiently fast so that the patient state remains unchanged under this decision. The remaining option entails medical intervention, allowing probabilistic transition from Stage 2 to Stage 1 while preventing transition from Stage 2 to Stage 3. Transition probabilities $P(a)$ , observation probabilities $R(a)$ , and stage cost vectors $c(a)$ for each decision are summarized in Table 1. Additionally, we impose the terminal cost

[TABLE]

In the solution for the optimal feedback control, the selection of a diagnostic test comes at a cost to the objective criterion and, evidently, serves to refine the information state of the system/patient. It does so without effect on the regulation of the patient other than to improve the information state. Clearly, testing to resolve the state of the patient is part of an optimal strategy in this stochastic setting; but it does take resources. A certainty-equivalent feedback control would assign treatment on the supposition that the patient’s state is precisely known. Such a controller would never order a test. The decision to apply a test in the following numerical solution is evidence of duality in receding-horizon stochastic optimal control, viz. SMPC.

9.2 Computational Results

The trade-off between the two principal decision categories (testing versus treatment, probing versus regulating, exploration versus exploitation) is precisely what is encompassed by duality, which we can include in an optimal sense by solving (25-26) and applying the resulting initial policy in receding horizon fashion. This is demonstrated in Figure 2, which shows simulation results for SMPC with control horizon $N=4$ and discount factor $\alpha=0.85$. As anticipated, the stochastic optimal receding horizon policy shows a structure not drastically different from the decision structure motivated above. In particular, diagnostic tests are used effectively to decide on medical intervention.
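The receding-horizon implementation can be sketched as a simple closed loop: evaluate the first-stage policy on the current information state, apply the decision, observe, and filter. In this sketch, `policy` is a hypothetical callable standing in for $g_{0}^{\star}(\cdot)$ obtained from the finite-horizon problem, the function names are our own, and the data passed in are illustrative rather than taken from Table 1.

```python
import random


def belief_update(pi, P_a, R_a, theta):
    """Bayesian filter step: predict with P(a), correct with column theta of R(a)."""
    n = len(pi)
    unnorm = [sum(pi[i] * P_a[i][j] for i in range(n)) * R_a[j][theta]
              for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run_closed_loop(policy, pi0, x0, P, R, c, alpha, steps, rng):
    """Simulate receding-horizon control on a finite POMDP.

    Assumes x0 lies in the support of pi0, so every sampled observation has
    positive probability under the current belief.
    Returns (realized discounted cost, list of actions, final belief).
    """
    pi, x, cost, actions = list(pi0), x0, 0.0, []
    for t in range(steps):
        a = policy(pi)                  # first-stage policy on the current belief
        cost += (alpha ** t) * c[a][x]  # realized (sample-path) stage cost
        x = rng.choices(range(len(pi)), weights=P[a][x], k=1)[0]       # hidden transition
        y = rng.choices(range(len(R[a][x])), weights=R[a][x], k=1)[0]  # measurement
        pi = belief_update(pi, P[a], R[a], y)  # the controller only uses a and y
        actions.append(a)
    return cost, actions, pi
```

Only the belief is passed to `policy`. The true state `x` appears solely in the simulation of the patient and in the bookkeeping of the realized cost.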

In order to apply Theorem 26 to this particular example, we choose the policy $\tilde{g}(\cdot)$ in Assumption 25 always to apply medical intervention. Using the worst-case scenario for the expectations in (27), which entails transition from Stage 1 to Stage 2 under treatment, we can satisfy Assumption 25 with $\eta\approx 7$ . The computed cost in our simulation is $J_{N}(\pi_{0},\mathbf{g}^{\star^{N-1}})\approx 8.5$ . Combined with the discount factor $\alpha=0.85$ , we thus have the upper bound

[TABLE]

via application of Theorem 26. Denoting by $e_{j}$ the row vector with entry $1$ in element $j$ and zeros elsewhere, the observed (finite-horizon) cost corresponding to Figure 2 is

[TABLE]

10 Conclusions

The central contribution of the paper is the presentation of an SMPC algorithm based on SOOFC. This yields a number of theoretical properties of the controlled system, some of which are simply recognized as the stochastic variants of results from deterministic full-state feedback MPC with their attendant assumptions, including for instance Theorem 13 for recursive feasibility. Theorem 15 is the main stability result in establishing the finiteness of the discounted cost of the SMPC-controlled system. Theorem 17 and Corollary 19 deal with consequent convergence of the state in special cases.

Performance guarantees of SMPC are made in comparison to performance of the infinite-horizon stochastically optimally controlled system and are presented in Theorem 22 and Corollary 24. These results extend those of [24], which pertain to full-state feedback stochastic optimal control and which therefore do not accommodate duality. Other examples of stochastic performance bounds are mostly restricted to linear systems and, while computable, do not relate to the optimal constrained control. While the formal stochastic results are traceable to deterministic predecessors, the divergence from earlier work is also notable. This concentrates on the use of the information state to accommodate measurements and the exploration of control policy functionals stemming from the Stochastic Dynamic Programming Equation. The resulting output feedback control possesses duality and optimality properties which are either artificially imposed in or absent from earlier approaches.

We have further suggested two potential strategies to ameliorate the computational intractability of the Bayesian filter and the SDPE, the latter being famous for its curse of dimensionality. Firstly, one may use the particle filter implementation of the Bayesian filter, which has many examples of fast execution for small state dimensions and which, at the cost of losing duality, can be combined with scenario methods. This approach is discussed in [27] as an approximation of the algorithm in this paper. Secondly, we point out that our algorithm becomes computationally tractable for the special case of POMDPs, which may be used either to approximate a nonlinear model or to model a given system in the first place. This strategy inherits the dual nature of our SMPC algorithm for general nonlinear systems.

Appendix A Proofs

A.1 Theorem 15

Denote by $M_{k}$ the discounted $\mathcal{P}_{N}$ -cost-to-go,

[TABLE]

where $g_{j}^{\star}(\cdot)$ , $j=0,\ldots,N-1$ , are the optimal feedback policies in Problem $\mathcal{P}_{N}(\cdot)$ . Moreover, define $\mathcal{F}_{k}$ as the $\sigma$ -algebra generated by the initial state $x_{0}$ with density $\pi_{0\mid-1}$ and the i.i.d. noise sequences $w_{j}$ and $v_{j}$ for $j=0,\ldots,k+N-1$ . Then $M_{k}$ is $\mathcal{F}_{k}$ -measurable and $M_{k}\geq 0$ by non-negativity of stage and terminal cost. Then,

[TABLE]

and, by optimality of the policies $g^{\star}_{j}(\cdot)$ in $\mathcal{P}_{N}(\cdot)$ ,

[TABLE]

where $\tilde{g}(\cdot)$ denotes the terminal feedback policy, specified by Assumptions 12 and 14, and feasibility follows as in the proof of Theorem 13. Given that

[TABLE]

is $\mathcal{F}_{k}$ -measurable, we then have

[TABLE]

By Assumption 14, this yields

[TABLE]

Taking expectations in (28) further gives

[TABLE]

where $\mathbb{E}_{0}\left[M_{0}\right]<\infty$ via feasibility of $\pi_{0}$ for $\mathcal{P}(\cdot)$ . By positivity of the stage cost, this yields

[TABLE]

Inequalities (28) and (29) with non-negativity of the stage cost show that $M_{k}$ is a non-negative $L^{1}$ -supermartingale on its filtration $\mathcal{F}_{k}$ and thus, by Doob’s Martingale Convergence Theorem (see [7]), converges almost surely to a finite random variable,

[TABLE]

Now define $Z_{k}$ to be the discounted sample $\mathcal{P}_{N}$ cost-to-go plus the achieved MPC cost at time $k$ ,

[TABLE]

Then,

[TABLE]

That is, recognizing that $Z_{0}=M_{0}$ so that $\mathbb{E}_{0}[|Z_{0}|]<\infty$ , $Z_{k}$ also is a non-negative $L^{1}$ -supermartingale and converges almost surely to a finite random variable

[TABLE]

However, by definition of $Z_{k}$ and (30), this implies (19). ∎

A.2 Theorem 17

First proceed as in the proof of Theorem 15. By Doob’s Decomposition Theorem (see [6]) on (30), there exist a martingale $\mathcal{M}_{k}$ and a decreasing sequence $\mathcal{A}_{k}$ such that $M_{k}=\mathcal{M}_{k}+\mathcal{A}_{k}$, where $\mathcal{A}_{k}\to\mathcal{A}_{\infty}$ a.s. by (30). Using this decomposition, (28) yields

[TABLE]

Taking limits as $k\to\infty$ and re-invoking non-negativity of the stage cost then leads to $c(x_{k},g_{0}^{\star}(\pi_{k}))\to 0$ a.s., which by the detectability condition on the stage cost (Assumption 16) verifies (20). ∎

A.3 Theorem 22

The optimal value function in the SDPE (15) satisfies $V_{0}(\pi_{0})=J_{N}(\pi_{0},\mathbf{g}^{\star^{N-1}})$ , so that optimality of policy $g_{0}^{\star}(\cdot)$ in Problem $\mathcal{P}_{N}(\pi_{0})$ implies

[TABLE]

which by Assumption 20 yields

[TABLE]

Now denote by $J_{\infty}^{M}(\pi_{0},\mathbf{g}^{MPC})$ the first $M\in\mathbb{N}_{1}$ terms of the infinite-horizon cost $J_{\infty}(\pi_{0},\mathbf{g}^{MPC})$ subject to the SMPC implementation of policy $g_{0}^{\star}(\cdot)$. By (31), we have

[TABLE]

such that

[TABLE]

which by non-negativity of the stage cost confirms the right-hand inequality in (22) in the limit as $M\to\infty$ . The left-hand inequality follows directly from optimality. ∎

A.4 Corollary 24

For conditional densities $\pi_{1}$ of $x_{1}$ such that $\pi_{1}\in\mathcal{C}_{1}$ , use optimality and subsequently Assumption 23 to conclude

[TABLE]

which by (17) implies $V_{0}(\pi_{k})-V_{1}(\pi_{k})\leq\eta$ for $k\in\mathbb{N}_{1}$ . However, this means Assumption 20 is satisfied with $\gamma=0$ and thus completes the proof by Theorem 22. ∎

Bibliography

[1] D. P. Bertsekas. Dynamic programming and optimal control. Athena Scientific, Belmont, MA, 1995.
[2] D. P. Bertsekas and S. E. Shreve. Stochastic optimal control: The discrete time case, volume 23. Academic Press, New York, NY, 1978.
[3] D. Chatterjee and J. Lygeros. On stability and performance of stochastic predictive control techniques. IEEE Transactions on Automatic Control, 60(2):509–514, 2015.
[4] Z. Chen. Bayesian filtering: From Kalman filters to particle filters, and beyond. Statistics, 182(1):1–69, 2003.
[5] D. A. Copp and J. P. Hespanha. Nonlinear output-feedback model predictive control with moving horizon estimation. In 53rd IEEE Conference on Decision and Control, pages 3511–3517, Los Angeles, CA, 2014.
[6] J. L. Doob. Stochastic processes. John Wiley & Sons, New York, NY, 1953.
[7] J. L. Doob. Classical Potential Theory and Its Probabilistic Counterpart. Springer, Berlin, 1984.
[8] R. J. Elliott, L. Aggoun, and J. B. Moore. Hidden Markov models: estimation and control, volume 29. Springer Science & Business Media, 2008.
